EYE TRACKING IN SECOND
LANGUAGE ACQUISITION AND
BILINGUALISM

Eye Tracking in Second Language Acquisition and Bilingualism provides foundational
knowledge and hands-on advice for designing, conducting, and analyzing
eye-tracking research in applied linguistics. Godfroid’s research synthesis and
methodological guide introduces the reader to fundamental facts about eye
movements, eye-tracking paradigms for language scientists, data analysis, and the
practicalities of building a lab. This indispensable book will appeal to undergraduate
students learning principles of experimental design, graduate students developing
their theoretical and statistical repertoires, experienced scholars looking to expand
their own research, and eye-tracking professionals.

Aline Godfroid is an Associate Professor in Second Language Studies and
TESOL at Michigan State University. Her primary research interests are in
psycholinguistics, vocabulary, quantitative research methods, and eye-tracking
methodology. Her research is situated at the intersection of cognitive psychology
and second language acquisition and has appeared in numerous international,
peer-reviewed journals. Aline Godfroid is Co-Director of the Second Language
Studies Eye-Tracking Lab and the recipient of the 2019 TESOL Award for
Distinguished Research.
Second Language Acquisition Research Series
Susan M. Gass and Alison Mackey, Series Editors

The Second Language Acquisition Research series presents and explores issues bearing
directly on theory construction and/or research methods in the study of second
language acquisition. Its titles (both authored and edited volumes) provide thor-
ough and timely overviews of high-interest topics, and include key discussions of
existing research findings and their implications. A special emphasis of the series is
reflected in the volumes dealing with specific data collection methods or instru-
ments. Each of these volumes addresses the kinds of research questions for which
the method/instrument is best suited, offers extended description of its use, and
outlines the problems associated with its use. The volumes in this series will be
invaluable to students and scholars alike, and perfect for use in courses on research
methodology and in individual research.

Using Judgments in Second Language Acquisition Research
Patti Spinner and Susan M. Gass
Language Aptitude
Advancing Theory, Testing, Research and Practice
Edited by Zhisheng (Edward) Wen, Peter Skehan, Adriana Biedroń, Shaofeng Li and
Richard Sparks

For more information about this series, please visit: www.routledge.com/
Second-Language-Acquisition-Research-Series/book-series/LEASLARS

Of related interest:
Second Language Acquisition
An Introductory Course, Fourth Edition
Susan M. Gass with Jennifer Behney and Luke Plonsky
Second Language Research
Methodology and Design, Second Edition
Alison Mackey and Susan M. Gass
EYE TRACKING IN
SECOND LANGUAGE
ACQUISITION AND
BILINGUALISM
A Research Synthesis and
Methodological Guide

Aline Godfroid
First published 2020
by Routledge
52 Vanderbilt Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2020 Taylor & Francis
The right of Aline Godfroid to be identified as author of this work
has been asserted by her in accordance with sections 77 and 78 of
the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or
reproduced or utilised in any form or by any electronic, mechanical,
or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or
retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks
or registered trademarks, and are used only for identification and
explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
A catalog record for this title has been requested
ISBN: 978-1-138-02466-3 (hbk)
ISBN: 978-1-138-02467-0 (pbk)
ISBN: 978-1-315-77561-6 (ebk)
Typeset in Bembo
by Deanta Global Publishing Services, Chennai, India
Visit the eResources: https://www.routledge.com/9781138024670
To Koen
CONTENTS

List of figures  xii
List of tables  xix
Preface  xxi
Acknowledgments  xxiii

1 Introducing Eye Tracking 1


1.1 Online Methodologies in Language Processing Research  2
1.1.1 Think-Aloud Protocols  2
1.1.2 Self-Paced Reading 6
1.1.3 Eye Tracking 10
1.1.4 Event-Related Potentials 13
1.1.5 Synthesis 17
1.2 Why Study Eye Movements?  19
1.3 Summary 23
Notes 23

2 What Do I Need to Know about Eye Movements? 24


2.1 The Observer and the Visual Field  24
2.2 Types of Eye Movements  30
2.3 The Perceptual Span  38
2.4 Where the Eyes Move  43
2.5  When the Eyes Move  46
2.6 How Tight Is the Eye-Mind Link? A Look at Two Models of


Eye-Movement Control  51
2.7 Conclusion 57
Notes 59

3 What Topics Can Be Studied Using Text-Based Eye
Tracking? A Synthetic Review 61
3.1 Finding a Research Topic  61
3.2 Research Strands within Text-Based Eye Tracking  63
3.2.1 Grammar 65
3.2.2 Vocabulary and the Bilingual Lexicon  71
3.2.3 Instructed Second Language Acquisition  77
3.2.4 Subtitles 80
3.2.5 Assessment  82
3.3 Conclusion 86
Notes 86

4 What Topics Can Be Studied Using the Visual World
Paradigm? A Synthetic Review 88
4.1 Foundations of the Visual World Paradigm  88
4.2 Research Strands within Visual World Eye Tracking  97
4.2.1 Word Recognition  99
4.2.2 Prediction 103
4.2.2.1 What is Prediction?  103
4.2.2.2 Semantic Prediction 107
4.2.2.3 Morphosyntactic Prediction 108
4.2.2.4 Prediction Using Multiple Cues  112
4.2.2.5 Effects of Instruction  113
4.2.3 Referential Processing  116
4.2.4 Production  120
4.3 Conclusion 123
Notes 125

5 General Principles of Experimental Design 126


5.1 Doublets, Triplets, and Quadruplets  126
5.2 Between- and Within-Subjects Designs  134
5.3 Trials: Practice Trials, Critical Trials, and Filler Trials  138
5.4 Primary Tasks and Secondary Tasks  142
5.5 How Many Items Do I Need?  151
5.6 Conclusion 156
Notes 157
6 Designing an Eye-Tracking Study 158


6.1 Defining Areas of Interest  158
6.1.1 Word-Based Interest Areas  159
6.1.2 Larger Areas of Text 162
6.1.3 Image-Based Interest Areas 163
6.1.3.1 Images in Text-Based Research  163
6.1.3.2 Images in the Visual World Paradigm  165
6.1.4 Setting Interest Areas in Your Own Research  171
6.2 Guidelines for Text-Based Eye-Tracking Research   171
6.2.1 Spatial Constraints 171
6.2.2 Artistic Factors  174
6.2.3 Linguistic Constraints 177
6.2.3.1 Experimental and Statistical Control  178
6.3 Visual World Research  181
6.3.1 Selecting Images 182
6.3.1.1 Experimental Design 182
6.3.1.2 Visual Properties of Images  186
6.3.1.3 Naming Consistency and Normed Databases  188
6.3.1.4 Should I Have a Preview?  191
6.3.1.5 Should My Experiment Have a
Fixation Cross?  193
6.3.2 Preparing Audio Materials  196
6.3.2.1 Creating Audio Materials 196
6.3.2.2 Defining Time Periods 198
6.4 Conclusion 202
Notes 203

7 Eye-Tracking Measures 205


7.1 Eye-Tracking Measures in Text-Based and Visual
World Research  206
7.2 Eye-Movement Measures 210
7.2.1 Fixations and Skips  210
7.2.1.1 Counts, Probabilities, and Proportions  210
7.2.1.2 Fixation Duration 214
7.2.1.3 Fixation Latency 229
7.2.1.4 Fixation Location 231
7.2.2 Regressions 232
7.2.3 Integrated Eye-Tracking Measures  237
7.2.3.1 Heatmaps, Luminance Maps, and Gaze Plots  237
7.2.3.2 Scanpaths 242
7.3 Conclusion: What Measures Should I Use?  247
Notes 250
8 Data Cleaning and Analysis 251


8.1 Data Cleaning 251
8.1.1 Data Cleaning Software  252
8.1.2 Inspecting Individual Participant Records and Trials  253
8.1.3 Correcting for Drift  257
8.2 Dealing with Outliers  260
8.2.1 Dealing with Overly Short and Long Fixations  261
8.2.2 Data Transformation 263
8.2.3 Accounting for Outliers: Model Criticism or Aggressive
A Priori Screening?  265
8.3 Overview of Statistical Practices in Current
Eye-Tracking Research  271
8.4 Linear Mixed-Effects Models  275
8.4.1 What’s Wrong with Repeated-Measures ANOVA?  275
8.4.2 Introducing Linear Mixed-Effects Models  276
8.4.3 Data-driven Versus Top-Down Approaches to Selecting
a Random Effects Structure  278
8.4.4 Worked Example  281
8.4.5 Reporting the Results  286
8.5 Analyzing Time-Course Data  287
8.5.1 Analyzing Separate Time Windows  288
8.5.2 Growth Curve Analysis 288
8.5.2.1 Data Preprocessing 288
8.5.2.2 Data Visualization 291
8.5.2.3 Logistic or Quasi-Logistic Regression  296
8.5.2.4 Choosing Time Terms  299
8.5.2.5 Worked Example  302
8.5.2.6 Reporting the Results  306
8.6 Conclusion: Which Analysis Should I Use?  307
Notes 309

9 Setting up an Eye-Tracking Lab 311


9.1 Choosing an Eye Tracker  311
9.1.1 Types of Eye Trackers and Their Precursors  311
9.1.2 Video-Based Eye Trackers  316
9.1.3 How Does an Eye Tracker Work? Speed, Accuracy, and
Precision 322
9.2 The Eye-Tracking Lab  330
9.2.1 Practical Considerations 330
9.2.2 Spatial and Technical Requirements for a Lab  334
9.2.3 Managing an Eye-Tracking Lab  336
9.3 Getting Started 338
9.3.1 Ideas for Research  338
9.3.1.1  Research Idea 1: Entry-Level: Create a
Sentence-Processing Experiment  338
9.3.1.2  Research Idea 2: Entry-Level: Create a
Text Reading Study  339
9.3.1.3  Research Idea 3: Entry-Level: Study Script
Effects in Reading  341
9.3.1.4  Research Idea 4: Entry-Level: Create a Visual
World Study  343
9.3.1.5  Research Idea 5: Intermediate: Replicate a L1
Reading Study with L2 Readers  345
9.3.1.6  Research Idea 6: Intermediate: Replicate a L1 Visual World
Study with L2 Listeners or Bilinguals  347
9.3.1.7  Research Idea 7: Intermediate: Conduct an
Interaction Study  348
9.3.1.8  Research Idea 8: Advanced: Examine L2 Listening as a
Multimodal Process  349
9.3.1.9  Research Idea 9: Advanced: Study Cognitive Processes during
Intentional Vocabulary Learning  351
9.3.1.10 Research Idea 10: Advanced: Conduct a Synchronous
Computer Mediated Communication (SCMC) Study with
Eye Tracking  353
9.3.2 Tips for Beginners  355
9.3.2.1 About the Equipment  355
9.3.2.2 About Data Collection  356
9.3.2.3 About Data Analysis  361
Notes 363

References  364
Index of Names  401
Index  406
FIGURES

  1.1 Converging evidence from traditional null hypothesis
significance testing and equivalence tests that neither
think alouds nor eye tracking affect reading comprehension  5
  1.2 Three presentation formats in self-paced reading 7
  1.3 Sample ERP waveforms, illustrating different ERP effects  13
  1.4 Sample trial in a location-cuing experiment 20
  1.5 Examples of decoupling between eye gaze and cognitive
processing  22
  2.1 Visual acuity in the fovea, parafovea, and periphery 25
  2.2 The two major axes through which light travels in the eye 25
  2.3 Cone and rod density in the fovea, parafovea, and periphery 26
  2.4 A three-dimensional rendering of the ellipsoid-shaped
visual field with visual angles extending from the point of
gaze outwards  27
  2.5 Relationship of viewing distance d, stimulus size x, and
the visual angle θ 28
  2.6 A sequence of fixations (circles) and saccades (lines) on a
TOEFL® Primary™ practice reading test item 30
  2.7 Idealized saccadic profile: eye gaze displacement, velocity,
and acceleration  33
  2.8 Short- and long-distance saccades in a reading experiment 35
  2.9 Fixational eye movements during a three-second fixation 37
2.10 The visual field 39
2.11 The gaze-contingent moving window paradigm 41
2.12 Urdu and English sentences displayed in a gaze-
contingent moving window paradigm with different
window sizes  42
2.13 Undershooting (A) and overshooting (B) of arts in the
phrase sciences and arts given different launch sites in the
word sciences  45
2.14 Models of eye-movement control 52
2.15 A grid of E-Z Reader and SWIFT models of eye-
movement control 53
2.16 Example of a numerical simulation of the SWIFT model 54
2.17 Schematic representation of the E-Z Reader model 55
  3.1 Distribution of eye-tracking studies across 16 SLA journals 64
  3.2 A critical trial in a grammar learning experiment 70
  3.3 An example trial from the exposure phase in a study on
learned attention 78
  3.4 Sample trial in an ISLA study 79
  3.5 A gap-fill task in assessment research 83
  3.6 Sample picture description task  85
  4.1 Display used in Cooper (1974) while participants listened
to a story  90
  4.2 Display used in Allopenna et al. (1998)  92
  4.3 Linking eye movement data and activation levels of
lexical representations as predicted by the TRACE model 93
  4.4 Display used in Altmann and Kamide (1999) 94
  4.5 Display used in Altmann and Kamide (2007) 95
  4.6 Distribution of eye-tracking studies across 16 SLA and
bilingualism journals 97
  4.7 Display used in a word recognition experiment  101
  4.8 Display used in the oculomotor Stroop task 102
  4.9 Display used in gender prediction experiments 109
4.10 Display used in a morphosyntactic prediction experiment 111
4.11 Display used in a vocabulary learning experiment 115
4.12 Display used in a referential processing study 118
4.13 Three experimental conditions in a sentence-matching task 119
4.14 Motion event used in an oral production study 121
4.15 Eye-tracker set up in an oral production study  122
  5.1 Common item types: doublets (two levels), triplets
(three levels), and quadruplets (four levels) 129
  5.2 Schematic representation of different item types  130
  5.3 A quadruplet 131
  5.4 A quadruplet 133
  5.5 A quadruplet drawn from doublets of visual and
linguistic-auditory stimuli 133
  5.6 Between-subjects design and within-subjects design for
studying the role of captions in vocabulary acquisition 136
  5.7 Counterbalancing items across four lists 137
  5.8 Trial sequence 139
  5.9 Between-subjects design and within-subjects design for
studying the role of captions in vocabulary acquisition 155
  6.1 Interest areas in a vocabulary learning study 160
  6.2 Interest areas in a grammatical-ungrammatical sentence pair 161
  6.3 Full captioning (left) and keyword captioning (right) 162
  6.4 English lexical decision task with eye tracking  162
  6.5 Different types of interest area in a banked gap-fill test  163
  6.6 Pictorial interest areas  164
  6.7 Interest areas in assessment research  165
  6.8 Sample display with a target image (right) and a
nontarget, distractor image (left)  166
  6.9 Three types of interest areas around discrete images in a
visual world experiment 167
6.10 Four image roles in the same visual display  168
6.11 Three types of interest areas around objects in a visual scene  169
6.12 Dynamic interest areas in a movie depicting a motion event 170
6.13 A screen display before (top-right) and after
(bottom-right) the researchers’ intervention  173
6.14 An example of monospaced and proportional font  176
6.15 Examples of 12-point, monospaced font types 177
6.16 Modified verb list from Godfroid and Uggen (2013), with
word length and frequency information recorded for all
the verbs in the study 179
6.17 Display from Dijkgraaf et al.’s (2017) study 183
6.18 Original display (left) and hypothetical extension (right) 184
6.19 Changing roles of targets and distractors in
Dijkgraaf et al. (2017)  185
6.20 Three possible displays of a balloon, a shark, a shovel,
and a hat 187
6.21 Three-stage trial: (1) fixation cross, (2) image preview, (3)
audio + image processing with eye-movement recording 194
6.22 Three-stage trial: (1) word preview, (2) fixation cross, (3)
audio + word processing with eye-movement recording 195
6.23 Visualization of the spoken sentence Every alligator lies in a
bathtub in Audacity 199
6.24 Splicing the target noun onto the carrier phrase  201
6.25 Eye fixation patterns plotted against time  201
  7.1 Taxonomy of eye-tracking measures used in
text-based studies 207
  7.2 Taxonomy of eye-tracking measures used in visual
world studies 208
  7.3 Types of eye-tracking measures used in text-based and
visual world studies 209
  7.4 Two different reading patterns for an unfamiliar word,
tahamul, embedded in context 210
  7.5 Count measures in eye tracking in SLA and bilingualism 211
  7.6 The ‘big four’ durational measures (upper panel) and
other durational measures (lower panel) in eye tracking in
SLA and bilingualism  214
  7.7 Two different reading patterns for an unfamiliar word,
tahamul, embedded in context  217
  7.8 First fixation latency in single-word reading 230
  7.9 First fixation latency in an oculomotor Stroop task 231
7.10 First fixation location in single-word reading  232
7.11 Regression measures in eye tracking in SLA
and bilingualism 234
7.12 Two different reading patterns for an unfamiliar word,
tahamul, embedded in context 235
7.13 Integrated measures in eye tracking in SLA
and bilingualism 237
7.14 Heatmaps of fixation behavior during an English speaking
test: L1 English children (top) and English language
learners (bottom) 239
7.15 Two visual representations of essay-rating data: (a)
heatmap and (b) luminance map  240
7.16 Gaze plots during an English speaking test: L1 English
child participant (top) and English language learner (bottom)  243
7.17 Sequence of teacher scanpaths 244
7.18 Functional regions for four grammatical structures in a
grammaticality judgment test 245
7.19 Sentence reading pattern with different functional regions
superimposed  246
7.20 Overlap (non-independence) between three common
durational measures 248
7.21 Alternative decomposition of a viewing episode into
statistically independent, durational measures 249
  8.1 Text skimming 254
  8.2 Temporal graph of the raw data in Figure 8.1  254
  8.3 Blinks in an eye-movement record 255
  8.4 Temporal graph of the raw data in Figure 8.3  256
  8.5 (a) Vertical drift in eye-movement recording and
corresponding data set after data cleaning: (b) manual
adjustment, (c) semi-automatic cleaning with Drift
Correct, and (d) Drift Correct followed by manual
adjustment  258
 8.6  Reading data for an L2 English speaker reading a text with
marginal glosses 259
  8.7 A four-stage procedure for dealing with outliers 260
  8.8 Frequency distribution for fixation durations during
sentence reading 262
  8.9 Typical distribution of total reading times before
transformation (left panel) and after log transformation
(right panel) 265
8.10 Residuals (error terms) for a reanalysis of Godfroid and
Uggen’s (2013) data 268
8.11 Q–Q (quantile–quantile) plots showing model fit
following different outlier treatment strategies 271
8.12 Statistical procedures in visual world and
production research 272
8.13 Changing trends in statistical practices in L2 and
bilingualism eye-tracking research: (generalized) linear
mixed-effects models are catching up with ANOVA 273
8.14 Statistical procedures in text-based research 273
8.15 Analyses of eye-fixation duration measures in text-based
eye-tracking research 274
8.16 Three ways of laying out the same data set: for a mixed-
effects regression analysis, a F1 by-subject ANOVA, and a
F2 by-item ANOVA 276
8.17 Fitting a LMM for a fixed set of independent variables 282
8.18 Treatment of Time in different statistical procedures 287
8.19 Calculating fixation proportions based on raw
eye-tracking data 289
8.20 Sample display from a verb argument prediction study 294
8.21 Time course graph for the L1 English group 295
8.22 Proportion, odds, and log odds 296
8.23 Visual inspection of behavioral data  300
8.24 Natural and orthogonal polynomials 301
8.25 Fitted (predicted) values for looks to target in the early
time window 305
8.26 A menu of common statistical options in eye-tracking
research  308
  9.1 The Dodge photochronograph 312
  9.2 (a) (left) Drawing based on one healthy participant’s
photographic record. (b) (right) Eye-movement trace for
the sentence 313
  9.3 A scleral contact lens 313
  9.4 Electrooculography 314
  9.5 Schematic representation of the eye 314
  9.6 Relative positions of the pupil and the corneal reflection
for different points of regard 315
  9.7 A remote eye tracker with the camera inside the display
monitor  316
  9.8 A remote eye tracker with the camera on the table in
front of the participant  317
  9.9 A remote eye tracker with the camera on the table in
front of the participant  317
9.10 A head-mounted eye tracker  318
9.11 Eye-tracking glasses  318
9.12 A head-mounted mobile eye tracker  319
9.13 (a) A remote eye tracker mounted in a tower above the
participant’s head. (b) A hi-speed tower-mount eye tracker  319
9.14 A remote head-tracker with a target sticker 320
9.15 A magnetic head-tracker  320
9.16 Flow-chart for deciding on an eye-tracking solution 321
9.17 Eye-movement data before and after event detection  323
9.18 Temporal sampling frequencies of three eye trackers 325
9.19 Precision and accuracy of an eye-fixation measurement 328
9.20 Collecting data outside the lab: Sri Lanka  332
9.21 In-home eye-tracking project in Chicago, IL  333
9.22 Camera set-up to study eye gaze during oral interaction  333
9.23 (a) (left) A two-PC configuration with Host PC and
Display PC. (b) (right) Two-room eye-tracking lab  335
9.24 Display PC and host PC separated by a panel 335
9.25 An experimental item (top-left) followed by a
comprehension question (bottom-right)  339
9.26 One paragraph and comprehension question  340
9.27 Example sentence pairs 344
9.28 Three presentation formats of Chinese words, their
pronunciation and meaning  352
9.29 An example of a transition matrix for wèizhi, “location”,
with the Chinese characters, pinyin, and English
translation arranged vertically 352
9.30 An example of the Porta test of eye dominance, where
the thumb is aligned with a stop sign 359
TABLES

1.1 Predicted reading times per word in four reading methods 8
1.2 Comparison of thinking aloud, SPR, eye tracking, and ERPs 18
2.1 Degrees of visual angle of Courier font point 16–24 at
common viewing distances 29
2.2 The range of mean fixation durations and saccade length in
different tasks 32
2.3 Variables influencing fixation duration 49
3.1 Questions in eye-tracking research on grammar acquisition
and processing 71
3.2 Questions in eye-tracking research on vocabulary and the
bilingual lexicon 76
3.3 Questions in eye-tracking research on ISLA 80
3.4 Questions in eye-tracking research on multimodal input 81
3.5 Questions in eye-tracking research on language assessment 85
4.1 Questions in visual world eye tracking on lexical
processing and word recognition 103
4.2 Questions in visual world eye tracking on prediction 106
4.3 Questions in visual world eye tracking on referential
processing  120
4.4 Questions in visual world eye tracking on production 123
5.1 Comparison of four secondary tasks 150
5.2 Number of observations in L2 eye-tracking research
with text 154
5.3 Number of observations in L2 visual-world eye-tracking
research  154
6.1 Best-fitting linear mixed effects model for log first pass
reading time  180
6.2 Normed databases for picture selection 190
7.1 Definitions and examples of count measures 211
7.2 Definitions and examples of duration measures 215
7.3 Definition and example of first fixation latency 229
7.4 Definition and example of first fixation location 232
7.5 Definitions and examples of regression measures 236
7.6 Four types of heatmap 241
8.1 Comparison of different outlier treatment strategies 270
8.2 Backward model selection: The maximal model and a
more parsimonious competitor model 284
8.3 The best-fitting linear mixed-effects model 285
8.4 Final model after model criticism 286
8.5 Raw eye-tracking data for three trials (trial excerpts) from
one participant 289
8.6 Binned eye-tracking data for three trials (trial excerpts)
from one participant 291
8.7 Binned eye-tracking data with dependent variables used
for plotting and analysis: fixation proportion (FixProp) and
empirical logit (elog) 292
8.8 Comparison of logistic and quasi-logistic regression 299
8.9 Forward model selection in logistic regression: A base
model and three competitor models  304
8.10 Forward model selection in quasi-logistic regression: A base
model and three competitor models  304
8.11 The final model (Model 2) in the logistic regression analysis 305
PREFACE

Eye-movement registration, commonly referred to as eye tracking, has proven
to be a valuable technique across multiple disciplines to study human cognition.
Although eye-movement registration dates back at least to the 19th century, tech-
nological advances and the advent of commercially available eye trackers have
revolutionized the use of eye-tracking methodology in research. You no longer
need to be an engineer (and build your own eye tracker) to engage in eye-tracking
research. While more researchers than ever before are seeing the value of eye-
movement registration for their work, the decoupling of the technical and meth-
odological aspects of eye-tracking research has not been without challenges. Eye
fixations may offer a window into the mind, but to do so studies require careful
planning and experimental design.
My own experience with eye-tracking methodology began at the University
of Brussels, Belgium, around 2006, when I was fortunate to gain access to a head-
mounted eye tracker in the psychology department. Trained as an applied linguist,
I quickly became engrossed in the foundational work on eye movements in read-
ing that had accumulated over the last 30 years in cognitive psychology. Clearly,
our two disciplines shared common interests, but as an applied linguist, I kept
asking myself how bilingualism and second language researchers could similarly
benefit from eye tracking to address questions about language processing, acquisi-
tion, instruction, and use. This book represents the outcome of a nearly 15-year
journey devoted to answering that question. Although the ground rules of eye
tracking do not change between disciplines, its applications do, and this com-
munication between disciplines carries a lot of potential for creating new knowl-
edge. With this book, I aim to help build the disciplinary identity of eye-tracking
research in second language acquisition and bilingualism. This has been an act of
adaptation and creation, as much as it has been an act of translation, because with
a new field come new questions and possibilities for innovation. At the same time,
many design issues in eye-tracking research do generalize across language-related
disciplines. Therefore, I hope other researchers working across the language sci-
ences will find this book useful as well.
The structure of this book reflects the different stages of the research cycle. I
organized it this way with my graduate students in mind, who, over the course
of a 15-week seminar, learn the ropes of eye tracking in the Second Language
Studies Program at Michigan State University. Depending on where you find
yourself in the research process, you may find it useful to read the corresponding
chapters in the book. Chapter 1 introduces eye-tracking methodology in rela-
tion to other real-time data collection methods. It will be most useful if you are
considering whether eye tracking could enrich your research program and what
other data sources eye tracking can be triangulated with. Chapter 2 summarizes
my reading of the cognitive psychology literature. It boils down nearly 45 years
of fundamental research to a chapter-length, accessible summary of fundamental
facts about eye movements that probably every eye-tracking researcher in the lan-
guage sciences should know. Chapters 3 and 4 present the findings of a synthetic
review of eye-tracking research in second language acquisition and bilingualism,
highlighting major themes and developments in the field so researchers can situ-
ate their own work and find a topic. Chapters 5 and 6 cover the design of eye-
tracking studies. Chapter 5 is a stepping stone for Chapter 6, in that it introduces
basic principles of experimental design. Equipped with this knowledge, readers
can tackle the eye-tracking-specific information in Chapter 6. After reading these
chapters, you will be able to conceptualize and design your own eye-tracking
project. Chapter 7 provides a comprehensive overview of eye-tracking measures
in second language acquisition and bilingualism. You can focus specifically on
those measures you use in your own research or read with the aim of exploring,
so you can diversify and expand your current selection of measures. Chapter 8
covers topics in data cleaning and analysis. It caters to readers with different levels
of statistical literacy by providing both an overview of current statistical practices
and an in-depth introduction to newer statistical techniques (linear mixed-effects
models and growth curve analysis) that have gained importance in recent years.
Lastly, Chapter 9 brings the reader full circle by providing practical advice on
purchasing or renting an eye tracker, setting up a lab, tips for data collection, and
ideas for research. Overall, this book will explain the details of how and why to best
collect and analyze eye-movement recordings for well-designed and informative
language research.
ACKNOWLEDGMENTS

Writing is a process, and this process would not have been as rewarding and at
times even fun without the help and encouragement of a group of talented and
caring people. I am thankful to the series editors Susan Gass and Alison Mackey
for giving me an opportunity to write this book and to the Routledge editors
Ze’ev Sudry and Helena Parkinson for overseeing the publication process. My
thanks also go to Paula Winke, Co-Director of the Second Language Studies Eye-
Tracking Lab, and to all the previous students in my eye-tracking course and in
particular JinSoo Choi, Caitlin Cornell, and Megan Smith, who have commented
on different chapters in this book. Markus Johnson from SR Research and Wilkey
Wong from Tobii have been most helpful answering my questions about eye
trackers and provided feedback on the final chapter of the book. Chapters 4, 6,
and 8 also benefited from conversations with Gerry Altmann and Denis Drieghe.
I owe a special debt of gratitude to Carolina Bernales, Bronson Hui, and
Kathy MinHye Kim for their numerous contributions to the book, which have
made it better in so many ways. My thanks also go to Dustin Crowther and
Kathy MinHye Kim for their help coding the studies. Elizabeth Huntley, Wenyue
Melody Ma, and Koen Van Gorp proofread all the chapters and made many astute
suggestions that have made this book a better read. I thank all the eye-tracking
researchers who generously contributed examples from their studies and whose
work has had an obvious impact on my thinking. I thank my writing partners in
crime, Patricia Akhimie, Claudia Geist, Gustavo Licon, and Kelly Norris Martin,
for motivating me to show up and do the work week after week. I thank my East
Lansing friends Natalie Philips and John McGuire and my Ann Arbor friends for
making cold Michigan winters a little warmer and lastly, I thank my family in
Belgium and Koen for their unwavering love and support as I pursued my aca-
demic dreams on a new continent.
Aline Godfroid, Haslett, Michigan
1
INTRODUCING EYE TRACKING

In a fast-changing and multilingual world, the study of how children and adults
learn languages other than their native tongue is an important endeavor. Questions
about second language (L2) learning are at the heart of the sister disciplines of
second language acquisition (SLA) and bilingualism, and researchers who work in
these areas have an increasingly diverse and sophisticated methodological toolkit
at their disposal (Sanz, Morales-Front, Zalbidea, & Zárate-Sández, 2016; Spivey
& Cardon, 2015). In addition, it seems that in the 21st century, the preferred way
to investigate questions of language processing and representation is online—that
is, as processes unfold in real time—because the data obtained in this way offer a
more fine-grained representation of the learning process than any offline meas-
urements could (Frenck-Mestre, 2005; Godfroid & Schmidtke, 2013; Hama &
Leow, 2010). This book is about one online methodology that is well suited for
studying both visual and auditory language processing, namely eye-movement
registration, commonly referred to as eye tracking.
Eye tracking is the real-time registration of an individual’s eye movements, typ-
ically as he or she views information on a computer screen. Within the Routledge
Series on Second Language Research Methods, this guide on eye-tracking method-
ology is the third to be devoted to an online data collection method, follow-
ing Bowles’s (2010) meta-analysis of reactivity research regarding think-alouds,
and Jiang’s (2012) overview of reaction time methodologies. This shows how
eye tracking is part of a collection of online techniques that have been gain-
ing momentum in SLA and bilingualism (also see Conklin, Pellicer-Sánchez, &
Carrol, 2018). Across the language sciences, linguists, applied linguists, language
acquisitionists, bilingualism researchers, psychologists, education researchers, and
communication scientists have similarly embraced the recording of eye move-
ments in their research programs. Although the research reviewed in this book is
primarily from SLA and bilingualism, the principles for researching language with
eye tracking generalize to other domains that use similar materials as well, which
gives the methodological part of this book a broad, interdisciplinary reach.
To understand where eye tracking fits within the larger movement toward
online research methodologies, and to appreciate some of its strengths and
weaknesses, I introduce eye tracking along with three other concurrent meth-
odologies—think-aloud protocols, self-paced reading (SPR), and event-related
potentials (ERPs)—which present themselves as complements and sometimes
competitors to the eye-tracking method.

1.1 Online Methodologies in Language Processing Research
Online (real-time, concurrent) methodologies are a class of data collection meth-
ods that provide information about a participant’s receptive or productive language
processing as it happens. Online methods stand in contrast with offline methods,
which are temporally disconnected from the task processes under investigation.
Almost everything budding applied linguists learned about SLA until the 1990s
was based on offline methods, such as grammaticality judgments, picture descrip-
tion tasks, sentence-picture matching tasks, comprehension tests, and many more
(see Mackey & Gass, 2016, for a review). Any method providing accuracy data as its
main output, often as a part of a pretest-posttest design, can be considered offline.
Although offline measures remain important in understanding SLA, these
measures are now frequently supplemented by concurrent or online data col-
lection methods. Thus, we may find researchers recording ERPs (Morgan-Short,
Sanz, Steinhauer, & Ullman, 2010; Morgan-Short, Steinhauer, Sanz, & Ullman,
2012) or reading time data (Godfroid, Loewen, Jung, Park, Gass, & Ellis,
2015; Leeser, Brandl, & Weissglass, 2011) during grammaticality judgments, collecting
think-alouds in addition to comprehension pre- and posttest data (Leow, 1997,
2000), or gathering reaction times during sentence-picture matching (e.g., Godfroid,
2016; Leung & Williams, 2011). In general, the addition of an online data col-
lection method provides the researcher with a richer and time-sensitive account
of ongoing processing (e.g., Clahsen, 2008; Frenck-Mestre, 2005; Godfroid &
Schmidtke, 2013; Hama & Leow, 2010). Some questions (for instance, about the
neural basis of language) can be investigated only through online methodolo-
gies. Mitchell (2004) drove this point home when he noted in regard to sentence
processing that “any method based on probing events after a delay … may have
‘missed the show’” (p. 16).

1.1.1 Think-Aloud Protocols
Thinking aloud is when a participant says out loud his or her thoughts while
carrying out a particular task, such as solving a math problem, reading, or taking
a test. That particular task, known as the primary task, is the one researchers want
to study, and thinking aloud is sometimes referred to as the secondary task (e.g.,
Ericsson & Simon, 1993; Fox, Ericsson, & Best, 2011; Godfroid & Spino, 2015;
Goo, 2010; Leow, Grey, Marijuan, & Moorman, 2014), which is used to shed
light on the main task of interest. Thus, think alouds are a tool researchers use
to study cognitive processes as they unfold during some type of human activity,
such as language processing.
Think-aloud protocols stand out among the family of concurrent or online
data collection methods because they yield qualitative, rather than quantitative,
data as their primary outcome. This makes them an interesting supplement for
other online methods, which produce quantitative data, even though it is also
possible to analyze think-alouds quantitatively after data coding (e.g., Bowles,
2010; Ericsson & Simon, 1993; Leow et al., 2014). An ingenious study that trian-
gulated think alouds and eye tracking was Kaakinen and Hyönä (2005). Kaakinen
and Hyönä manipulated L1 Finnish participants’ purpose for reading, by asking
them to learn more about one of two rare diseases that a friend had supposedly
contracted. Using the eye-tracking data, the authors showed that sentences that
were relevant to their participants’ reading perspective (i.e., their friend’s disease)
generated longer first-pass reading times than sentences that dealt with the other
disease also discussed in the text. In addition, participants more often showed evi-
dence of deeper levels of processing in the think alouds they produced after the
longer first-pass reading times.1 An interesting secondary finding was that verbal
evidence of deeper processing coincided with elevated reading times, but not
with the presence of task-relevant information per se (i.e., not all sentences about
the target disease elicited deep processing). This would seem to suggest that the
longer eye fixation durations were the factor that mediated between text infor-
mation and the participants’ depth of processing.
Think-alouds are a versatile research methodology (see Fox et al., 2011, for a
recent review and meta-analysis). Within SLA research, think alouds have been
collected to study questions pertaining to noticing and awareness (Alanen, 1995;
Godfroid & Spino, 2015; Leow, 1997, 2000; Rosa & Leow, 2004; Rosa & O’Neill,
1999); the processing of feedback during writing, including “noticing the gap”
(Qi & Lapkin, 2001; Sachs & Polio, 2007; Sachs & Suh, 2007; Swain & Lapkin,
1995); depth of processing (Leow, Hsieh, & Moreno, 2008; Morgan-Short, Heil,
Botero-Moriarty, & Ebert, 2012); strategy use in vocabulary acquisition (De Bot,
Paribakht, & Wesche, 1997; Fraser, 1999; Fukkink, 2005; Nassaji, 2003); and test-
taking behavior (Cohen, 2006; Green, 1998). The prevalent view is that think-
aloud protocols reflect the contents of the speaker’s short-term memory, which
are believed to be conscious (Ericsson & Simon, 1993; Pressley & Afflerbach,
1995). Therefore, unlike the other online methodologies reviewed in this chapter,
which also capture unconscious processes—some would even claim SPR, eye
tracking, and ERPs capture only unconscious processes (Clahsen, 2008; Keating &
Jegerski, 2015; Marinis, 2010; Tokowicz & MacWhinney, 2005)—it is important
to emphasize that think alouds measure primarily conscious processes that involve
information in the speaker’s awareness (Godfroid, Boers, & Housen, 2013).
Leow et al. (2014) reviewed research on the early stages of L2 learning that
relied on either think-alouds, eye tracking, or reaction time (RT) measurements.
Of these three methodologies, think alouds emerged as the only measure that
can differentiate between levels of awareness and depth of processing. In contrast,
Leow et al. concurred with Godfroid et al. (2013) that eye tracking may provide
“the most robust measure of learner attention” (Leow et al., 2014, p. 117). Not
surprisingly, Leow and his colleagues also found RTs shared many properties with
eye-movement data—eye-movement data being a special type of RTs—but noted
RT tasks were less expensive and easier to implement than eye tracking. Thus, a
major strength of think alouds compared to other online methodologies is that
think alouds can illuminate the how and why of (conscious) processing (Leow et
al., 2014), whereas I would argue the same information is represented differently
in eye-movement records and is represented only to a limited extent in RT tasks.
Think alouds differ from eye tracking and ERPs, but not from SPR, in that
they require participants to engage in an additional task: speaking out loud, in the
case of think alouds, and pressing a button for SPR. Secondary tasks like these are
sometimes criticized because they carry the risk of altering participants’ cognitive
processes and changing the primary task under investigation. For think-aloud meth-
odology, this issue is known as reactivity (see Leow & Morgan-Short, 2004); it can
compromise the internal validity of a study, because researchers may no longer be
studying the cognitive process they intended to study. For instance, participants may
perform a task more analytically or with greater focus when they are asked to think
aloud concurrently. The potential reactivity of think alouds has enjoyed a good deal
of research attention in SLA and was the object of a meta-analysis in Bowles (2010).
Using a sample of 14 primary studies, Bowles analyzed how task and participant fac-
tors influenced the reactivity of thinking aloud with verbal primary tasks. She found
an overall “small effect” (p. 110) for think alouds, although this effect differed as a
function of type of verbal report, L2 proficiency level, and the primary task. Bowles
concluded that “the answer to the question of reactivity and think-alouds is not a
simple ‘yes’ or ‘no’ but rather is dependent on a host of variables” (p. 110).
Godfroid and Spino (2015) revisited the reactivity question for think alouds
and extended the concept to eye-tracking methodology. By investigating the
reactivity of eye tracking, the authors tested the widely held assumption that
reading with concurrent eye-movement registration is representative of natural
reading, a claim that had not been evaluated empirically since Tinker’s (1936)
study. Participants in Godfroid and Spino were English college majors at a Belgian
university, who had an upper-intermediate to advanced English proficiency level.
The participants read 20 short English texts embedded with pseudo words in an
eye-tracking, a think-aloud, or a silent control condition. Using both traditional
statistical tests and equivalence tests, Godfroid and Spino found converging evi-
dence that thinking aloud or eye tracking did not affect the learners’ text com-
prehension (see Figure 1.1), which was consistent with the results of Bowles’s
(2010) meta-analysis for think alouds. However, thinking aloud had a small, posi-
tive effect on participants’ posttest recognition of the pseudo words, while the
results for eye tracking were mixed. These findings lend some empirical support
to the claim that eye tracking is “considered to be the closest experimental paral-
lel to the natural reading process” (Cop, Drieghe, & Duyck, 2015, p. 2), although
much more research on the potential reactivity of eye tracking, following similar
work for think alouds, is needed.

FIGURE 1.1 Converging evidence from traditional null hypothesis significance
testing and equivalence tests that neither think alouds nor eye tracking affect
reading comprehension. (Source: Adapted from Godfroid & Spino, 2015.)
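
The equivalence tests Godfroid and Spino used invert the usual null hypothesis logic: instead of failing to find a difference, the researcher tries to reject the hypothesis that a difference at least as large as a preset bound exists, typically via two one-sided tests (TOST). The sketch below is a minimal illustration of the general TOST procedure in Python, not the authors’ actual analysis; the group labels, sample sizes, and ±5-point equivalence bounds are hypothetical.

import numpy as np
from scipy import stats

def tost_two_sample(x, y, low, high):
    # Two one-sided tests (TOST) for the equivalence of two group means.
    # Equivalence is claimed when BOTH one-sided nulls (true difference
    # <= low; true difference >= high) are rejected.
    n1, n2 = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # Pooled standard error, as in an equal-variance two-sample t-test
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) +
                  (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2))
    se = sp * np.sqrt(1 / n1 + 1 / n2)
    df = n1 + n2 - 2
    p_lower = stats.t.sf((diff - low) / se, df)    # H0: diff <= low
    p_upper = stats.t.cdf((diff - high) / se, df)  # H0: diff >= high
    return max(p_lower, p_upper)  # p < alpha -> conclude equivalence

# Hypothetical comprehension scores (0-100) for two conditions
rng = np.random.default_rng(1)
eye_tracking = rng.normal(70, 10, 40)
silent_control = rng.normal(70, 10, 40)
print(tost_two_sample(eye_tracking, silent_control, low=-5, high=5))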

1.1.2 Self-Paced Reading
Of all online methodologies, SPR—the self-paced reading of sentences that are
broken down and presented in separate segments—is the one that resembles eye
tracking most closely. Proponents of SPR highlight the practicality of the method
and its fitness-for-purpose (Jegerski, 2014; Mitchell, 2004). Mitchell (2004) put it
most strongly when he questioned researchers’ natural tendency to go for “nuclear
weaponry” (p. 15) when choosing a research methodology even though “a simple
and apparently crude method is often all that is needed” (ibid.).
Participants in a SPR experiment read sentences or short paragraphs in a
word-by-word or phrase-by-phrase fashion. A new segment appears and, in cur-
rent versions of the paradigm, the previous segment disappears each time the
participant presses a button. Because the participant controls the text presentation
rate, reading is said to be self-paced or subject-paced, unlike rapid serial visual
presentation (e.g., Aaronson & Scarborough, 1977; Forster, 1970; Potter, 1984),
where every word remains on the screen the same amount of time and presen-
tation is researcher- or experimenter-paced. In the 40-year existence of SPR,
researchers have experimented with different versions of the paradigm: centered
or linear text presentation and, in the case of linear presentation, cumulative or
non-cumulative formats. Figure 1.2 displays an example sentence from Hopp’s
(2009) SPR study, (a) with the original linear, non-cumulative display, (b) a linear,
cumulative display, and (c) using a centered format (always non-cumulative).
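
To make the procedure concrete, here is a minimal, hypothetical console sketch of a non-cumulative moving window trial in Python (word-by-word rather than the phrase-by-phrase segmentation shown in Figure 1.2, and without the millisecond-accurate timing and display control that dedicated experiment software such as PsychoPy or E-Prime provides):

import time

def moving_window_trial(sentence):
    # Present the sentence word by word in a non-cumulative, linear
    # moving window: each press of Enter reveals the next word,
    # re-masks the previous one, and logs the time between presses.
    words = sentence.split()
    masks = ["_" * len(w) for w in words]
    reading_times = []
    for i, word in enumerate(words):
        frame = masks[:i] + [word] + masks[i + 1:]
        print(" ".join(frame))
        start = time.perf_counter()
        input()  # participant presses Enter to advance
        reading_times.append(time.perf_counter() - start)
    return reading_times

# Sentence from Figure 1.2 (Hopp, 2009):
# moving_window_trial("Ich glaube dass den Läufer am Sonntag der Trainer gefeiert hat.")

The per-word times returned by such a routine are the raw data that SPR analyses, like those discussed next, operate on.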
In a classic study, Just, Carpenter, and Woolley (1982) compared the button-
press times obtained under each of the previously mentioned presentation modes
with the eye gaze durations recorded with an eye tracker for the same texts.
(The eye-tracking data were the object of another oft-cited study, namely Just
and Carpenter [1980].) A total of 49 undergraduate students across the two stud-
ies read 15 short scientific texts in their native language (L1), English, follow-
ing either an eye-tracking or a SPR procedure. Just et al. (1982) modeled the
relationship between ten word- and text-level variables and the reading times
within each data collection method. They found that the non-cumulative, linear
SPR data resembled the gaze data from eye tracking most closely, in that the
word- and text-level properties influenced both data types in a broadly simi-
lar way (e.g., increased times for word length and shorter times for word fre-
quency; see Table 1.1 and Chapter 2).
(a) Non-cumulative, linear display, also known as the moving window technique
___ ______ ____ ___ ______ __ _______ ___ _______ ________ ____

Ich glaube ____ ___ ______ __ _______ ___ _______ ________ ____

___ ______ dass ___ ______ __ _______ ___ _______ ________ ____

___ ______ ____ den Läufer __ _______ ___ _______ ________ ____

___ ______ ____ ___ ______ am Sonntag ___ _______ ________ ____

___ ______ ____ ___ ______ __ _______ der Trainer ________ ____

___ ______ ____ ___ ______ __ _______ ___ _______ gefeiert ____

___ ______ ____ ___ ______ __ _______ ___ _______ ________ hat.

(b) Cumulative, linear display


___ ______ ____ ___ ______ __ _______ ___ _______ ________ ____

Ich glaube ____ ___ ______ __ _______ ___ _______ ________ ____

Ich glaube dass ___ ______ __ _______ ___ _______ ________ ____

Ich glaube dass den Läufer __ _______ ___ _______ ________ ____

Ich glaube dass den Läufer am Sonntag ___ _______ ________ ____

Ich glaube dass den Läufer am Sonntag der Trainer ________ ____

Ich glaube dass den Läufer am Sonntag der Trainer gefeiert ____

Ich glaube dass den Läufer am Sonntag der Trainer gefeiert hat.

(c) Centered display, also known as the stationary window technique

Ich glaube

dass

den Läufer

am Sonntag

der Trainer

gefeiert

hat.

FIGURE 1.2 Three presentation formats in self-paced reading. Note: Ich glaube dass
den Läufer am Sonntag der Trainer gefeiert hat, “I believe that the trainer
celebrated the runner last Sunday.”
TABLE 1.1 Predicted reading times per word in four reading methods

Processing stage     Variable                  Regression weight (ms)
                                               Eye gaze  Moving   Centered  Cumulative
                                                         window   format
Encoding and         Number of letters            32        15       15        10
lexical access       Log frequency                33        15       20        −3
                     Beginning of line            16        53       61        50
                     Novel word                  692      1369     1587       478
                     Digits                       21        27      100         5
Semantic and         Head noun modification      −10         3        2         7
syntactic analysis
Text integration     Last word in sentence        41       403      384       144
                     Last word in paragraph      154       719     −277       635
                     First mention of topic      184       342      485        48
                     First content word           67        94      529       194
                     Regression intercept         −2       289      333       381
                     R² value                    .79       .56      .45       .39

Source: Adapted from Just et al. (1982).

Centered SPR also reproduced most of the word- and text-level effects, but
cumulative linear SPR did so to a lesser extent.
The reason is that some participants in the cumulative SPR condition pressed
the button multiple times at once to display longer stretches of text, which dis-
rupted the link between the reading time data and ongoing cognitive processing
(also see Fernanda Ferreira & Henderson, 1990). To avoid this issue, current SPR
researchers opt for a non-cumulative, linear display or sometimes a centered dis-
play. Specifically, centered displays have had a place in L2 research when the goal
was to replicate earlier work with native speakers that had relied on centered SPR
(Roberts, personal communication, August 12, 2015). Without such a precedent
in the L1 literature, however, non-cumulative, linear SPR may be the preferred
presentation format.
Although the moving window procedure fared generally well in Just et al.’s
comparison, a few differences with eye tracking are worth pointing out. First,
goodness of fit (i.e., how well the word- and text-level variables could account
for the reading times) systematically decreased: from R² = .79 for eye-movement
data (gaze duration), to R² = .56 for linear non-cumulative SPR times, to R² =
.45 for centered SPR times, and finally R² = .39 for linear cumulative SPR times.
Second, readers in all three SPR conditions spent approximately twice as long on
a word as those in the eye-tracking group (see intercept values of 289, 333, and
381 in Table 1.1, which represent the hypothetical reading time on a 0-letter, 0 log
frequency, etc. word). The longer reading times in SPR are a constant finding in the
literature (Rayner, 1998, 2009) that has concerned some proponents of eye track-
ing. Indeed, participants in SPR experiments “have a substantial amount of ‘unallo-
cated time’ (i.e., time not used in the service of word recognition or eye movement
control)” (Clifton & Staub, 2011, p. 905), the nature of which is unknown. Finally,
compared to the statistical output for the eye-movement data, Just et al. (1982)
found that “the moving-window condition appear[ed] to decrease the size of the
word-length and word-frequency effects by a factor of two but to magnify most
other effects [e.g., word novelty, first mention of topic] by a factor of three or four”
(Just et al., 1982, p. 233, my addition). This can be seen by comparing the regres-
sion coefficients for these variables in Table 1.1. How serious these departures
from natural reading are depends on the topic of one’s study. For example, Just and
colleagues’ results do not support using SPR to study the L2 acquisition of new
lexical (and perhaps also new grammatical) forms or discourse-level phenomena,
because new forms, by definition, have a low frequency. In other research areas, the
safe option is no doubt to replicate SPR findings using eye-tracking methodology.
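
As a back-of-the-envelope illustration of how the weights in Table 1.1 combine (the 7-letter word is a hypothetical choice, and the nine remaining predictors are set to zero for simplicity), a predicted reading time is the regression intercept plus each coefficient multiplied by its variable value:

\[
\widehat{RT} = \beta_0 + \sum_i \beta_i x_i:\qquad
\widehat{RT}_{\text{eye gaze}} = -2 + (32 \times 7) = 222\ \text{ms},\qquad
\widehat{RT}_{\text{moving window}} = 289 + (15 \times 7) = 394\ \text{ms}
\]

This miniature example reproduces both departures just discussed: the word-length coefficient is halved under SPR, yet the predicted reading time is nearly twice as long.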
An area in L1 research that has relied extensively on both SPR and eye tracking
is parsing, or the real-time computation of syntactic structure. A major question in
parsing research is whether sentence processing is modular (Fodor, 1983; Frazier,
1987) or interactive (MacDonald, Pearlmutter, & Seidenberg, 1994; Marslen-Wilson
& Tyler, 1987; Tanenhaus & Trueswell, 1995); that is, whether the initial parse (analy-
sis) of a sentence involves only structural (syntactic) or both structural and non-
structural (lexical, semantic, discourse-level) information. Because of the importance
of measuring the parser’s original analysis, rather than a reanalysis, to adjudicate
between these theories, work in this area has produced a fair deal of research that has
used both SPR and eye tracking (see Ferreira & Clifton, 1986; Ferreira & Henderson,
1990; Trueswell, Tanenhaus, & Kello, 1993; Wilson & Garnsey, 2009, for examples).

TEXTBOX 1.1.  PARSER


The parser is an abstraction that refers to the cognitive mechanisms in peo-
ple’s heads that carry out the syntactic analysis of a sentence.

Wilson & Garnsey (2009) reported on two reading experiments—the first with
word-by-word, non-cumulative SPR and the second with eye tracking—in
which participants read temporarily ambiguous sentences like the following:

(1) (a) The ticket agent admitted the mistake because she had been caught.
(b) The ticket agent admitted the mistake might not have been caught.
(2) (a) The CIA director confirmed the rumor when he testified before Congress.
(b) The CIA director confirmed the rumor could mean a security leak.

In both examples, the bolded noun phrase is ambiguous because it can represent
either the direct object of the main clause, as shown in (1a) and (2a), or the subject
of an embedded clause, as shown in (1b) and (2b).
Wilson and Garnsey were particularly interested in the sentences with a direct
object interpretation, which are arguably the simpler of the two possible structures.
They wanted to know whether the statistical properties of the main verb—that
is, whether, for example, admit and confirm occur more often with direct objects
or with subordinate clauses—influence the speed with which readers parse the
sentence correctly. Both the SPR data and one eye-movement measure (go-past
time; see Chapter 7) suggested this was the case. Thus, their findings lent support
to interactive models of sentence processing. Syntax (a general preference for the
simpler direct-object construction) did not take precedence over verb informa-
tion in the earliest stages of processing. Although Wilson and Garnsey had already
made their point using the moving-window technique, they chose to replicate
their findings with eye tracking in an effort to separate out reanalysis effects from
a reader’s initial parse. They explained that “in part because readers cannot go back
and re-read earlier sentence regions when they encounter difficulty, self-paced
reading times are probably influenced by both initial processing difficulty and the
work done to recover from that difficulty” (p. 376).
Mitchell (2004) noted that SPR and eye tracking have provided converging
evidence about L1 sentence processing, whereby the SPR studies often precede
comparable eye-tracking research by a few years. Mitchell’s claim is yet to be evalu-
ated for L2 processing, given that, to my knowledge, no field-specific comparative
research on SPR and eye tracking is available to date. Filling this research gap will
be important for both methodological and theoretical reasons. For instance, in a
review of L2 syntactic processing research, Dussias (2010) concluded that the evi-
dence for or against structure-based processing in non-native speakers coincided
with research methodology. She noted that SPR data support Clahsen and Felser’s
(2006a, 2006b) shallow structure hypothesis that L2 speakers do not compute full
syntactic structures, whereas eye-movement research suggests the contrary (but see
Felser, Cunnings, Batterham, & Clahsen, 2012, for an apparent exception). To clarify
this issue, there is a need for comparative research on the shallow structure hypoth-
esis where the same sentences are read in a self-paced fashion and with eye tracking.

1.1.3 Eye Tracking
Eye tracking is the colloquial term used for eye-movement recordings, which are
typically (but not necessarily) made as participants perform a task on a computer
screen. The interest in eye movements dates back to the 18th century (Wade,
2007; Wade & Tatler, 2005). Given that there was no technology to record eye
movements at the time, scientists resorted to the use of afterimages (a type of
optical illusion) to infer eye movements, often in the context of studies about ver-
tigo (Wade, 2007; Wade & Tatler, 2005). The physician William Charles Wells was
a pioneer of eye-movement research who uncovered some important properties
about eye movements (Wells, 1792, 1794a, 1794b, as cited in Wade, 2007). Sadly,
his findings were largely neglected so that when scholars turned to study eye
movements in reading in the late 1800s, they painstakingly rediscovered the same
insights that Wells gained a century before (Wade, 2007; Wade & Tatler, 2005).
Rayner (1998) distinguished three waves of eye-movement research beginning
in the late 19th century: early work (1879–1920) on basic facts about eye move-
ments; the behaviorist era (ca. 1930–1958) characterized by a more applied focus;
and current eye-movement research (mid-1970s–present) initiated by the advent
of computers. Duchowski (2002, 2007) further stressed the importance of recent
interactive eye-tracking applications, such as eye typing and the use of eye gaze
in virtual reality. Whether this truly signaled the beginning of a fourth era, as he
claimed, or a new direction enabled by technological advances, remains to be seen.
One of the strong appeals of eye tracking is the versatility of the methodology.
Three major areas of eye-tracking research are scene perception, visual search,
and language processing (Rayner, 1998, 2009), but eye movement recordings also
inform our understanding of natural (everyday) tasks, problem solving, human
expertise in various domains, aviation and driving, music reading and typing, web
usability, advertising, and psychiatric disorders, among other areas. For reviews of
many of these topics, the reader is referred to Liversedge, Gilchrist, and Everling’s
(2011) edited book and to Duchowski’s review article and book (Duchowski,
2002; 2007).
The versatility of eye-movement methodology extends to its potential applica-
tions for language research. For most of its existence, eye-tracking technology was
used in conjunction with printed text only; however, a significant development
that began in earnest in the 1990s was the recording of eye movements dur-
ing listening, in a method now known as the visual world paradigm (Allopenna,
Magnuson, & Tanenhaus, 1998; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy,
1995; but also Cooper, 1974). Furthermore, a smaller body of work deals with
eye movements in discourse (e.g., Griffin & Bock, 2000; Holšánová, 2008; Meyer,
Sleiderink, & Levelt, 1998), written language production (e.g., Chukharev-
Hudilainen, Saricaoglu, Torrance, & Feng, 2019; Gánem-Gutiérrez & Gilmore,
2018; Révész, Michel, & Lee, 2019; Wengelin et al., 2009) and dialog (e.g., Brône
& Oben, 2018; Gullberg & Holmqvist, 1999, 2006; Kreysa & Pickering, 2011;
McDonough, Crowther, Kielstra, & Trofimovich, 2015). This book will focus pri-
marily on the two oldest strands of research—eye movements during reading and
the visual world paradigm—and guide the reader through the design and analysis
of well-conceived studies in these areas.
Besides being a versatile methodology, eye tracking is also unobtrusive, as the
recording of eye movements can take place during normal task completion. For
instance, reading with eye-movement recording is said to resemble natural reading
closely (e.g., Clifton & Staub, 2011; Keating, 2014; Van Assche, Drieghe, Duyck,
Welvaert, & Hartsuiker, 2011), although the question of “how close is close?”
merits further scrutiny (Godfroid & Spino, 2015; Mitchell, 2004; Winke, 2013).
Dussias, Valdés Kroff, Guzzardo Tamargo, and Gerfen (2013) similarly highlighted
the ecological validity of the visual world paradigm as one of its strengths because
“the … method allows researchers to make inferences directly, without secondary
behavioral responses” (p. 120). A third advantage of eye tracking is its high tem-
poral and spatial resolution. Most present-day eye trackers sample the eye location
between 60 and 2000 Hz, such that the temporal accuracy of measurement ranges
from ca. 16 ms to < 1 ms (see Section 9.1.3 for more information on sampling
speed). The high temporal resolution, combined with the free-ranging eye move-
ments in eye-tracking experiments, makes it possible to distinguish between early
and late measures of processing (e.g., Frenck-Mestre, 2005; Keating, 2014; Roberts
& Siyanova-Chanturia, 2013; Winke, Godfroid, & Gass, 2013). Distinguishing
between early and late processing can be important for theoretical reasons, as
we saw in the discussion of Wilson and Garnsey’s (2009) study in Section 1.1.2,
where the authors combined SPR and eye tracking. In sum, eye tracking will be
more sensitive than SPR for capturing effects that surface in early measures only
(Frenck-Mestre, 2005).
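
For readers who want to translate a sampling rate into temporal accuracy, the arithmetic is simply the reciprocal of the rate. The following minimal Python sketch (the rates are illustrative values within the 60–2000 Hz range mentioned above) makes the conversion explicit:

```python
# Temporal accuracy is bounded by the sampling interval, that is,
# the time that elapses between two consecutive gaze samples.
for rate_hz in (60, 120, 250, 500, 1000, 2000):
    interval_ms = 1000 / rate_hz  # one gaze sample every 1000/rate ms
    print(f"{rate_hz:>5} Hz -> one sample every {interval_ms:5.2f} ms")
# A 60 Hz tracker samples every ~16.67 ms; a 2000 Hz tracker, every 0.5 ms.
# Any reported event duration can be off by up to one sampling interval.
```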
Some of the downsides of eye-tracking technology are its price tag and the
complexities involved in data collection and analysis (Keating, 2014). In both
of these regards, eye tracking may fall somewhere in between SPR and ERPs
(Clifton & Staub, 2011), with think-aloud data siding with SPR in these matters.
Specifically, at the time of writing, major eye trackers ranged from ca. $30,000 to
ca. $50,000² (for more information on choosing an eye tracker, see Sections 9.2.1
and 9.2.2). In contrast, SPR experiments can be run at no cost on a personal com-
puter or an Apple Macintosh using free programming software such as DMDX
(for PC: http://www.u.arizona.edu/~kforster/dmdx/overview.htm) or PsyScope
(for Mac: http://psy.cns.sissa.it/). Commercial software packages are also available
for a fraction of the price of an eye tracker.The most important tool for collecting
think-aloud data is a voice recorder. In comparison, Morgan-Short and Tanner
(2014) noted that an EEG amplifier system costs between $25,000 and $75,000,
to which other expenditures, such as for a recording license, need to be added.
Therefore, the level of financial investment required for setting up a research lab
will vary significantly depending on the methodology.
Due to camera set-up and calibration, eye-tracking experiments also tend to
take somewhat longer than SPR tasks, though they are much shorter than ERP
studies. Because concurrent verbalizations increase the time on task in a think-
aloud study (Bowles, 2010), comparing the time needed for data collection in
a think-aloud experiment with other methodologies is more difficult and will
depend on the focus of the study and the experimenter’s technical skill level. In
Godfroid and Spino’s (2015) reading study, data collection from the think-aloud
and eye-tracking groups took about the same amount of time, even though the
actual time on task was higher in the think-aloud group (unreported data). To
ensure comparability between the two groups, both think-aloud and eye-tracking
participants met with the researcher one-on-one in this study; however, in other
research contexts, recording think-aloud data from multiple participants at once
(e.g., in a lab) could speed things up considerably. A similar observation applies to
SPR experiments, but not to eye-tracking or ERP studies, where only the largest
research facilities have the equipment and staff to support parallel data collection
sessions.

1.1.4 Event-Related Potentials
Whereas think-alouds, SPR and eye tracking are all behavioral measures, ERP is
a brain-based method that consists of recording a participant’s electrical potentials
directly on the scalp by means of electrodes (for an illustration, see Figure 1.3).
The raw, continuous recording of electrical brain activity is called the electroen-
cephalogram, or EEG; the EEG signal is picked up through a set of 20 to 256
sensors that are embedded in a skull cap (Steinhauer, 2014). The collected signals
are amplified and preprocessed before being time-locked to a critical stimulus in
the input, such as an ungrammatical or unexpected word. The resulting, averaged

FIGURE 1.3 Sample ERP waveforms, illustrating different ERP effects.


(Source: Modified from http://faculty.washington.edu/losterho/erp_tutorial.htm).
waveform is known as an event-related potential (ERP). It serves as the input for
most empirical research on language processing.
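
To make this time-locking-and-averaging logic concrete, the sketch below implements it in a few lines of Python with NumPy. This is a schematic illustration, not the pipeline of any particular EEG package; the function name, channel count, sampling rate, and simulated data are all invented for the example.

```python
import numpy as np

def average_erp(eeg, event_samples, sfreq, tmin=-0.1, tmax=0.9):
    """Cut equal-length EEG segments around each event and average them.

    eeg           : 2-D array, channels x samples (continuous recording)
    event_samples : sample indices of the critical stimuli (e.g., target words)
    sfreq         : sampling frequency in Hz
    tmin, tmax    : epoch window in seconds, relative to stimulus onset
    """
    start, stop = int(tmin * sfreq), int(tmax * sfreq)
    epochs = [eeg[:, s + start:s + stop] for s in event_samples
              if s + start >= 0 and s + stop <= eeg.shape[1]]
    # Averaging cancels out activity that is not time-locked to the
    # stimulus, leaving the event-related potential (ERP).
    return np.mean(epochs, axis=0)

# Toy data: 32 channels, 100 s of EEG at 500 Hz, 40 simulated word onsets
rng = np.random.default_rng(0)
eeg = rng.normal(size=(32, 50_000))
events = rng.integers(1_000, 49_000, size=40)
erp = average_erp(eeg, events, sfreq=500)
print(erp.shape)  # (32, 500): one averaged waveform per channel
```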
The graph in Figure 1.3 represents two ERP waveforms from two hypotheti-
cal experimental conditions (e.g., grammatical vs. ungrammatical, expected vs.
unexpected, familiar vs. unfamiliar). Of interest is whether the size (amplitude)
of the peaks and valleys in the ERP waveform differs between the conditions,
whereby specific time windows in the ERP are inspected. For example, language
researchers may focus on 300–500 ms and 600–900 ms post-stimulus, because
these are the typical latencies of language-related ERP components (Steinhauer,
2014; see following). This means that the effects of unfamiliar, ungrammatical, or
otherwise marked forms tend to manifest themselves in native speakers between
300–500 ms or 600–900 ms after seeing or hearing the critical form. Significant
between-condition differences in ERP components are called ERP effects; they
support the claim that participants were sensitive to a given experimental manip-
ulation (see Luck, 2014, for a comprehensive review, and Morgan-Short & Tanner,
2014, for a good introduction).
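
Continuing the sketch above, the between-condition comparison amounts to averaging voltage within a component's time window and contrasting the conditions. Again, the waveforms below are random placeholders; in a real study the difference would be tested statistically across participants before being called an ERP effect.

```python
import numpy as np

def mean_amplitude(erp, sfreq, t0, window):
    """Mean voltage over channels within a post-stimulus window (in s).
    t0 is the epoch start time relative to stimulus onset (e.g., -0.1 s)."""
    lo = int((window[0] - t0) * sfreq)
    hi = int((window[1] - t0) * sfreq)
    return erp[:, lo:hi].mean()

rng = np.random.default_rng(1)
erp_grammatical = rng.normal(0.0, 1.0, size=(32, 500))    # placeholder ERP
erp_ungrammatical = rng.normal(0.5, 1.0, size=(32, 500))  # placeholder ERP

# Windows follow the typical component latencies cited above
for name, window in (("N400", (0.3, 0.5)), ("P600", (0.6, 0.9))):
    diff = (mean_amplitude(erp_ungrammatical, 500, -0.1, window)
            - mean_amplitude(erp_grammatical, 500, -0.1, window))
    print(f"{name} window {window}: condition difference = {diff:.2f}")
```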
Like eye tracking, ERPs afford fine-grained temporal information about
processing. This is because ERP waveforms represent how neural processes, as
reflected in electrical brain activity, unfold over time following the onset (begin-
ning) of a target stimulus. A further strength of ERPs in language research is
that, unlike eye-movement data, ERPs also provide information about the nature
of ongoing processing, whether lexical-semantic or syntactic (Clifton & Staub,
2011; Foucart & Frenck-Mestre, 2012). More specifically, three language-related
ERP components have been tied to lexical-semantic or syntactic processing: the
(Left) Anterior Negativity or (L)AN, the N400, and the P600 (see Morgan-Short,
Faretta-Stutenberg, & Bartlett-Hsu, 2015; Mueller, 2005; Steinhauer, 2014; van
Hell & Tokowicz, 2010, for L2-focused reviews). Although the meaning of these
components remains under scrutiny, and their interpretation is often refined, it is
generally believed that the (L)AN and the P600 are indicative of (morpho-)
syntactic processes, whereas the N400 is a lexical-semantic marker. Therefore,
it is possible to investigate these ERP components for evidence of participants’
sensitivity to linguistic phenomena and infer whether said sensitivity was lex-
ical-semantic or syntactic in nature. In contrast, in eye-movement records “all
difficulty appears in the form of a slowdown and/or an increase in the likelihood
of a regressive eye movement” to an earlier part of the sentence (Clifton & Staub,
2011, p. 906); and while eye-movement researchers distinguish between early and
late processing measures, neither of these maps directly onto semantic or syntactic
processing.
Two studies by Morgan-Short and colleagues (Morgan-Short et al., 2010,
2012) will serve to illustrate the richness of ERP data. Participants in Morgan-
Short et al. (2010, 2012) were native English speakers who learned an artificial
language, called Brocanto2, over three sessions on different days. The researchers
recorded participants’ brain responses (ERPs) to gender agreement and word
order violations (i) in Session 1, when participants were at a low proficiency
level, and (ii) in Session 3, when participants had become proficient users of
Brocanto2, as determined by their performance on a grammaticality judgment
test. The study design thus replicated in an accelerated fashion what happens
when people learn a natural L2 to a high level over many years. Morgan-Short
and colleagues found that the ERP responses to grammatical violations changed
with L2 proficiency: overall, participants showed N400 effects or no effects at
early stages of learning and P600 effects (with ANs for one of the structures)
at advanced levels. Therefore, even though the same grammatical violations
were tested over the course of the experiment, the participants’ neurocognitive
responses changed and became more native-like over time. Morgan-Short and
colleagues interpreted these findings as showing a qualitative shift in the underly-
ing memory systems: from lexical-semantic processing in declarative memory in
the beginning to rule-based, grammatical processing in procedural memory at
advanced stages (Ullman, 2005).
Morgan-Short et al.’s studies were conducted in the auditory modality (i.e.,
listening). When it comes to reading, eye-movement recordings do have, as an
advantage over ERPs, their compatibility with natural reading conditions. This is
because in eye-tracking studies with written texts, sentences are displayed in their
entirety, and participants can read at their own pace. In contrast, ERP research
with written language relies on word-by-word serial visual presentation, such
that reading proceeds at a fixed rate between 400 and 700 ms per word (Morgan-
Short & Tanner, 2014). This slows the reading process down by a factor of two
to four and precludes previewing the upcoming word (Baccino, 2011; Kliegl,
Dambacher, Dimigen, & Sommer, 2014) or going back in the text (Metzner, Von
der Malsburg, Vasishth, & Rösler, 2017) as readers normally do. The reason for this
design feature of ERP studies is that eye movements cause larger artifacts in EEG
recordings, which can be avoided through a central presentation format, much
like word-by-word, centered SPR, but with the additional restriction that read-
ing rate is now under the researcher’s control. For the same reason, participants
in ERP studies are asked to suppress blinks. The implication of the presentation
format for reading is that “many significant ERP differences seem to occur too
late” (Sereno & Rayner, 2003, p. 491). Specifically, at 400 ms post-stimulus, which
is when the (L)AN and N400 components tend to peak, readers have already
moved on to the next word in natural reading (Baccino, 2011; Sereno & Rayner,
2003). This led Sereno and Rayner (2003) to speculate that “the traditional ERP
components might be indicative of recurrent feedback-driven processes rather
than the first information sweep through the system” (p. 491). Dimigen, Sommer,
Hohlfeld, Jacobs, and Kliegl (2011) similarly suggested that the N400 might index
“a late, postlexical process” (p. 14).
In an attempt to address these concerns, different research teams have begun
co-registering brain potentials and eye movements simultaneously, in a technique
known as (eye) fixation-related potentials ([E]FRPs; Baccino & Manunta, 2005;
Dimigen, Kliegl, & Sommer, 2012; Dimigen, Sommer, Hohlfeld, Jacobs, & Kliegl,
2011; Hutzler et al., 2007; Kretzschmar, Bornkessel-Schlesewsky, & Schlesewsky,
2009). Like ERPs, FRPs are based on EEG recordings, but this time, the onset of
an eye fixation, rather than a stimulus event, such as the presentation of a word,
serves as the reference for aligning and averaging different EEG segments. The
resulting waveforms, therefore, reflect the neural processing that occurred after
the eyes landed in a new location (e.g., a new word) and so natural reading is
part and parcel of this new paradigm. Importantly, natural reading also makes
the paradigm technically more challenging, because eye movements cause large
artifacts in EEG recordings (see Dimigen et al., 2011, for extended discussion).
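
In computational terms, FRPs require only one change to the stimulus-locked averaging sketched in Section 1.1.4: the reference events become fixation onsets taken from the time-synchronized eye-movement record. A minimal sketch, assuming the eye tracker and the EEG system share a common clock (all names and values here are illustrative):

```python
import numpy as np

def fixations_to_eeg_samples(fixation_onsets_ms, eeg_sfreq):
    """Convert fixation-onset timestamps (ms on the shared clock) into EEG
    sample indices, so the same epoching-and-averaging routine used for
    ERPs can be reused with fixation onsets as the alignment events."""
    return (np.asarray(fixation_onsets_ms) / 1000 * eeg_sfreq).astype(int)

onsets_ms = [1234, 1460, 1712]  # illustrative fixation onsets
print(fixations_to_eeg_samples(onsets_ms, eeg_sfreq=500))  # [617 730 856]
```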
Using FRPs, Dimigen et al. (2011) replicated the N400 effect associated with a
word’s predictability (less predictable words eliciting larger N400s) and related the
effect to first-pass reading times recorded by the eye tracker. The authors found
that at the peak amplitude of the N400 (384 ms post-fixation-onset), only 25%
of participants were still looking at the target word and the majority of these
cases were refixations rather than initial eye fixations.The number rose somewhat
when the onset of the N400 effect was used as the reference, but overall the data
made it “hard to conceive [of] the measurable neural effects of predictability as
being causal in some way for the behavioral effects, because the bulk of the pre-
dictability effects in ERPs only followed those in behavior” (Dimigen et al., 2011,
p. 14). An interesting secondary finding of Dimigen et al.’s study was that N400-
like negative brain potentials were also observed at earlier time intervals (120–160
ms post-fixation-onset), albeit not in a statistically reliable manner.
While FRP recordings have yet to be introduced into SLA and bilingualism,
a less challenging but still informative approach is to record EEG and eye move-
ments separately, for different participants or with the same participants at different
times (Dambacher & Kliegl, 2007; Deutsch & Bentin, 2001; Foucart & Frenck-
Mestre, 2012; Sereno, Rayner, & Posner, 1998). Foucart and Frenck-Mestre (2012)
reported on three ERP experiments and one eye-tracking experiment conducted
at different times with the same participants. Foucart and Frenck-Mestre revisited
the question of whether late L2 learners can acquire grammatical features—in
this case, noun-adjective gender agreement—in the absence of similar features in
the participants’ L1. The stimuli for the study were French sentences in which the
noun and adjective either agreed or did not agree in grammatical gender. Various
syntactic manipulations made it progressively more difficult to process agreement.
The ERP data revealed that L1 French speakers showed P600 effects (i.e., syntac-
tic reflexes to agreement violations) regardless of where the adjective occurred in
the sentence. Among the L2 French speakers, however, agreement violations elic-
ited a somewhat atypical P600 effect in the easiest condition, an N400 effect in the
intermediate condition, and no consistent response in the most difficult condition.
To clarify the null result in Experiment 3, which tested processing in the
most difficult condition, Foucart and Frenck-Mestre reran the same sentences
by the same participants after a few months, but this time using eye tracking and
natural reading conditions. Unlike in the ERP experiment, the English learners of
French now showed significant grammatical sensitivity, comparable to that of
native speakers, in both early and late reading measures. The learners spent more
time reading ungrammatical than grammatical adjectives. These results generally
supported late L2 learners’ ability to acquire grammatical gender. From a meth-
odological viewpoint, the contrasting findings of the ERP and the eye-tracking
experiment point to potentially reactive effects of word-by-word serial visual
presentation. As Foucart and Frenck-Mestre (2012) observed, serial visual pres-
entation might make higher demands on memory than natural reading, which
could prove especially taxing for L2 speakers, whose memory may already be
taxed more (McDonald, 2006).

1.1.5 Synthesis
In this section, I situated eye-tracking methodology among other online, or
real-time, methodologies that have gained currency in the last two decades of
SLA research (also see Sanz, Morales-Front, Zalbidea, & Zárate-Sández, 2016).
Following a brief introduction of each methodology, I focused on how think
alouds, SPR, and ERPs relate to eye tracking, in the belief that such contrasts aid
the understanding of what eye tracking is and is not, and how it can enrich one’s
research program. Table 1.2 summarizes the main themes that emerged from this
overview and rounds them off with some extra information.
Of the methodologies reviewed in this chapter, only think alouds provide the
researcher with qualitative data, which makes them a useful tool alongside SPR,
eye tracking, or ERPs in any study with a between-subjects design (see Section
5.2). For researchers for whom it is important to obtain verbal data and quantita-
tive measures from the same participants, in a within-subjects design (see Section
5.2), non-concurrent verbal reports such as stimulated recall (Gass & Mackey,
2017) and interviews offer an alternative. Stimulated recall and interviews also
seem to be the go-to methodologies in spoken language research, just like SPR
has a sister methodology—SPL—that can serve this purpose. On the other hand,
eye tracking and ERPs allow for use with either written or spoken language,
although in the case of eye tracking, this also means a new paradigm: the visual
world paradigm (see Chapter 4). In general, eye tracking stands out for being
so versatile, a property it shares to this degree only with think-aloud protocols.
However, the think-aloud methodology and SPR are more practical and cost-effi-
cient than eye tracking and some proponents of SPR (e.g., Mitchell, 2004) would
claim that SPR meets sentence processing researchers’ needs for an online read-
ing measure. Finally, ERPs can elucidate questions about the nature of processing
(i.e., semantic or syntactic) that remain outside the purview of most behavioral
research methods. Like SPR, ERPs offer high temporal and spatial resolution, but
they sacrifice ecological validity in return. Our conclusion is three-fold: (i) there
is a greater need for methodological research that directly compares the different
TABLE 1.2 Comparison of thinking aloud, SPR, eye tracking, and ERPs

Type of data
  Think alouds: primarily qualitative (verbal data)
  SPR: quantitative (dwell times)
  Eye tracking: quantitative (fixations and saccades, i.e., eye movements)
  ERPs: quantitative (electrical brain potentials)

Task modality
  Think alouds: primarily written (a)
  SPR: written (b)
  Eye tracking: written or auditory
  ERPs: written or auditory

Naturalness of reading
  Think alouds: full sentences; reading alternates with thinking aloud (secondary task); risk of reactivity
  SPR: segmented text; no skips or regressions and no preview; button presses (secondary task) control text display; risk of reactivity
  Eye tracking: full sentences; some constraints on text layout; no evidence of reactivity
  ERPs: word-by-word presentation at a fixed rate; no skips or regressions and no preview; risk of reactivity

Naturalness of listening
  Think alouds: n/a
  SPR: n/a
  Eye tracking: full sentences; listening while looking at a visual display
  ERPs: word-by-word presentation at a fixed rate; potential issues with sentence-level prosody

Temporal resolution
  Think alouds: lower
  SPR: high (milliseconds)
  Eye tracking: high (milliseconds)
  ERPs: high (milliseconds)

Spatial resolution
  Think alouds: lower
  SPR: high (word level)
  Eye tracking: high (sublexical level)
  ERPs: high (word level), but low resolution in terms of brain source localization

Cost
  Think alouds: low
  SPR: low
  Eye tracking: higher
  ERPs: higher

Major strengths
  Think alouds: address the why of processing; versatility
  SPR: practical; a simple solution that may fit the research purpose
  Eye tracking: natural reading and listening; versatility
  ERPs: address the nature of processing (semantic or syntactic)

Technical sophistication
  Think alouds: minimal
  SPR: programming
  Eye tracking: training recommended
  ERPs: high; extensive training necessary

(a) Auditory tasks, as found for instance in L2 interaction research, have been combined more often with retrospective verbal reports or stimulated recall (Gass & Mackey, 2017).
(b) A variant of SPR, known as self-paced listening (SPL), is available to study spoken-language processing (see Papadopoulou, Tsimpli, & Amvrazis, 2014, for a methodological review).
Note: n/a = not applicable
methodologies; (ii) by triangulating different methods, researchers can offset the
limitations of any single technique; and (iii) eye tracking has a lot to offer as an
online research tool.

1.2 Why Study Eye Movements?


Applied eye-tracking researchers study eye movements because they believe eye
movements reveal information about cognition. More specifically, users of eye-
tracking technology assume that an individual’s point of gaze and eye movements
are indicative of the cognitive processes that take place during task performance,
be it reading, listening, taking a test, scanning a scene, searching a visual display, or
some other type of cognitive activity (also see Section 2.6). The reason is that eye
movements have been linked to attention, in particular the process of attentional
orienting (Wright & Ward, 2008).
Tomlin and Villa (1994) popularized the idea in SLA research that attention is
not a unitary construct, but a composite of “separate, yet interrelated networks”
(p. 183). Based on the work of their colleague Michael Posner, Tomlin and Villa
argued that these brain networks underpin the attentional functions of alertness,
orientation, and detection (also see Leow, 1998; Simard & Wong, 2001). Posner
(1980) defined orienting as “the aligning of attention with a source of sensory
input” (p. 4). Orienting can be overt (observable in body movement), covert, or
both overt and covert (e.g., Posner, 1980; Styles, 2006; Wright & Ward, 2008).
Thus, it follows that eye movements, which are a type of observable behavior,
reflect the overt alignment of an observer with a source of visual input; that is,
overt orienting, or simply overt attention. Of course, while overt orienting is
a good starting point, most applied linguists will be more interested in the atten-
tional processes that take place concurrently in the mind. Can we really say overt
attention and covert attention are related? To answer this question, we need to
make an excursion into cognitive neuroscience and psychology.
Posner and his colleagues refined a paradigm, known as location cuing or
spatial cuing, that enabled them to separate covert orienting (shift of atten-
tion without concomitant eye movement) from overt orienting experimentally.
Figure 1.4, adapted from Wright and Ward (2008), depicts the sequence of events
that make up one trial in a basic location-cuing experiment.
In a location-cuing experiment, participants press a button as soon as they
detect or recognize the target that appears on the final screen (in our example
in Figure 1.4, the filled circle is the target). Prior to target onset, a spatial cue
(the horizontal bar in Figure 1.4) serves as a signal for participants to shift their
attention to the location of the impending target. Crucially, participants must
shift attention while keeping their eye gaze fixated on the central marker on the
screen. This is very difficult because people’s natural tendency is to look at what
they are paying attention to, but with proper instructions and a few practice trials,
participants can perform this task. As a result, it becomes possible to isolate the
FIGURE 1.4 
Sample trial in a location-cuing experiment. Participants must press a
button that corresponds to the side of the screen where the target appears
while keeping their eyes fixated on a central point. This is a valid-cue trial
because the cue correctly predicts where the target will appear.

effects of covert orienting, because the attentional focus travels to the periphery
(covert orienting) while the eyes remain in place (overt orienting).
Wright and Ward (2008) described how the location-cuing paradigm can be
used to perform a “cost/benefit analysis” (p. 19) of covert orienting. On valid-cue
trials, such as the one shown in Figure 1.4, participants benefit from cuing because
they are already attending to the target site by the time the target appears. This
leads to faster and more accurate responses. However, on invalid-cue trials, when
the cue and target appear on opposite sides of the screen, participants incur a pro-
cessing cost. This is because on invalid-cue trials, participants must reorient their
attention from the cued location to the target location after the target is displayed,
which slows their response times and increases error rates. Posner and colleagues
captured the facilitative and inhibitory effects of covert orienting in a spotlight
metaphor of attention. As is most apparent on valid-cue trials, “attention can be
likened to a spotlight that enhances the efficiency of detection of events within its
beam” (Posner, Snyder, & Davidson, 1980, p. 172). However, the spotlight meta-
phor also “capture[s] some of the dynamics involved in disengaging, moving, and
engaging attention” (Posner & Petersen, 1990, p. 35), which become important
in invalid-cue trials.
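
In analysis terms, the cost/benefit logic reduces to comparing mean response times across cue conditions. The short sketch below uses fabricated response times for illustration; following common practice in this paradigm, a neutral-cue condition is assumed as the baseline against which benefits and costs are computed.

```python
import statistics

# Hypothetical detection RTs (ms) by cue condition
rts = {
    "valid":   [312, 298, 305, 321, 290],  # cue correctly predicted target site
    "neutral": [340, 352, 335, 348, 344],  # uninformative cue (baseline)
    "invalid": [395, 410, 388, 402, 399],  # attention had to be reoriented
}
means = {cond: statistics.mean(values) for cond, values in rts.items()}
benefit = means["neutral"] - means["valid"]   # facilitation from a valid cue
cost = means["invalid"] - means["neutral"]    # slowdown from reorienting
print(f"cuing benefit: {benefit:.0f} ms; cuing cost: {cost:.0f} ms")
```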
Research using the location-cuing paradigm has shown convincingly that cov-
ert and overt attention need not coincide: when eye gaze is fixed, the attentional
focus can still shift to a different part of the visual field (see Styles, 2006, and
Wright & Ward, 2008, for a review of studies). However, disjoining covert and
overt attention during visual perception requires some effort from the individual
(Wright & Ward, 2008). More importantly, dissociations of covert and overt atten-
tion have been demonstrated in the context of very simple cognitive activities,
such as cue detection, but may be harder to engineer in the context of more com-
plex cognitive tasks such as reading or looking-while-listening (Rayner, 1998,
2009). Looking ahead to the following chapters, the two most influential models
of eye movements in reading only allow for small dissociations between attention
and eye gaze, although it is disputed whether attention is allocated sequentially, to
one word at a time or in parallel, like a gradient (see Section 2.6). Similarly, covert
attention is hypothesized as the link between language and overt eye movements
in theoretical models of the visual world paradigm (see Section 4.1).
Wright and Ward (2008) reviewed three proposals about how eye movements
and covert shifts of attention relate: independent, one common system, or inter-
dependent systems. Because there is a large degree of overlap in the brain areas
underlying covert attention and eye movements (Corbetta, 1998; Corbetta &
Shulman, 1998; Grosbras, Laird, & Paus, 2005), the two mechanisms are likely to
be related to some extent, in line with the common system and interdependent
systems accounts. However, there is as of yet no consensus as to the strength of
this relationship. Because eye movements take about 220 ms to plan and execute
(Wright & Ward, 2008), whereas covert attention shifts are faster, attention will
arrive before the overt eye gaze when both are shifted to the same location (Wright
& Ward, 2008).This gives rise to preview effects in reading; that is, the finding that
processing of the next word in a text begins before the eyes have landed on it
(see Figure 1.5A and Textbox 2.1 following). However, proponents of a com-
mon-system account go one step further by positing a causal relationship between
eye movements and attention shifts. This is the central claim of Rizzolatti and
colleagues’ pre-motor theory (Rizzolatti, Riggio, & Sheliga, 1994; Rizzolatti,
Riggio, Dascola, & Umiltà, 1987; Sheliga, Craighero, Riggio, & Rizzolatti, 1997).
The argument is that covert attention is a side-effect of the motor programming
involved in preparing a saccade: it helps encode the spatial distance that the eye
needs to travel during an eye movement. Simply put, if the eyes are moving, the
brain needs to know where the eyes are moving to (spatial location). Shifts in
covert attention may help to encode the destination of the following saccade. In
this view, then, attention shifts are “planned-but-not-executed saccades” (Wright
& Ward, 2008, p. 195). Although pre-motor theory shares many characteristics
with an influential model of eye-movement control (E-Z Reader, described in
Section 2.6), Wright and Ward (2008) reviewed evidence from neuroanatomical
research (e.g., microstimulation studies with monkeys), suggesting the relationship
between eye movements and attention may, in fact, be more complex.
In sum, applied eye-tracking researchers study overt orienting, as reflected in eye
movements, to learn more about covert orienting, which is the “pure” attentional
process. This is because researchers assume that overt and covert attention largely
coincide, even though there are exceptions and the details of that relationship are
FIGURE 1.5 Examples of decoupling between eye gaze and cognitive processing. (a)
Parafoveal processing: although the reader is looking at ate (A2), he or she
is processing a hamburger (A3, A4). (b) Skipping: the reader moves straight
from ate (A2) to hamburger (A4) without looking at a (A3). Researchers
believe that a is processed together with the preceding or the following
word. (c) Parafoveal-on-foveal effects: the reader looks longer at ate (B2)
because the pnzburgers (B3) has an unusual spelling. Pnzburgers influences
processing of ate even though the reader has not looked at word B3 yet.
(d) Spillover effects: The reader looks longer at after (B4) because he or
she is still processing pnzburgers.

still being investigated. In reading, covert and overt attention are decoupled during
parafoveal processing and word skipping (see (a) and (b) in Figure 1.5); in both of
these cases, a word is processed without concurrent eye fixation. Word properties
may also continue to influence processing after a word was fixated, as indicated by
spillover effects (see (d) in Figure 1.5), and could even influence processing prior to
fixation, in parafoveal-on-foveal effects (see (c) in Figure 1.5).
1.3 Summary
This chapter provided an introduction to the what, why, and how of eye-move-
ment recordings. It was argued that the turn toward eye-tracking methodology is
a part of a larger movement in SLA research that emphasizes the use of concur-
rent data collection methods. Eye-tracking researchers are interested in studying
processing as it happens, often because they assume that such a perspective is more
informative than focusing solely on test data or questionnaires. A major appeal of
eye-tracking methodology is that it lends itself to studying many different types
of questions in a relatively unobtrusive way. The financial and time investments
necessary for acquiring an eye tracker and learning how to use it are some of the
methodology’s downsides.
Eye movements are overt orienting responses that signal the alignment of
attention with the object at the point of gaze. Eye movements and (covert) atten-
tion shifts are closely linked, although the jury is still out about whether the two
are actually different expressions of one and the same underlying system. Even so,
most applied eye-tracking researchers assume that eye movements offer a window
into cognition in that, by and large, the eye gaze indicates what information is
currently activated or being processed. This is why researchers record eye move-
ments using sophisticated machines known as eye trackers. Most modern eye
trackers infer gaze position based on video recordings of a participant’s pupil and
corneal reflection. In evaluating different eye trackers, both the intended applica-
tion and the desired data quality ought to be considered.

Notes
1 Participants were prompted to say their thoughts out loud each time a red asterisk
appeared on the screen. Participants did not know whether they would have to think
aloud until they had finished reading a sentence. In this way, the authors were able
to obtain pure reading-time measures and think-aloud data without conflating the
two and, specifically, without inflating the reading times (see Godfroid & Spino, 2015, for
discussion).
2 More affordable eye trackers such as iView, Eye Tribe, and EyeSee also exist, which may
be suitable for research that requires less accuracy and precision.
2
WHAT DO I NEED TO KNOW
ABOUT EYE MOVEMENTS?

2.1 The Observer and the Visual Field


When we look around us, it seems as though everything in eyesight is crisp and
clearly perceptible. We think we have a sharp vision of all things in our visual
field (Findlay, 2004), yet contrary to our beliefs, the region of high-acuity vision
is actually restricted to a small area around the line of sight. Thus, we have a clear
vision of only a small part of what surrounds us. The visual field can be divided
into three functional regions: the fovea, the parafovea, and the periphery (see
Figure 2.1). The foveal region, spanning less than 2° of central vision, is a small
area in the center of the retina with highest visual acuity and color sensitivity
(Holmqvist et al., 2011; Rayner, Pollatsek, Ashby, & Clifton Jr., 2012). Visual acuity
(sharp vision) is guaranteed only when light projected from the object or word
falls directly on the fovea (see Figure 2.2); in other words, when the so-called
visual axis runs from the object of interest through the foveal center (Zhu & Ji,
2007). The acuity limitations of the human eye are an important reason for why
people make eye movements: by moving their eyes, people can realign the fovea
with the region of the object that requires most visual attention.
The parafoveal region subtends (i.e., covers) approximately 10° around the
point of gaze, extending out to 5° on all sides of fixation. Information in parafoveal
vision is preprocessed, as discussed in Section 1.2, which helps the observer decide
where to look next. The parafoveal region plays a role in reading studies, where
researchers invoke or study the nature of parafoveal processing and parafoveal-
preview effects. These phenomena refer to the processing of information (input)
that is represented in the parafovea, meaning a reader is not looking at the informa-
tion directly. Finally, areas of the visual field beyond the parafovea are commonly
referred to as the periphery. Limited information can be extracted peripherally.
FIGURE 2.1 Visual acuity in the fovea, parafovea, and periphery. Vision is clearest in
a small area around a person’s point of regard (the fovea) and gradually
degrades for information that is represented eccentrically.
(Source: Rayner, Schotter, Masson, Potter, & Treiman, So much to read, so little time: How do we read,
and can speed reading help? 17, 1, 4–34, copyright © 2016 by SAGE Publications, Inc., Reprinted by
Permission of SAGE Publications, Inc.).

FIGURE 2.2 The two major axes through which light travels in the eye.

Underlying these functional subdivisions of the visual field is the biological


composition of the retina. Two types of light-sensitive cells, called cones and rods,
make up the retina. Cones and rods support distinct functions. Cones help to dis-
criminate hues and are sensitive to visual details, whereas rods allow for night vision
and perception of movement (Wedel & Pieters, 2008). With the help of horizontal
cells and amacrine cells, which collect signals from cones and rods respectively,
incoming light will be converted to electrical signals and be transferred to the
brain through the optic nerve (Holmqvist et al., 2011; Wedel & Pieters, 2008). The
fovea consists mostly of cones (see Figure 2.3, solid line). However, cone density
steadily decreases away from the point of fixation with a concomitant drop in
visual acuity, while rod density increases (see Figure 2.3, dashed line). This is why,
to see a dim star at night, you often have to look slightly to the side of it, so the
light activates more rods on the edges of your retina (Springob, 2015). Conversely,
the chances of recognizing a target word at the edges of parafoveal vision (5° away
from fixation) are almost 0%: see Figure 2.3 and Rayner et al. (2012).
To describe the different subregions in the visual field, I introduced the con-
cept of degrees of visual angle (°), which is a common unit of measurement in
eye-tracking research. Vision researchers use angular units, because there is a close
correspondence between angular size and retinal image size (Drasdo & Fowler,
1974; Legge & Bigelow, 2011). Drasdo and Fowler (1974, as cited in Legge &
Bigelow, 2011) found that 1° of visual angle in central vision corresponds to 0.28
mm on the retina. To understand the concept of visual angle, one must imagine
the observer as the center point of a circle. The human vision field is ellipsoid-
shaped, as illustrated in Figure 2.4 (Burnat, 2015). It typically extends about 140°
horizontally (90° temporally and 50° nasally) and 110° vertically (50° superi-
orly and 60° inferiorly) (Spector, 1990) and one degree of visual angle equals

FIGURE 2.3 Cone and rod density in the fovea, parafovea, and periphery. Only when
people look directly at (i.e., foveate on) an object, do they have close to
100% chance of recognizing it. Note: solid line represents cones, dashed
line indicates rods, and dotted line refers to accuracy of identifying a
target word.
(Source: Adapted from Rayner et al., 2012).
FIGURE 2.4 A three-dimensional rendering of the ellipsoid-shaped visual field with


visual angles extending from the point of gaze outwards.
(Source: Burnat, K., 2015. “Are visual peripheries forever young?” Neural Plasticity. https://doi.
org/10.1155/2015/307929. Reprinted by Permission of Hindawi Open-Access Journals).

1/360th of the circle. Degrees of visual angle can be divided further into minutes
of arc (arcmin, ′) and seconds of arc (arcsec, ′′): 1° = 60′ = 3,600′′. Therefore, 1
minute and 1 second of arc represent 1/21,600th and 1/1,296,000th of a circle,
respectively.
How many degrees a given object subtends will depend on the object’s dis-
tance from the observer. It is good to know an object’s angular size because only
the center 2° of vision are clear. In reading studies, for instance, researchers will
often report the angular size of a letter or the number of letters per degree of vis-
ual angle (see Chapter 6) to provide readers with an understanding of how much
text can be taken in on any single fixation. To calculate angular size, formula (2.1)
can be used. Note that this formula can be applied to any type of visual informa-
tion (e.g., a letter, picture, or area on the screen, the face of an interlocutor, or a
projector screen in a classroom), as long as the size of the region and the distance
from the observer are known.


tan(θ/2) = x / (2d)                                                    (2.1)

Figure 2.5 is a schematic representation of a participant in an eye-tracking experi-


ment viewing stimuli on a screen. Given a participant’s viewing distance d and
either the stimulus size x or the visual angle θ, it is possible to compute the
unknown variable.
If an approximate angular size suffices, the formula simplifies to (2.2) (Legge &
Bigelow, 2011). In either case, it is important that stimulus size and viewing dis-
tance be expressed in the same units of measurement (e.g., everything in mm).

Angular size in degrees = (57.3 × stimulus size) / viewing distance    (2.2)
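
Both formulas translate directly into code. The sketch below (the function names are mine; the example values echo the Courier measurements in Table 2.1) computes the exact and approximate angular size, the letters-per-degree conversion, and the finer arc units introduced above.

```python
import math

def visual_angle_deg(size_mm, distance_mm):
    """Exact angular size from formula (2.1): theta = 2 * atan(x / (2d))."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))

def visual_angle_approx(size_mm, distance_mm):
    """Small-angle approximation from formula (2.2): 57.3 * x / d."""
    return 57.3 * size_mm / distance_mm

# One Courier character, 3.88 mm wide, viewed from 500 mm (cf. Table 2.1)
exact = visual_angle_deg(3.88, 500)
print(f"exact: {exact:.2f} deg; approximate: {visual_angle_approx(3.88, 500):.2f} deg")

# Letters per degree of visual angle (cf. the two-to-four-letter range below)
print(f"letters per degree: {1 / exact:.1f}")

# Finer units: 1 deg = 60 arcmin = 3,600 arcsec
print(f"{exact:.2f} deg = {exact * 60:.1f} arcmin = {exact * 3600:.0f} arcsec")
```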

Additionally, researchers often convert visual angle into more readily interpretable
metrics, such as letters, pixels, or length units (e.g., cm, mm).Table 2.1 summarizes
the degrees of visual angle that characters in Courier font (a common font in
text-based eye-tracking studies) subtend at a range of font sizes and viewing dis-
tances that are typical of eye-tracking research. Note that because text is presented
on a computer screen, the font size is somewhat larger than in print materials. For
font size, one of my collaborators converted measurements in points into mm,
following Legge and Bigelow’s (2011) formula: size in mm = (point size/2.86).
Character width was manually measured in points (pt), the smallest unit of measure
in typography, using Adobe Photoshop 7.0 and was converted to mm. Because
Courier is a fixed-width (monotype, monospaced) font, each letter occupies the
same amount of horizontal space. Fixed-width fonts are preferred in eye-tracking
research (see Section 6.2.2), because they ensure equal horizontal angular size for
all characters, and therefore afford a better control of the visual input.
As can be seen in Table 2.1, the degrees of visual angle of each letter hori-
zontally range from 0.28 to 0.57, depending on the viewing distance. Thus, 1°
of visual angle equates to two to four letters. Rayner (1986) and Keating (2014)

FIGURE 2.5 Relationship of viewing distance d, stimulus size x, and the visual angle θ.
TABLE 2.1 
Degrees of visual angle of Courier font point 16–24 at common viewing
distances

Viewing distance (mm)   Font size (pt)   Font size (mm)   Vertical angle (°)   Font width (mm)   Horizontal angle (°)
500 16 5.59 0.64 3.88 0.44
17 5.94 0.68 3.88 0.44
18 6.29 0.72 3.88 0.44
19 6.64 0.76 3.88 0.44
20 6.99 0.80 4.59 0.53
21 7.34 0.84 4.59 0.53
22 7.69 0.88 4.59 0.53
23 8.04 0.92 4.94 0.57
24 8.39 0.96 4.94 0.57
600 16 5.59 0.53 3.88 0.37
17 5.94 0.57 3.88 0.37
18 6.29 0.60 3.88 0.37
19 6.64 0.63 3.88 0.37
20 6.99 0.67 4.59 0.44
21 7.34 0.70 4.59 0.44
22 7.69 0.73 4.59 0.44
23 8.04 0.77 4.94 0.47
24 8.39 0.80 4.94 0.47
700 16 5.59 0.46 3.88 0.32
17 5.94 0.49 3.88 0.32
18 6.29 0.52 3.88 0.32
19 6.64 0.54 3.88 0.32
20 6.99 0.57 4.59 0.38
21 7.34 0.60 4.59 0.38
22 7.69 0.63 4.59 0.38
23 8.04 0.66 4.94 0.40
24 8.39 0.69 4.94 0.40
800 16 5.59 0.40 3.88 0.28
17 5.94 0.43 3.88 0.28
18 6.29 0.45 3.88 0.28
19 6.64 0.48 3.88 0.28
20 6.99 0.50 4.59 0.33
21 7.34 0.53 4.59 0.33
22 7.69 0.55 4.59 0.33
23 8.04 0.58 4.94 0.35
24 8.39 0.60 4.94 0.35
noted that 1° of visual angle will often correspond to three to four letters in
text-based eye-tracking studies, suggesting the smaller font sizes in this chart are
somewhat more common.

2.2 Types of Eye Movements


Humans look around the environment to obtain high-quality visual information
about distinct objects that may help them navigate the world. To do so, people
move their eyes about three to four times a second, generating a rhythm of sac-
cades and fixations (see Figure 2.6).The study of how shifts in eye gaze contribute
to visual perception and cognition is known as the field of active vision (Findlay
& Gilchrist, 2003). Active vision pervades all domains of visual behavior, includ-
ing scene perception, visual search, and reading. In this section, I present the two
key players in active vision, namely fixations and saccades. Although fixations and
saccades are probably the most common types of eye-movement behavior, they
are by no means the only ones (Gilchrist, 2011; Krauzlis, 2013). Therefore, at the
end of this section, we will consider some lesser-known types of eye behavior;
events in the eye-movement record that language researchers typically do not

FIGURE 2.6 
A sequence of fixations (circles) and saccades (lines) on a TOEFL®
Primary™ practice reading test item. These are the eye-movement data
of an eight- to ten-year-old child working through the reading test item.
Larger circles indicate longer fixations.
(Source: Ballard, 2017. Copyright © 2013 Educational Testing Service. Used with permission).
analyze, but without which humans could not perform the complex visual tasks
that they do.
Fixations are periods during which the eye is relatively still, and the individual
is looking at a specific area in the visual field. People sample (i.e., take in) the
visual environment during these periods of stillness. During most fixations, the
observer is extracting and processing information from the site where he or she is
currently looking, or foveating. This area is referred to as the point of gaze or the
point of regard. Fixation durations range from ca. 50 ms to over 500 ms (Rayner,
1998; Rayner & Morris, 1991). Fixations relate to the when aspect of eye move-
ments. This is because the duration of an eye fixation is determined by when the
system decides to initiate a new eye movement. At the same time, eye fixations
also tell us something about the where of eye movements; that is, the eyes are fix-
ated somewhere in the environment and fixation location is often taken to be
informative of ongoing cognitive processing (see Section 1.2). A large part of this
chapter will be devoted to exposing the factors that influence the when and where
of eye movements during language processing. Looking ahead, Section 2.5 will be
about the higher-level cognitive factors (frequency, predictability, and contextual
constraint) that influence fixation durations. It is argued that cognitive factors play
an important role in the when of eye movements. Section 2.4 will deal with the
lower-level, visual features of language (e.g., spacing or word length) as well as
oculomotor constraints and how these jointly determine the selection of a fixa-
tion location. Thus, this section will address the question of where the eyes look.
Active vision underscores the importance of eye movements across a range
of different tasks. While a large body of work deals with eye movements during
reading, reading is a very specific and highly specialized task. Relative to the entire
course of human evolution, reading is also a recent human accomplishment. An
interesting question, therefore, is how strongly eye movements during reading
and eye movements in other visual tasks are related. Because of the relative youth
of reading skills, Reichle et al. (2012) proposed that “the processes that guide eye
movements in other tasks … almost by definition have had to be co-opted and
coordinated (through extensive practice) to support the task of reading” (p. 176,
my emphasis). The authors demonstrated through computational modeling that
the basic assumptions of their reading model (E-Z Reader, see Section 2.6) can
be used to model fixation durations and locations across a range of non-reading
tasks. In so doing, these authors offered the first unified account of eye movements
across different tasks.
A complementary approach to computational modeling, and one that in many
ways precedes it, is the accumulation of empirical data. A good approach to disen-
tangle task-specific from general effects in eye movements is to compare viewing
from the same participants across a range of tasks. To the extent that the partici-
pants’ eye-movement metrics differ (or do not correlate) between the tasks, the
measures can be said to be domain-specific. Shared properties, in contrast, indicate
that domain-general mechanisms are at work. Luke, Henderson, and Ferreira (2015)
investigated whether young adolescents’ quality of lexical representations (as
measured by standardized word- and passage-level comprehension tests) was asso-
ciated with their fixation durations and saccade lengths across three tasks. They
found that the youngsters’ fixation durations differed more strongly between L1
reading and the other visual tasks (pseudoreading and scene search) when they
had better quality lexical representations. This meant that more skilled readers
showed larger task effects on their fixation durations, consistent with the view
that reading skill is driven by language development rather than (or in addition
to) gains in oculomotor control. At the same time, Henderson and Luke (2014),
in a similar study, found that adults’ fixation durations correlated between read-
ing and non-reading tasks, which highlighted the domain-general component of
viewing behavior. What these two studies show, therefore, is that eye-movement
behavior during language processing may reflect both linguistic (domain-specific)
and non-linguistic (oculomotor, domain-general) influences. This is important to
remember as language researchers design eye-tracking studies and interpret their
data with regard to their specific language-related questions.Table 2.2 summarizes
representative fixation durations and saccade lengths for different active vision
tasks (from Rayner, 1998). Due to the lack of similar information for L2 speakers,
the reading data should be interpreted as representing skilled, adult L1 reading,
most commonly in monolingual settings. There is a need for a systematic investi-
gation of L2 speakers’ and bilinguals’ reading behavior, so that similar benchmark
data become available for the fields of SLA and bilingualism (Winke, Godfroid,
& Gass, 2013).
In reading and visual world studies, analyses often focus heavily on fixations—
the presence and timing of fixations in visual world studies and the number and
duration of fixations (including 0 fixations, or skips) in reading research. Despite
the prevalence of fixation-based analyses, however, it is important to bear in mind
that other, saccade-based measures can also prove informative (see Chapter 7).
In particular, saccade length and regression-based measures (i.e., a type of back-
ward saccade specific to reading) can enrich our understanding of the cognitive

TABLE 2.2 The range of mean fixation durations and saccade length in different tasks

Task               Fixation duration (ms)   Saccade length (°)   Saccade length (letters)
Silent reading     225–250                  2                    7–9
Oral reading       275–325                  1.5                  6–7
Scene perception   260–330                  4–5
Visual search      180–275                  3
Note: Fixation duration in scene perception and visual search can vary strongly depending on the exact
nature of the task that participants are asked to perform.
(Source: Rayner, 2009).
processing of language. Thus, it is time we turn to saccades, which are the second
salient characteristic of eye-movement behavior.
“Saccade is a fancy name for eye movement” (Rayner, n.d.). The term saccade
was borrowed from French, where it means a ‘jerk’ or ‘twitch’ (Wade, 2007; Wade
& Tatler, 2005). This is an accurate designation for this type of eye movement,
considering that saccades, which occur in between two eye fixations, are very
fast, ballistic movements of the eye. They are the fastest displacements of which
the human body is capable (Holmqvist et al., 2011): fastest in people’s teenage
years (Johns Hopkins Medicine, 2014), slowing down with age (Johns Hopkins
Medicine, 2014), and potentially related to individual differences in impulsivity
(Choi, Vaswani, & Shadmehr, 2014). Saccades bring the eyes from one location to
the next to provide the cognitive system with new visual information.This is nec-
essary because the region of sharp vision is limited (see Section 2.1).Therefore, to
increase processing efficiency, humans and animals move their eyes so new infor-
mation falls on the high-acuity region of the eye, known as the fovea (also see
Figure 2.1).To foveate, then, is to look straight at a word or object so it is perceived
with the most sensitive part of the retina, which is the fovea.
Saccades can be described in terms of their amplitude, duration, velocity, acceler-
ation, and deceleration (e.g., Gilchrist, 2011): see Figure 2.7. Velocity represents the
speed and direction of movement. Saccadic velocity is expressed as degrees of visual
angle per second (°/s). Many eye trackers use the information about the velocity of
the eyes to distinguish saccades from fixations (see Section 9.1.3). Wright and Ward
(2008) report that saccades can have peak velocities of 600°/s to 1000°/s. However,
because saccades are brief (30 to 80 ms; Holmqvist et al., 2011), the actual distance
covered by the eye—that is, the saccadic amplitude—tends to be relatively small,
typically from < 1° to 15° (Gilchrist, 2011).¹ For shifts in eye gaze larger than 15° or
20°, the head moves along with the eyes. The velocity of a saccadic eye movement
is not a constant, but characterized by a period of acceleration followed by decel-
eration, with peak velocity as the turning point (see Figure 2.7). Acceleration and

FIGURE 2.7 Idealized saccadic profile: eye gaze displacement, velocity, and acceleration.

(Source: Modified from Duchowski, 2007, and Holmqvist et al., 2011).


deceleration are derived mathematically from velocity (see Holmqvist et al., 2011)
and are expressed as °/s2. Saccade amplitude, duration, peak velocity, and accelera-
tion/deceleration are all positively related. This relationship is known as the main
sequence (Bahill, Clark, & Stark, 1975), a term that was borrowed from astronomy.
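
Velocity-based event detection of this kind is often implemented as a simple threshold rule (an I-VT-style algorithm). The sketch below is illustrative rather than any vendor's actual implementation; the 30°/s threshold and the toy gaze trace are assumptions made for the example.

```python
import numpy as np

def classify_samples(x_deg, y_deg, sfreq, velocity_threshold=30.0):
    """Label each gaze sample as saccadic (True) or fixational (False)
    using a sample-to-sample velocity threshold in degrees per second."""
    dt = 1.0 / sfreq
    velocity = np.hypot(np.diff(x_deg), np.diff(y_deg)) / dt  # deg/s
    is_saccade = velocity > velocity_threshold
    return np.concatenate([[False], is_saccade])  # pad the first sample

# Toy 500 Hz trace: fixation, rapid 5-degree rightward shift, fixation
x = np.concatenate([np.zeros(100), np.linspace(0, 5, 15), np.full(100, 5.0)])
y = np.zeros_like(x)
labels = classify_samples(x, y, sfreq=500)
print(f"{labels.sum()} of {labels.size} samples classified as saccadic")
```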
To the best of my knowledge, saccade properties, with the exception of regres-
sions during reading, have yet to be analyzed in SLA research. Work on L1 read-
ing development points to a promising role for saccadic amplitude (i.e., saccade
length) in particular. This line of research has revealed that as children become
more skilled readers, their fixation time decreases and saccade length increases,
meaning the children make fewer and shorter fixations on each sentence (see
Blythe, 2014; Blythe & Joseph, 2011; Reichle et al., 2013, for reviews). Thus,
saccade amplitude indexes L1 reading skill. It seems worthwhile extending the
use of the amplitude measure to L2 reading research to investigate if more fluent
L2 readers also make longer eye movements. Preliminary support for this claim
comes from a study by Henderson and Luke (2014). These authors investigated
whether saccade length and fixation duration are similar across tasks (i.e., domain
general) or task specific. A group of healthy adults completed one L1 reading
and three non-reading tasks and then repeated the same tasks two days later.
Henderson and Luke found that saccade length was stable within individuals
across time, but did not correlate between scene viewing and reading. Henderson
and Luke concluded that where individuals move their eyes to—and hence saccade
length—was task-specific, although within a given task type, their participants
(proficient L1 speakers) tended to reproduce the same type of viewing behavior.
Therefore, changes in saccade length within individuals over time could point
to changes in the cognitive processes that support their reading or other men-
tal activity. Specifically, an increase in L2 readers’ mean saccade length might be
indicative of their L2 reading development.
A noteworthy property of saccades is that the eye and the brain seemingly do not
take in any new visual information during a saccadic eye movement. This phe-
nomenon is known as saccadic suppression (Matin, 1974). When the eyes make
a saccade, the rapid motion (smearing) of the image across the retina causes dif-
ficulty with spatiotemporal integration. Consequently, the retinal image is blurred,
yet this is not what people perceive (i.e., our vision does not become blurred,
or at least we do not think it does, every time we move our eyes). The details of
the neural mechanisms of saccadic suppression are complex and remain under
investigation (e.g., Binda, Cicchini, Burr, & Morrone, 2009; Cicchini, Binda, Burr,
& Morrone, 2013; Panichi, Burr, Morrone, & Baldassi, 2012; Thiele, Henning,
Kubischik, & Hoffmann, 2002; Thilo, Santoro, Walsh, & Blakemore, 2004). One
visual factor that contributes to saccadic suppression is backward lateral mask-
ing (see Matin, 1974, for review and discussion). Backward lateral masking is the
process whereby the more stable visual input at the eyes’ new resting place (i.e.,
after the eye movement) overwrites the transient stimulation during the preced-
ing saccade. Specific brain areas further contribute to the reduced visual sensitivity
during saccadic eye movements (e.g., Binda et al., 2009; Thiele et al., 2002; Thilo
et al., 2004), such that saccadic suppression likely has both visual and neural causes.
The net result is that in spite of frequent eye movements, people do not perceive
a blur, but a stable visual world. Although visual intake is strongly limited dur-
ing saccadic eye movements, there is evidence to suggest that lexical processing
continues (Irwin, 1998; Yatabe, Pickering, & McDonald, 2009). Yatabe et al. (2009)
presented participants with short English sentences that were split into two parts.
To read the second part of the sentence, the participants had to make a long sac-
cade to the right as shown in Figure 2.8. Yatabe and colleagues designated the
last word of the first part as the target word and the first word of the second part
as the spillover word. The sentences contained a frequency manipulation, such that
the spillover word (e.g., remained) was preceded by either a high-frequency target
(e.g., prison) or a low-frequency target (e.g., hangar). Low-frequency words tend
to elicit longer fixations, which is known as a frequency effect (for further details,
see Section 2.5). When the frequency effect is attested on the following word, this
is called a spillover effect, because the frequency effect is said to spill over on the
following word. Yatabe and colleagues were primarily interested in these spillover
effects. They found that the spillover effect on remained depended on the length of
the preceding saccade. Specifically, a low-frequency target word induced a larger
fixation-time increase on the spillover word when the two words were separated
by a 10°, rather than a 40°, saccade. Yatabe and colleagues took this to mean
that participants continued processing the low-frequency target word during the
next saccade. Because longer saccades take more time to complete, less processing
remained to be done after the eyes landed on the spillover word following a 40°
saccade.

FIGURE 2.8 Short- and long-distance saccades in a reading experiment. The display changed from (A) to (B) while participants saccaded from the target word (e.g., prison) to the square.
(Source: Yatabe et al., 2009).
Considering that cognitive processing continues during saccades (Irwin, 1998;
Yatabe et al., 2009), the question becomes whether saccade durations should be
added to fixation durations to obtain more accurate measures of processing time.
This proposal deviates from current practice because most algorithms calculate
processing time (i.e., fixation measures such as gaze duration or total time) based
on fixation times only (see Chapter 7). The question of saccade-inclusive process-
ing time is yet to be resolved. However, findings of one study by Vonk and Cozijn
(2003) suggested that saccade-exclusive and saccade-inclusive processing times
(i.e., fixation measures that include the preceding and following saccades) yield
similar results. In Vonk and Cozijn’s study, including saccade durations in first-pass
reading times changed the effects by -6 ms (from -48 ms to -54 ms) and
+2 ms (from 19 ms to 21 ms). This did not change the outcome of the statisti-
cal analysis in either case. Even so, the authors maintained that saccade durations
“should be included in measures of reading time as soon as fixation durations are
accumulated, i.e., if more than one fixation is contained in the measure, because
[saccade durations] contribute to language processing time as well as fixations do”
(p. 307, original emphasis).
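Computationally, the difference between the two measures is small. The sketch below illustrates the contrast with a hypothetical event format and a simplified definition of first pass (it ignores, for instance, whether the interest area was entered from the left); it is not the algorithm of any particular analysis package.

```python
def first_pass_time(events, include_saccades=False):
    """Sum first-pass time on an interest area from a chronological list
    of events, each a dict with 'type' ('fixation' or 'saccade'),
    'duration' (ms), and 'in_area' (True if it falls in the region).
    First pass ends once the eyes leave the region after entering it."""
    total, entered = 0, False
    for event in events:
        if event["in_area"]:
            entered = True
            if event["type"] == "fixation" or include_saccades:
                total += event["duration"]
        elif entered:
            break  # the eyes have exited: first pass is over
    return total
```

Running the function twice on the same data, once with include_saccades=True, would show how much the two definitions diverge for a given data set.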
Although fixations and saccades will be the primary characteristics that L2
researchers focus on in eye-movement data, these events make up only part of the
eye-movement repertoire. To support vision, especially vision in naturalistic set-
tings, other types of eye movements are also necessary. Specifically,

in more naturalistic settings the participant is moving, the environment is
moving, and so are objects within that environment. As a result, in these
dynamic situations, [different types of eye movements]—saccades, smooth
pursuit, vergence, vestibulo-ocular reflex—must act together in order to
deliver visual stability and orient the fovea to regions of interest.
(Gilchrist, 2011, p. 92)

Vergence refers to the inward or outward (as opposed to parallel) rotation of the
eyes in order to focus on an object that is located at a different distance from the
two eyes (Krauzlis, 2013). The vestibulo-ocular reflex is a mechanism to com-
pensate for head movement, whereby the eyes move automatically in the direc-
tion opposite to the head movement (Krauzlis, 2013). Smooth pursuit is the
voluntary tracking of a moving visual target, such as a tennis ball flying through
the air or a cheetah running off in the wilderness. Compared to saccades, smooth
pursuit movements are slower, with reported peak velocities anywhere from
30°/s (Wright & Ward, 2008; Young & Sheena, 1975) to 90°/s (Meyer, Lasker,
& Robinson, 1985) or 100°/s (Holmqvist et al., 2011). Smooth pursuit consists
of a combination of smooth movements and catch-up saccades (Barnes, 2011;
Hafed & Krauzlis, 2010; Krauzlis, 2013). Current eye-tracking technology is not
yet capable of measuring them accurately.

FIGURE 2.9 Fixational eye movements during a three-second fixation. The bold line strokes are microsaccades.
(Source: Reprinted from Engbert, R., 2006. Microsaccades: A microcosm for research on oculomotor control, attention, and visual perception. Progress in Brain Research, 154, 177–192, with permission from Elsevier).
Finally, within the class of eye fixations, there is a subcategory of miniature eye
movements, which are called fixational eye movements. What this tells us is that
the term eye fixation is a bit of a misnomer (Rayner, 1998), because what appears as
a fixation (i.e., a period of stillness) is in fact marked by miniature eye movements.
Figure 2.9 plots the data of one individual, who was asked to fixate for three sec-
onds on a small marker that appeared in the center of the screen (represented in
the plot as the 0° crosshairs). This random pattern shows extensive slow, mean-
dering motions, known as drift, and very fast, tiny oscillations superimposed on
the drift, which are tremor (Engbert, 2006; Martinez-Conde & Macknik, 2007,
2011). Bolded in Figure 2.9 are three fast, linear eye movements, or microsac-
cades. Microsaccades are very short saccades, which typically span less than 1°
of visual angle and occur 1–2 times per second (Engbert, 2006). Involuntary and
unconscious, microsaccades carry an image across the retina to refresh the visual
input to the photoreceptor cells. This is necessary to counteract neural adapta-
tion; that is, without renewed visual stimulation, stationary images would quickly
fade from view (Engbert, 2006; Martinez-Conde & Macknik, 2007, 2011), much
like a frog cannot see a fly sitting still on a wall, but will spot and swallow the
insect as soon as it moves (Lettvin, Maturana, McCulloch, & Pitts, 1968). Although
visual reactivation is the primary function of microsaccades, Martinez-Conde and
Macknik (2011) noted that microsaccades can also serve to correct prior saccades
that landed slightly off-target. In their view, microsaccades and saccades form a
continuum that is underpinned by “the same neural mechanisms” (p. 105). In
eye-movement recordings, microsaccades can be spotted when there is a very short
fixation (< 80 ms) in close vicinity of a larger one. In such cases, researchers will
often merge the two fixations or, if the software does not allow that, the researcher
may delete the smaller of the two fixations (see Section 8.2.1).
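A cleaning rule of this kind is straightforward to express in code. The following is one possible implementation, with illustrative thresholds (80 ms for “very short” and 12 pixels for “close vicinity”) and a made-up data format; it is not the merging routine of any specific manufacturer’s software.

```python
def merge_short_fixations(fixations, min_dur=80, max_dist=12):
    """Fold fixations shorter than min_dur (ms) into a nearby neighbor.

    fixations: chronological list of dicts with 'start' and 'end' (ms)
    and 'x' and 'y' (pixels). max_dist (pixels) operationalizes
    'close vicinity' of the preceding fixation."""
    cleaned = []
    for fix in fixations:
        duration = fix["end"] - fix["start"]
        close = (bool(cleaned)
                 and abs(fix["x"] - cleaned[-1]["x"]) <= max_dist
                 and abs(fix["y"] - cleaned[-1]["y"]) <= max_dist)
        if duration < min_dur and close:
            cleaned[-1]["end"] = fix["end"]  # merge into the larger fixation
        else:
            cleaned.append(dict(fix))
    return cleaned
```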

2.3 The Perceptual Span


Because of the acuity limitations of the eye (Section 2.1), humans do not extract
the same type or quality of visual information from across the environment. Several
terms have been proposed to distinguish special regions within the environment
including (i) for reading, the visual span and the perceptual span (Rayner,
1975, 1998, 2009; Rayner & McConkie, 1976); (ii) for scene perception, the func-
tional field of view or the effective field of view (Henderson & Ferreira, 2004;
Henderson & Hollingworth, 1999); and (iii) for visual search, the visual lobe and
the perceptual lobe (Findlay & Gilchrist, 2003). In general terms, information that
is represented visually within these regions is privileged because, unlike other parts
of the visual field, it can contribute to task performance. Thus, the perceptual span
(a term that is sometimes used as a cover term across domains, e.g., Rayner, 1998,
2009) is the area from which visual information is obtained during a single fixa-
tion. The perceptual span size is constrained by how the retina works (see Section
2.1) and reflects attentional processes (Findlay & Gilchrist, 2003; Henderson &
Ferreira, 1990; Pomplun, Reingold, & Shen, 2001) as well as individual differ-
ences (e.g., Choi, Lowder, Ferreira, & Henderson, 2015; Hayes & Henderson, 2017;
Veldre & Andrews, 2014). As a result, span sizes differ as a function of task and task
complexity. In visual search, for instance, between three and ten items are inspected
in parallel on any given fixation (Findlay & Gilchrist, 2003), with more difficult
searches characterized by narrower visual spans (Pomplun et al., 2001). Rayner
(2009), reviewing a study with Bertera (Bertera & Rayner, 2000), noted that visual
search could proceed optimally when all search items within a 2.5° radius around
the point of fixation were visible, which is suggestive of a 5° perceptual span. In
scene perception with object identification or memory tasks, the functional field
of view extends 4° around the point of fixation (Henderson & Ferreira, 2004),
although the global properties of a scene, such as those that allow viewers to appre-
hend the gist of a scene (i.e., its general meaning), are processed for a much larger
area (Henderson & Ferreira, 2004; Henderson & Hollingworth, 1999).
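Because span sizes, like most eye-movement metrics, are expressed in degrees of visual angle, it helps to know how degrees map onto physical size on the screen. The geometry is standard: visual angle = 2 × arctan(size / (2 × viewing distance)). A minimal sketch with illustrative numbers:

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of size_cm
    viewed from distance_cm: 2 * atan(size / (2 * distance))."""
    return 2 * math.degrees(math.atan(size_cm / (2 * distance_cm)))

# a 5-cm-wide region viewed from 60 cm subtends roughly 4.8 degrees
print(round(visual_angle_deg(5, 60), 1))
```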
How attention shapes the perceptual span is particularly clear in reading, where
the span has been shown to be larger in the direction of reading (see Figure 2.10).
This asymmetry reflects the link and partial decoupling of covert and overt atten-
tion (see Sections 1.2 and 2.6). Specifically, the asymmetric shape of the percep-
tual span derives from the fact that a covert shift in attention to the next word
precedes a corresponding overt eye movement in reading (Henderson & Ferreira,
1990). Because our eyes tend to move in the direction of reading (e.g., left to right
in English) and because covert attention precedes overt attention, the perceptual
span is longer in the reading direction.

FIGURE 2.10 The visual field. The gray oval represents the perceptual span from which readers extract information when processing text.

Readers of alphabetic writing systems
like English, for instance, can perceive information extending up to 14–15 letter
spaces to the right of fixation and only three to four letter spaces to the left of
fixation. This is a maximum estimate; individual differences in reading proficiency
(Choi et al., 2015; Häikiö, Bertram, Hyönä, & Niemi, 2009; Veldre & Andrews,
2014; Whitford & Titone, 2015; see discussion below) and a greater foveal pro-
cessing load (Henderson & Ferreira, 1990) may result in smaller spans. Because
visual acuity decreases steadily away from the point of fixation (see Section 2.1),
three subregions can be defined within the perceptual span (Häikiö et al., 2009).
The global perceptual span or word length span is the entire region for
which readers process low-level visual information, most notably word boundaries
marked by interword spaces. This information plays an important role in helping
readers determine where to look next (Rayner, 2009). The letter feature span is a
subset of the perceptual span that extends about ten character spaces to the right of
fixation (McConkie & Rayner, 1975). It is the area from which readers can extract
information about letter features, for instance whether they are ascenders (e.g., b,
d, l) or descenders (e.g., p, y, j). Finally, the letter identity span, extending about
seven to eight characters to the right of fixation, is the smallest region within the
perceptual span (McConkie & Rayner, 1975). In this region, readers can identify
specific letters, even though they are not looking at them directly. Information
represented in the letter feature and the letter identity span facilitates lexical pro-
cessing of the upcoming word. This is known as a preview benefit or preview
advantage (see Textbox 2.1). This benefit entails that when readers can see the
next word (e.g., in a sentence) parafoveally, they fixate on it for a shorter amount of
time than if no preview or an altered preview was available (Rayner, 1998, 2009).

TEXTBOX 2.1. PREVIEW BENEFIT OR PREVIEW EFFECT


A preview benefit refers to the faster processing of a word that was previously
seen parafoveally, during a fixation on the previous word, compared to if the
word had not been seen parafoveally.

Researchers have identified word length, which is visualized by word spacing
in many of the world’s languages, as an important factor in the reading process.
For every additional letter in the word that follows, the mean landing position
of a between-word saccade shifts approximately 0.2 letters to the right (Radach,
Inhoff, & Heller, 2004; also see Section 2.4). This means that people make longer
saccades when the following word is longer. The viewing location of the next
word, which is known as the eyes’ landing position, can be predicted based on
that word’s length (see Textbox 2.2 and 2.3). Crucially, word length informa-
tion is extracted parafoveally before the saccade is initiated, on the basis of word
spacing information available in the perceptual span. A logical question, then,
is how readers of unspaced languages such as Thai and Chinese target their eye
movements, given that there is no interword spacing to guide them. As a way to
address this question, researchers have artificially inserted spaces in unspaced texts.
Results showed that Thai readers read spaced text more efficiently (Kohsom &
Gobet, 1997; Winskel, Radach, & Luksaneeyanawin, 2009) even though word
spacing is not common in Thai. The facilitative role of spacing in Chinese, on
the other hand, is not as apparent. Bai, Yan, Liversedge, Zang, and Rayner (2008)
found that spacing between Chinese words yielded similar reading speeds as nor-
mal, unspaced text, whereas space insertion between Chinese characters (i.e., not
coinciding with word boundaries) and non-word spacing disrupted the reading
process. Shen et al. (2012) further explored these issues with L2 Chinese read-
ers. Interestingly, the authors found that four groups of non-native speakers of
Chinese (American, Korean, Japanese, and Thai) read word-spaced text faster than
unspaced text. This finding held regardless of whether the readers’ native language
was alphabetic (i.e., Korean, English, and Thai) or character-based (i.e., Japanese),
and regardless of whether it included spaces (i.e., English and Korean) or not (i.e.,
Thai and Japanese). These studies suggest, then, that inserting spaces does not
harm the reading process (Bai et al., 2008) and may even speed it up for both L1
readers (Kohsom & Gobet, 1997; Winskel et al., 2009) and L2 readers (Shen et
al., 2012), provided the spaces are added in between words, as a tool to facilitate
word segmentation.
Another question regarding the perceptual span is how much information can
be extracted from a single fixation (Rayner, 2009). This is known as the size of the
perceptual span. Research on the perceptual span dates back as far as McConkie
and Rayner (1975, 1976a, 1976b), who developed a gaze-contingent moving
window paradigm to study the span. In gaze-contingent moving-window read-
ing, individuals see text through a window while the text outside of the window
is obscured with different letters (see Figure 2.11).2 By varying the size of the
window and examining how it affects reading, researchers can determine the size
of the visual span. The logic is that readers no longer slow down when the win-
dow is as large as or larger than their perceptual span. Chen and Tang (1998) found
that Chinese readers can extract information from only two to three characters to
the right and one character to the left of fixation. Also using a moving-window
technique, Ikeda and Saida (1978) and Choi and Koh (2009) observed larger
spans for Japanese and Korean readers. Their participants were able to process six
characters (Ikeda & Saida, 1978) or six to seven characters (Choi & Koh, 2009) to
the right of fixation, respectively.

FIGURE 2.11 The gaze-contingent moving window paradigm. The size of the rectangles (ten characters) represents the size of the window.
(Source: Adapted from Rayner et al., 2016).
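The logic of the display manipulation in Figure 2.11 is simple, even though a real implementation must update the screen within milliseconds of each eye movement. Below is a toy sketch of the masking step only, assuming a character-based window and x-masking; real studies vary in whether spaces and letter shapes are preserved. The default window of three characters to the left and 14 to the right mirrors the asymmetric span reported for English readers.

```python
def moving_window(text, fixated_char, left=3, right=14, mask="x"):
    """Return the display string for one fixation in a gaze-contingent
    moving window: letters within the window around the fixated
    character index remain visible, all other letters are masked.
    Spaces are preserved here, keeping word-boundary information."""
    window = range(fixated_char - left, fixated_char + right + 1)
    return "".join(ch if (i in window or ch == " ") else mask
                   for i, ch in enumerate(text))

print(moving_window("Jill looked back through the open window", 12))
```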
In general, there is a tradeoff between information density in a language and
span size, such that the amount of information to be obtained from any single
fixation is about the same across languages (Rayner et al., 2012). Feng, Miller, Shu,
and Zhang (2009) noted that if words rather than letters or characters are used as
the basis of measurement, the perceptual span is the same in Chinese and English.
The perceptual span is an important construct in cross-linguistic research
because of how languages are read. Specifically, the direction of reading deter-
mines how the perceptual span extends spatially. Paterson and his colleagues
(2014) demonstrated this in a recent gaze-contingent moving-window study, in
which they measured the perceptual span of Urdu-English bilinguals. Urdu is
a Perso-Arabic alphabetic language read from right to left, which makes for an
interesting comparison with English. One notable feature of Paterson et al.’s study
is that letters outside the window were replaced with visually degraded, filtered
text (see Figure 2.12), rather than different letters, to simulate visual resolution
degradation. Using the moving-window technique, the authors found that com-
pared to normal reading, the rate of processing Urdu text was the lowest when the
window was symmetric and small (0.5 degrees of visual angle to both the right
and left of fixation (.5_.5)). Reading in Urdu was the fastest when the window
was asymmetric to the left of fixation and larger (1.5_.5 and 2.5_.5). Conversely,
the same readers processing English text also read the slowest with the symmetric
window but fastest when the asymmetry of the text extended rightwards (.5_2.5).
FIGURE 2.12 Urdu and English sentences displayed in a gaze-contingent moving window paradigm with different window sizes.
(Source: Paterson et al., 2014).

This finding is consistent with previous studies investigating leftward read-
ing languages, such as Hebrew (Pollatsek, Bolozky, Well, & Rayner, 1981) and
Arabic (Jordan et al., 2014). Thus, Paterson and colleagues’ study shows that fairly
advanced bilingual speakers are able to adjust their perceptual span to the lan-
guage they are currently reading.
The size of the perceptual span varies not only as a function of reading direction
but also in terms of individuals’ reading skills. A good example of this comes from
a study with English-French and French-English bilinguals in Canada. Whitford
and Titone (2015) studied the role of current language exposure on the bilinguals’
perceptual span, using questionnaire data to estimate their participants’ current L1
and L2 exposure. The researchers found that bilinguals with increased current L2
reading experience were less affected by small amounts of parafoveal information in
L1 materials, mimicking a smaller perceptual span, but they were more affected by
small parafoveal windows during L2 reading. Thus, they had a larger L2 span and a
smaller L1 span. Notably, the researchers found that the size of bilinguals’ perceptual
span (between 6–10 characters) differed from that of monolinguals of alphabetic
writing systems (between 14–15 characters to the right). Taken together, the results
point to subtle differences in participants’ reading skill as a result of language expo-
sure and the number of languages they speak. The size of bilinguals’ perceptual span
may be somewhat reduced compared to that of monolingual readers as a result of
joint exposure to multiple languages in everyday life.
The participants in Whitford and Titone’s study were high-functioning bilin-
guals studying or working in a bilingual country. To my knowledge, there is no
corresponding fundamental research with adult L2 learners yet, although studies
on child L1 reading development might offer some preliminary insights (Häikiö
et al., 2009; Rayner, 1986). Häikiö et al. (2009) recruited Finnish children 8, 10,
and 12 years old along with Finnish adults to examine the development of letter
identification across these age groups. The findings of the study revealed that younger
readers in general have smaller letter identity spans3 than older readers. This is
likely because fixated words in the foveal region consume most of slower readers’
resources. Another interesting result is that comparatively slower second-grade
readers are less affected by a small window size (the experimental manipulation
in this moving-window study) than faster second-grade readers. This finding sug-
gests that basic processing routines, in this case automatic word decoding, go hand
in hand with reading skill. By fifth grade, readers have typically developed basic
reading strategies (Rayner et al., 2012) and differences between readers become
more quantitative than qualitative, as many parts of the reading process have
become automatized (compare Lim & Godfroid, 2015). Thus, it seems that the
perceptual span could be used as an index of reading proficiency, provided script
differences (Chen & Tang, 1998; Choi & Koh, 2009; Ikeda & Saida, 1978; Jordan
et al., 2014; Paterson et al., 2014) and the number of languages spoken (Whitford
& Titone, 2015) are taken into consideration.

2.4 Where the Eyes Move


During reading, our eyes move across the text with the eye gaze pausing on
words in between eye movements of different sizes. Although saccade length may
seem random and unpredictable, in reality, the eyes land in systematic locations, a
regularity that underscores the importance of oculomotor (mechanical) factors in reading.
Specifically, when readers first look at a word, their eyes pause between the begin-
ning and the middle of a word. This location is known as the preferred viewing
location (PVL) (Rayner, 1979). The PVL is shifted to the front of the word (e.g.,
leftward in English or rightward in Hebrew) compared to the optimal viewing
position (OVP) (Deutsch & Rayner, 1999; O’Regan & Levy-Schoen, 1987),
which is closer to word center and represents the optimal location at which a word
can be recognized the fastest (Rayner, Juhasz, & Pollatsek, 2005). When the eyes
land in the PVL, rather than the OVP, refixation is likely to occur and reading time
increases. There are several reasons this may happen, including oculomotor noise
(motor error), the eyes’ current fixation location and the length of the following
word. Deviations from the OVP are also more frequent when reading connected
text than isolated words, although the effects on reading time (refixations) tend to
be smaller during sentence reading (Vitu, O’Regan, & Mittau, 1990).

TEXTBOX 2.2. LAUNCH SITE AND LANDING SITE


When describing the trajectory of eye movements, researchers borrow terms
from aviation and space exploration, referring to the launch site and landing
site of an eye movement. The launch site is the location from which the eyes
depart when making a saccade. The landing site, then, is the destination of
a saccade.

TEXTBOX 2.3. WORD N AND WORD N + 1


Word n refers to the word on which the eyes are physically fixated (also
referred to as the foveal word n, see Section 2.1). The next word is referred to
as word n + 1. It is sometimes called the parafoveal word n + 1, as it is often
seen in parafoveal vision.

TEXTBOX 2.4. UNDERSHOOTING AND OVERSHOOTING

Undershooting and overshooting are two terms used to describe eye move-
ment trajectories in relation to the intended landing site or location. The eyes
are said to undershoot a target when they land short of the intended location;
they overshoot a target when they go past it.

The initial landing patterns on a word reflect the influence of lower-level, visual,
and oculomotor variables (see Textbox 2.2). Whereas the eyes land in the center of
short (5-letter) words, they tend to be shifted to the front in longer words. This is
because of the limits of the perceptual span (i.e., a lower-level visual constraint; see
Section 2.3). Specifically, when the next word is long, some letters may fall outside
the perceptual span and therefore will not contribute to landing site calculations
(Balota, Pollatsek, & Rayner, 1985; McClelland & O’Regan, 1981; Rayner, Well,
Pollatsek, & Bertera, 1982). Landing sites can further be explained by the location
of the preceding fixation (i.e., the launch site), due to the so-called center-of-
gravity assumption (Vitu, 1991). Vitu explained that the closer the launching site is
to the beginning of the following word, the more likely the eyes are to overshoot
the center of the next word or even skip it (see Textbox 2.3 and 2.4). This is again
because of the amount of information represented in the perceptual span, which
will be greater if the launching site is closer. Likewise, the eyes will undershoot the
center of a word when the distance between the launch site and the beginning
of the next word is large, given that fewer letters will be detected in parafoveal
vision (termed the periphery in Vitu’s work). Figure 2.13 illustrates this with two
examples from Siyanova-Chanturia, Conklin, and van Heuven (2011), where the
preceding fixation influences the landing location on the following word. When
the launching site from the word sciences is far from the following word arts, as in
the case of (A), the following fixation falls short of the center of the following
word. On the other hand, the second fixation (refixation) on sciences in (B) is close
to arts, which may then have caused the reader to overshoot his or her target.
The influence of the current viewing location on the next saccade destination
is one reason it is important to control for (i.e., keep the same) the preceding text.
For example, Siyanova-Chanturia and colleagues compared L1 and L2 reading
patterns for frequent and infrequent binomials such as arts and sciences (frequent)
and sciences and arts (infrequent). When extracting eye-movement measures, they
defined the whole binomial phrase as a single interest area (see Figure 2.13 for a
reconstructed example). This approach was preferable to analyzing fixation times
for each noun separately because that would have introduced differences in pre-
ceding context between the nouns in the two conditions. For example, arts is
preceded by sciences and in the low-frequency condition shown in Figure 2.13,
whereas it is preceded by across in the corresponding high-frequency condition.
There are cases where it may be more justifiable to conduct a word-based anal-
ysis of two-noun sequences, even when the same noun occurs in slightly different
places in the sentence. In an incidental vocabulary learning study, Godfroid, Boers,
and Housen (2013) investigated whether strong contextual cues to the meaning
of a new word (i.e., near-synonyms) aided L2 vocabulary learning beyond simply
reading the novel words in context, with no near-synonyms supplied. Participants
read short English texts embedded with novel words, which were the targets for
learning in the study. In one condition, a contextual cue preceded the target word
(e.g., boundaries or paniplines) whereas in another condition, the cue followed the
target word (e.g., paniplines or boundaries). Despite this slight variation in word
order, the results of a word-based analysis of just the cue (i.e., boundaries) con-
verged with findings from an analysis of the whole phrase, which I reported in
my dissertation (Godfroid, 2010), but omitted from the journal article for space
reasons. The two analyses showed that learners utilized the semantic cues only
when they followed the novel word, an effect that was found in both early and
late eye-movement measures.

FIGURE 2.13 Undershooting (A) and overshooting (B) of arts in the phrase sciences and arts given different launch sites in the word sciences.
(Source: Siyanova-Chanturia, Conklin, & van Heuven, 2011).

2.5 When the Eyes Move


Although regularities in fixation location are important for model-building pur-
poses (see Section 2.6), applied eye-tracking researchers tend to be more con-
cerned with the length of fixation on any given word. This is because eye-fixation
duration is a “processing load measure” (Tanenhaus & Trueswell, 2006, p. 875)
that can inform many of the questions L2 researchers care about, including the
effects of instruction and feedback, attention, noticing, grammatical sensitivity,
parsing preferences, and various kinds of processing difficulty (see Chapter 3 for
elaboration). Thus, by comparing eye-fixation durations in two or more condi-
tions (e.g., ungrammatical vs. grammatical, enhanced vs. unenhanced, or ambigu-
ous vs. unambiguous), researchers can determine whether participants process
the marked (experimental) condition differently from the unmarked (control)
condition. In many cases, this will help them answer their research questions.
However, when drawing such comparisons, it is important to bear in mind that
fixation duration is also influenced by other visual, lexical, or higher-order cogni-
tive variables besides the experimental manipulation. In this section, I provide an
overview of these potentially confounding variables, so researchers can manipu-
late them consciously or control for them in their study designs.
The “big three” predictors of eye-fixation duration are frequency, contex-
tual constraint or predictability, and word length (Kliegl, Nuthmann, &
Engbert, 2006, p. 13). Other potential confounds include word familiarity, age of
acquisition, part of speech, and concreteness or imageability (see Liversedge &
Findlay, 2000; Rayner, 1998, 2009; Starr & Rayner, 2001, for reviews). Table 2.3
specifies how each of the preceding variables influences fixation duration, along
with some representative publications. The table also serves to identify areas for
future research, considering not all variables have been examined with L2 speak-
ers or in the audiovisual modality.
In studying the effects of these variables, researchers have generally taken one
of two approaches, which following Kliegl, Grabner, Rolfs, and Engbert (2004)
I will refer to as experimental control and statistical control, respectively.
The oldest and most common approach has been to manipulate target words
experimentally, such that the words in a sentence differ on one dimension (e.g.,
high frequency vs. low frequency) but are matched in all other regards (e.g., same
length and predictability). This type of design lends itself to an analysis of vari-
ance (ANOVA), although other techniques—notably, linear regression (Plonsky
& Oswald, 2017) and mixed-effects regression analysis (Gries, 2015)—are also
possible (see Chapter 8). More recently, researchers have increasingly turned to
regression-based approaches, where variables are not controlled a priori in the
experimental designs, but jointly entered as predictor variables in a multivari-
ate regression model (see Boston, Hale, Kliegl, Patil, & Vasishth, 2008; Juhasz &
Rayner, 2003, for examples). These two approaches are exemplified in an oft-cited
study by Kliegl et al. (2004), who studied the effects of three variables (length,
frequency, and predictability) on young and older readers’ sentence processing.
Kliegl et al. (2004) collected eye-movement data from young and old L1
German speakers reading the 144 German sentences of the Potsdam Sentence
Corpus. The Potsdam Sentence Corpus is one of two well-known corpora for
reading research, along with the Dundee corpus (e.g., Kennedy & Pynte, 2005;
Pynte & Kennedy, 2006). Kliegl and colleagues collected predictability, frequency,
and word length information for every word in the corpus, which they could
then use in their statistical analyses. Word predictability was obtained from an
incremental cloze task (Taylor, 1953), in which a sample of participants (different
from the ones who took part in the main study) tried to guess every word from
the beginning of a sentence. Cloze tasks are the most common techniques used
to estimate the predictability of a word (but see Pynte, New, & Kennedy, 2008, for
an alternative approach in terms of Latent Semantic Analysis). Word predictability
is estimated as the percentage of participants who are able to fill in the correct
word. In (1) below, for instance, a cloze value of 0.6 for the word window can be
interpreted to mean that 60% of the participants predicted the word window in the sen-
tence Jill looked back through the open __________.

(1) Jill looked back through the open window to see if the man was there.
(Rayner & Well, 1996)
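Once the norming responses are in, the computation itself is a simple proportion. A minimal sketch, with invented responses that reproduce the cloze value of 0.6 from example (1):

```python
def cloze_probability(target, responses):
    """Proportion of norming participants who produced the target word
    for a sentence fragment (case-insensitive exact match)."""
    responses = [r.strip().lower() for r in responses]
    return responses.count(target.lower()) / len(responses)

# six of ten hypothetical participants completed "Jill looked back
# through the open ..." with "window"
guesses = ["window", "door", "window", "window", "gate",
           "window", "window", "curtain", "window", "door"]
print(cloze_probability("window", guesses))  # 0.6
```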

Word frequency information was retrieved from the CELEX corpus and entered
as log frequency. Researchers commonly use log frequency, rather than raw fre-
quency values, in their analyses to account for the fact that the frequency dis-
tribution of words in natural language is positively skewed. Furthermore, the
relationship between frequency and reaction times, of which eye fixation dura-
tions are a special case, is non-linear. Small changes at the low end of the fre-
quency scale (e.g., from 1 to 10 occurrences per million words) have a similar
effect on reaction times as much larger changes at the high end of the scale (e.g.,
from 1000 to 10,000 occurrences per million words). To account for these facts,
Kliegl and colleagues divided the words in their corpus into five “logarithmic
frequency classes: class 1 (1–10 per million): 242 words; class 2 (11–100): 207
words; class 3 (101–1,000): 242 words; class 4 (1,001–10,000): 227 words; class 5
(10,001–max): 76 words” (p. 267).
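In code, both the log transform and the five-class binning quoted above are one-liners. A sketch, assuming frequencies are expressed as occurrences per million words and are at least 1 per million:

```python
import math

def log_frequency(freq_per_million):
    """Log10-transformed frequency, the form typically entered into
    reading-time analyses instead of raw counts."""
    return math.log10(freq_per_million)

def frequency_class(freq_per_million):
    """Map a frequency onto the five logarithmic classes of Kliegl
    et al. (2004): 1-10, 11-100, 101-1,000, 1,001-10,000, 10,001+."""
    return min(5, max(1, math.ceil(math.log10(freq_per_million))))

print(frequency_class(8), frequency_class(250), frequency_class(12000))
# prints: 1 3 5
```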
To understand how predictability, frequency, and word length affected reading
behavior, the authors conducted two types of analyses, one on the entire corpus
(the statistical control approach) and the other on a selected subset of target words
(the experimental control approach). The reason they ran two analyses is that
word length and frequency were correlated in the Potsdam Sentence Corpus
(r = -.64), as is normally the case in natural language: longer words tend to be less
frequent. Therefore, to disentangle the effects of length and frequency on read-
ing, the authors repeated the analysis on a subset of words for which length and
frequency were uncorrelated (r = -.01).
The results of the study revealed that, for the entire corpus, length and fre-
quency were related to fixation duration and fixation probability measures, while
predictability only influenced fixation probability. These findings are consistent
with previous studies that reported increases in fixation duration for longer words
and decreases for frequent words (see Table 2.3). Predictable words were skipped
more often, which also echoed findings from previous research. As for the target
word analyses, first-pass duration was influenced by all three variables. Target word
predictability, however, was strongly linked to second-pass duration; specifically,
less predictable words had longer rereading times. The comparison of effects for
corpus and target words indicated that word length and word frequency effects
generalized to all words (the regression coefficients for the predictors in both
analyses were similar). Findings for predictability generalized to fixation probabil-
ity but not fixation duration. Therefore, some caution is needed when extrapolat-
ing the findings from a tightly controlled experimental design to more naturalistic
sentence or text reading, as was observed in the corpus analysis.
The previous description provides some insights into the kinds of factors
researchers need to consider when designing their experimental materials. In so
doing, it lays the foundation for Chapters 3 and 4, which are devoted to study
design. Meanwhile, we turn to word familiarity and age of acquisition (AoA) as
two variables that are related to, yet distinct from, frequency.
In reading research, both AoA and word familiarity are often measured using
subjective ratings. AoA refers to the age at which a word was first encountered,
while word familiarity indicates the degree of familiarity with a word. Prior to the
main experiment, researchers may collect normative data of the tested variables.
Depending on whether AoA and word familiarity are the variables of interest
or are control variables, researchers aim to have either a wide range or very lit-
tle variability in the AoA and word familiarity ratings, respectively. This is what
the norming data are used for. For example, in a norming study for their second
experiment, Juhasz and Rayner (2006) measured AoA and word familiarity using
a 7-point Likert scale. Twenty undergraduate students rated words’ AoA from 1
to 7, with 1 indicating the word was acquired between age 0 to 2, and 7 reflect-
ing the word was acquired after the age of 13. Another group of undergraduates
similarly rated the words’ familiarity, with higher numbers on the rating scale
signaling a greater familiarity with the word. In a series of two experiments,
Juhasz and Rayner investigated the influence of AoA and frequency (with famili-
arity, concreteness, imageability, and length controlled) on word recognition. They
employed an experimental control approach.

TABLE 2.3 Variables influencing fixation duration

Frequency
How is it measured? Corpus counts.
Influence: More frequent words are processed faster; less frequent words are processed more slowly.
Selected publications: R: Cop, Keuleers, Drieghe, & Duyck (2015); L2: Godfroid et al. (2013); VW: Dahan, Magnuson, and Tanenhaus (2001).

Predictability (contextual constraint)
How is it measured? Cloze probabilities.
Influence: Words that are more predictable from the preceding context are processed faster.
Selected publications: R: Vainio, Hyönä, and Pajunen (2009); L2: Mohamed (2015); VW: Altmann and Kamide (2009).

Word length
How is it measured? Number of letters.
Influence: Longer words are processed more slowly.
Selected publications: R: Juhasz (2008); LA: Lowell and Morris (2014); VW: Meyer, Roelofs, and Levelt (2003).

Word familiarity
How is it measured? Ratings.
Influence: More familiar words are processed faster.
Selected publications: R: Williams and Morris (2004); L2: —; VW: —.

Age of acquisition
How is it measured? Corpus counts and ratings.
Influence: Words that are acquired earlier in life are processed faster.
Selected publications: R: Juhasz and Rayner (2006); LA: Joseph, Wonnacott, Forbes, and Nation (2014); VW: Canseco-Gonzalez, Brehm, Brick, Brown-Schmidt, Fischer, and Wagner (2010).

Part of speech
Influence: Verbs are processed more slowly than nouns.
Selected publications: R: Bultena, Dijkstra, and van Hell (2014); L2: —; VW: —.

Concreteness
How is it measured? NS ratings.
Influence: More concrete words or literal idioms are processed faster.
Selected publications: R: —; L2: Siyanova-Chanturia, Conklin, and Schmitt (2011); VW: Duñabeitia, Avilés, Afonso, Scheepers, and Carreiras (2009).

Location of clauses
How is it measured? Clause types.
Influence: Processing time increases at the ends of both clauses and sentences.
Selected publications: R: Hirotani, Frazier, and Rayner (2006); L2: —; VW: n/a.

Note: R = reading studies with L1 speakers or bilinguals; L2 = studies whose main focus is on late L2 learners or unbalanced bilinguals; VW = visual world studies with L1 speakers; LA = language acquisition studies with L1 speakers.

In Experiment 2, the participants were exposed to sentences in four conditions
(High Frequency [HF]–early AoA; HF–late AoA; Low Frequency [LF]–early AoA;
LF–late AoA), where target nouns
in the sentences reflected the frequency and AoA of a given experimental condi-
tion. Given the high correlation between frequency and AoA (high-frequency
words are acquired relatively early in life), the criteria for low and high frequency
words in the study were more lenient than in previous research (mean frequencies
HF–late AoA, 75 per million; LF–early AoA, 6 per million) because the number
of words that qualified as HF–late AoA and LF–early AoA was limited. ANOVAs
revealed that there were main effects of frequency and AoA but no interaction
effects. Late-acquired words (average ratings of 4.9) were fixated longer than the
early-acquired words (average ratings of 2.75). These effects were independent
of frequency, which also influenced fixation duration, and familiarity, concrete-
ness, imageability, and word length, which were controlled in the present study.
These interesting findings notwithstanding, employing subjective ratings of AoA
suffered from one limitation: participants likely relied on other sources of information
(e.g., order of acquisition) to rate AoA, since recalling the exact age of acquisition of a word is
difficult.
More recently, Joseph et al. (2014) examined the effects of AoA, using order
of acquisition (OoA) as a “laboratory analog of AoA” (p. 245). Of interest was
whether OoA influenced novel-word processing and acquisition when total
exposure to the words was held constant. Although the participants in the study
were English native speakers reading in their L1, similar research is being con-
ducted in bilingualism and SLA (Elgort, Brysbaert, Stevens, & Van Assche, 2018;
Godfroid et al., 2018; Godfroid et al., 2013; Mohamed, 2018; Pellicer-Sánchez,
2016). Participants in this study were exposed to 16 non-existing words over
five laboratory sessions, with half the words introduced on day 1 (early OoA)
and the other half on day 2 (late OoA). At the end of the five-day experiment,
all the words had occurred in exactly 15 sentences, meaning the total frequency
of exposure was controlled for. Word length was also held constant. Results from
linear mixed models revealed both exposure and OoA effects. The reading time
decreased after each encounter with a novel word, indicating the novel words
became more familiar to the readers. Interestingly, OoA had an effect on total
reading time only in the testing phase, but not during exposure. This is surprising
given that the test sentences were presented immediately after the last exposure
phase (day 5) and, from the participants’ point of view, were indistinguishable from
the exposure sentences. What differentiated the two sentence sets, however, was
the amount of contextual support. Target words appeared in meaningful contexts
in the exposure phase, but in neutral sentences in the test phase. (The latter served
as a test of implicit learning: see Elgort et al. (2018) for an example with L2 read-
ers.) The increase in total time for the late words, therefore, suggests that process-
ing of these words still relied to a greater extent on the surrounding context, so
when that contextual support was removed in the test phase, reading times went
up. Joseph and colleagues concluded that “the early words in [their] experiment
gained a higher quality of lexical status than the late words” (p. 245).
In this section, we explored different variables influencing fixation duration
during language processing and how these variables have been studied. In the
next section, we will see how the interplay of low-level oculomotor systems (see
Section 2.4) and high-level cognition (see Section 2.5) has been conceptualized
in different models of eye-movement control.

2.6 How Tight Is the Eye-Mind Link? A Look at Two Models of Eye-Movement Control

Eye movements during reading are a remarkable human accomplishment (Gough,
1972; Huey, 1908) because they couple cognitive and motor activity (Engbert,
Nuthmann, Richter, & Kliegl, 2005). This is to say that eye movements are influ-
enced by both cognitive and linguistic (higher-level) and visual and oculomotor
(lower-level) factors. In Sections 2.4 and 2.5, we discussed how lower-level vari-
ables influence fixation location, or the where of reading, whereas higher-level fac-
tors primarily impact fixation duration, or the when of the reading process.
We now turn to different computational models of eye-movement control and
how these models, which offer different answers to the question of an eye-mind
link, account for various empirical phenomena in reading.
A computational model of eye-movement behavior is a computer algorithm or
formal mathematical specification that was built based on large amounts of empiri-
cal data from eye-tracking experiments. These theoretical models reflect research-
ers’ best understanding of how eye movements work. At the same time, the models
themselves are a catalyst for further empirical research, as research teams seek to
confirm or disconfirm key assumptions in the different models. In reading research,
computational models of eye-movement control can be divided largely into two
classes: cognitive-control models and oculomotor models (see Figure 2.14).
Cognitive-control models posit that cognition (e.g., lexical processing) critically
influences eye movements, whereas oculomotor models see a much larger role for
low-level visual information (e.g., word length) and oculomotor factors (e.g., the
time it takes to make a saccade). These models therefore weight the importance of
lower-level and higher-level information in reading differently and consequently,
they offer different answers to the question of an eye-mind link. Specifically, pro-
ponents of cognitive control argue for a tight eye-mind coupling (i.e., a strong
eye-mind link) whereas advocates of oculomotor control argue for a weaker and
more indirect connection between eye gaze and cognitive processing (i.e., a much
weaker eye-mind link). While this captures the general idea behind the two classes
of models, there are important differences in how individual models instantiate key
concepts such as cognitive control. This will become clear when we compare the
E-Z Reader and SWIFT models following.
In the last decade, the focus seems to have shifted somewhat from whether
there is an eye-mind link to how strong the eye-mind link is and how it should
be conceptualized. A testament to this development is the 2013 special issue of
The Quarterly Journal of Experimental Psychology (Murray, Fischer, & Tatler, 2013)
devoted to serial vs. parallel processing in reading.

FIGURE 2.14 Models of eye-movement control.4 Note: POC (primary oculomotor control) = models that assume low-level factors drive eye-movements, PG (processing gradient) = models that assume attention is allocated in a parallel fashion, SAS (sequential attention shift) = models that assume attention is distributed serially, to one word at a time.

The question of whether read-
ers process words serially or in parallel (explained in more detail following) applies
to cognitive control and oculomotor models alike. Although parallel processing
is perhaps associated more strongly with oculomotor models and serial process-
ing tends to be linked to cognitive-control models, serial-processing, oculomo-
tor models also exist (e.g., SERIF or the Competition/interaction model) and
parallel-processing, cognitive-control models exist as well (e.g., Mr. Chips) (Radach
& Kennedy, 2013). This is why Jacobs (2000), Radach, Schmitten, Glover,
and Huestegge (2009), and Radach and Kennedy (2013) proposed to reclassify
eye-movement models along two axes: (i) autonomous saccade generation vs.
cognitive control and (ii) serial or parallel attention. These distinctions are neces-
sary to come to a more fine-grained understanding of how models of eye-move-
ment control differ. In particular, they can explain why the two most prominent
models of eye-movement control—E-Z Reader and SWIFT—are so different
(i.e., they fall in opposite corners of the two-dimensional space), even though
both are considered to be cognitive models (compare Figure 2.14 and 2.15).
According to SWIFT ([Autonomous] Saccade-Generation With Inhibition
by Foveal Targets), the eyes move forward through a sentence at more-or-less
fixed time intervals (Engbert et al., 2005), much as if humans had an internal
metronome or timer that dictated when the next eye movement must occur.

FIGURE 2.15 A grid of E-Z Reader and SWIFT models of eye-movement control.

This feature accounts for the autonomous saccade generation part of the SWIFT
model. At the same time, the internal metronome is subject to random and sys-
tematic noise. Furthermore, it can also be delayed when the reader experiences
difficulty processing the currently fixated word. Delaying the next eye movement
means that the current eye fixation will be longer because the eyes are staying
in place. Therefore, processing difficulty, for instance as a result of seeing a low
frequency or unpredictable word, will increase eye fixation duration through a
process of inhibition. It is this principle of foveal inhibition that makes SWIFT a
cognitive model, even though cognitive control is not the driving force behind
the eye movements, but autonomous saccade generation is.
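To make the timer-plus-inhibition idea concrete, consider the following deliberately simplified toy simulation. It is emphatically not the SWIFT model itself, which involves a dynamic activation field and many fitted parameters; it only illustrates how a noisy autonomous timer, delayed in proportion to the difficulty of the foveal word, yields longer fixations on harder words. All parameter values are invented.

```python
import random

def toy_timer_with_inhibition(word_difficulties, base=200, sd=30, inhibition=40):
    """Toy illustration (NOT the actual SWIFT implementation): each
    fixation duration is a noisy random-timer interval plus a delay
    proportional to the foveal word's difficulty (e.g., -log frequency)."""
    durations = []
    for difficulty in word_difficulties:
        timer = random.gauss(base, sd)   # autonomous saccade timer
        delay = inhibition * difficulty  # foveal inhibition
        durations.append(max(80, timer + delay))
    return durations

# a harder (e.g., lower-frequency) second word receives a longer fixation
print(toy_timer_with_inhibition([0.2, 1.5, 0.4]))
```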
Assuming the next saccade is programmed, where will the eyes go? Engbert and
colleagues (2005) posited that saccade targets are selected based on a competition
between words within “a dynamically changing activation field” (p. 778). The field
of activation is based on a complex mathematical model and represented visually
in Figure 2.16. It corresponds to the gray areas under the curve for the different
words and can be seen to change dynamically over time. In Figure 2.16, the thick
black line represents the eye making its way through the sentence in a sequence of
fixations (vertical lines) and saccades (horizontal lines). At most points in time more
than one word is activated and selection of the next saccade target (i.e., the end
point of the horizontal line) is determined through a competitive process between
all the currently activated words. For instance, 700 ms into the trial, vor (before, in),
Gericht (court), and nicht (not) are all activated and competing as fixation targets. Vor
eventually wins the competition when it is refixated around 800 ms post onset. An
important upshot of this theoretical view is that in SWIFT, attention is conceived as
a gradient (as opposed to a spotlight) that encompasses multiple words. The number
of co-activated words fluctuates over time, but Engbert and colleagues noted that
their results can be reproduced qualitatively by assuming concurrent activation of
three words: the currently fixated word n, the next word n + 1, and the word after
that, word n + 2 (p. 798). From a psychological viewpoint, this means that readers
are believed to attend to and process words in parallel, in a spatially distributed fash-
ion. Hence, SWIFT is a parallel-processing (or parallel-attention), cognitive
model with autonomous saccade generation.

FIGURE 2.16 Example of a numerical simulation of the SWIFT model. The thick black line is the predicted reading pattern across the sentence (“Sometimes victims do not tell the complete truth in court.”). Vertical stretches of the line represent fixations and horizontal stretches are saccades.
(Source: Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R., 2005. SWIFT: A dynamical model of saccade generation during reading. Psychological Review, 112(4), 777–813, APA, reprinted with permission, also see Figure 9.2 in Chapter 9).
Now what about E-Z Reader? Although E-Z Reader is also a cognitive model,
it differs from SWIFT with regard to both attention and saccade generation. In
E-Z Reader, the programming of a saccade begins, not after a set amount of time,
but when lexical access of the currently fixated word is imminent. This means that
the reader’s mental processor unconsciously makes an educated guess, based on the
information that has become available about the word form, that the retrieval of
word meaning will follow soon. Therefore, an early stage of lexical processing, called
familiarity check or L1, is the engine that moves the eyes through the text (see Figure
2.17; Reichle et al., 2013; Reichle, Pollatsek, & Rayner, 2006; Reichle, Rayner, &
Pollatsek, 1999, 2003; Reichle, Warren, & McConnell, 2009). In E-Z Reader, the
engine is cognitive (i.e., a part of word recognition and lexical processing), unlike
in SWIFT, where the engine operates largely autonomously. Furthermore, saccade
initiation is decoupled from full lexical access, or L2, because programming a saccade
takes about 150 ms to complete (Reichle et al., 2013; Reichle et al., 2012). Reading
will be more efficient if saccadic programming and lexical processing partly overlap.
Therefore, rather than proceeding serially, as was the case in the precursor Reader
model (Morrison, 1984), in the newer E-Z Reader model the lexical and oculo-
motor aspects of reading operate partly in parallel (see Figure 2.17). When lexical
access is complete, covert attention shifts to the next word (see Figure 2.17) and
overt attention (i.e., the eye gaze) normally follows soon after (for a discussion of
overt and covert attention, see Section 1.2). The time spent processing the next
word ahead of a direct eye fixation is known as a preview benefit (see Textbox 2.1).
Preview benefits result from parafoveal processing; that is, covert attention preceding

the eye gaze (see Section 1.2 and Figure 1.5a).

FIGURE 2.17 Schematic representation of the E-Z Reader model. The signal to move the eyes and the signal to shift covert attention are decoupled. Newer versions of the model also include a post-lexical processing stage, where the eyes may make a regression if word integration takes too long.
(Source: Adapted from Reichle et al., 2003, 2009, 2012).

It follows that attention in E-Z
Reader is word-based. It is allocated to individual words, one word at a time, and
it travels from one word to the next, like a spotlight or a beam, in a strictly serial
fashion, with lexical access as its guide. In short, E-Z Reader is a cognitive-control,
serial-attention model, in which two hypothesized stages of word recognition, L1
and L2, trigger saccade initiation and attention shifts, respectively.
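To make this two-stage architecture concrete, the toy simulation below schedules eye movements the way a serial-attention model in the spirit of E-Z Reader would: an early familiarity check (L1) launches saccade programming, while the later completion of lexical access (L2) shifts covert attention. All timing rules and word frequencies here are invented placeholders for illustration, not the model's published parameter values.

# Toy serial-attention reading scheduler in the spirit of E-Z Reader.
# Timing rules and frequencies are illustrative placeholders only.
SACCADE_PROGRAMMING_MS = 150  # approximate time to program a saccade

def l1_ms(freq):
    """Familiarity check (L1): faster for higher-frequency words."""
    return 100 + 40 / freq

def l2_ms(freq):
    """Completion of lexical access (L2), which continues after L1."""
    return 60 + 20 / freq

def simulate(words):
    """words: (word, frequency) pairs in reading order."""
    for word, freq in words:
        gaze = l1_ms(freq) + SACCADE_PROGRAMMING_MS  # eyes leave when the saccade is ready
        attention = l1_ms(freq) + l2_ms(freq)        # covert attention shifts at full access
        print(f"{word:10s} gaze {gaze:6.1f} ms   attention shift at {attention:6.1f} ms")

simulate([("the", 10.0), ("lawyer", 1.0), ("testified", 0.5)])

Because the saccade is triggered by L1 rather than by full lexical access, gaze durations and attention shifts come apart in the simulation, which is precisely the decoupling shown in Figure 2.17.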
The question of whether words are processed serially or in parallel has occupied
a central place in contemporary reading research (Engbert & Kliegl, 2011; Reichle,
2011). As is well established in SLA circles, the nature of attention is a thorny issue.
Conceptualizations of attention in reading range from attention as a “one-word
processing beam” (Radach, Reilly, & Inhoff, 2007, p. 240) in serial-attention models
to an attentional gradient or “field of activation” (p. 241) in parallel-attention mod-
els. These questions have important implications for how information is encoded
and also concern L2 researchers. This is because many eye-tracking researchers in
bilingualism and SLA record eye movements precisely in order to study these very
phenomena—attention and processing. Furthermore, eye-movement data in our
fields are usually interpreted in relation to the region of analysis, or interest area,
for which they were observed. For example, if the region of analysis is an ungram-
matical adjective, a recast, or a low-frequency noun, then eye-fixation durations
are commonly taken to reflect the processing of that adjective, recast, or noun.
Although there are exceptions, words or phrases are often the basic unit of analysis
and fixations tend to be interpreted in relation to the specific object on which
the eyes are fixated at any given time.5 This suggests most applied eye-tracking
researchers, including SLA researchers and bilingualism researchers, assume a tight
eye-mind link. In this regard, their work is probably aligned more closely with E-Z
Reader than SWIFT. At the same time, the preceding discussion has provided ample
evidence that the eyes and the mind do not always coincide (also see Figure 1.5).
There is “elasticity in the eye-mind assumption” (Murray et al., 2013, p. 417), such
that perhaps we ought to think about the eye-mind link as a “stretchy elastic band”
(Murray, 2000, p. 652). Understanding the amount of stretch, or the degree of eye-
mind decoupling, becomes very important: Murray and colleagues (2013) argued
that, given enough flexibility, any serial-attention model can reproduce seemingly
parallel effects such as parafoveal processing and word n + 2 preview effects (Radach,
Inhoff, Glover, & Vorstius, 2013; Textbox 2.5). By the same token, parallel-attention
models can mimic serial processing if the span of attention is sufficiently reduced.

TEXTBOX 2.5. N + 2 PREVIEW EFFECTS


N + 2 preview effects refer to situations when a preview of word n + 2 (the
second word to the right of fixation) from word n benefits later processing of
word n + 2 (e.g., fewer or shorter fixations).
Readers who wish to examine these questions in greater depth are referred to
the literature on parafoveal-on-foveal effects (e.g., Drieghe, Rayner, & Pollatsek,
2008; Kennedy, Pynte, & Ducrot, 2002; Pynte & Kennedy, 2006; White, 2008).
Essentially, parafoveal-on-foveal effects are cases where the properties of the
upcoming word, which is seen parafoveally, influence the duration of fixations on
the currently fixated word n. The existence of such effects is uncontroversial at the
orthographic level but contested at the semantic level. If semantic parafoveal-on-
foveal effects were confirmed, they would offer strong evidence for parallel
processing.
Regardless of which theoretical view will ultimately prevail, good research
designs with appropriate control conditions will always be key to conduct-
ing valid and interpretable eye-tracking studies. As stated previously, most eye-
tracking studies in SLA and bilingualism are designed to compare eye gaze behav-
ior under different experimental conditions—ungrammatical vs. grammatical,
enhanced vs. unenhanced, ambiguous vs. unambiguous, and so on. Adequate
experimental control means these conditions differ only with regard to what the
researcher wants to study; there are no extraneous or confounding variables (see
Section 2.5 and Chapters 3 and 4). Given such a design, attentional allocation will
be similar under the different conditions. Attention may be serial in both condi-
tions or parallel in both conditions, but to a large extent these effects will cancel
each other out when drawing comparisons between conditions. What remains,
then, in the eye-movement record is primarily the effect, or signal, of the experi-
mental manipulation. Therefore, researchers can make claims about the processing
of a word or item that is currently in the reader’s eye gaze, although they cannot
rule out the possibility that neighboring words (word n + 1 and n + 2) are being
processed concurrently (Engbert et al., 2005).
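In analytic terms, this signal-versus-noise logic often reduces to a within-participant comparison of reading times across conditions. The sketch below, with invented numbers and a simple paired t-test standing in for the mixed-effects models more commonly reported, illustrates the basic comparison:

# Minimal sketch: comparing reading times on a critical region across two
# within-participant conditions. Data values are invented; the paired t-test
# stands in for the mixed-effects analyses typical of published studies.
import pandas as pd
from scipy import stats

data = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "condition":   ["grammatical", "ungrammatical"] * 3,
    "total_time":  [412, 538, 389, 467, 455, 601],  # ms on the critical region
})

means = data.pivot_table(index="participant", columns="condition",
                         values="total_time", aggfunc="mean")
t, p = stats.ttest_rel(means["ungrammatical"], means["grammatical"])
print(f"t = {t:.2f}, p = {p:.3f}")  # longer ungrammatical times suggest sensitivity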

2.7 Conclusion
The aim of this chapter was to provide the reader with a set of basic facts about
eye movements that are foundational to conducting research in language acquisi-
tion and processing. Many of these facts follow from the uneven layout of the
retina, with a small area of high-acuity vision in the center—called the fovea—that
is flanked by large areas of low visual resolution called the parafovea and periphery.
“The inhomogeneity of the retina and visual projections … [is] probably the
most fundamental feature of the architecture of the visual system” (Findlay &
Gilchrist, 2003, p. 2) and may be the only way a human-sized brain can combine
sharp vision with information intake from a large visual field (Findlay & Gilchrist,
2003). The inhomogeneity of the retina underlies some important phenomena,
including the perceptual span (also known as the functional field of view and the
perceptual lobe) and parafoveal processing. These phenomena were established early
on during the third wave of eye-tracking research (see Section 1.1.3), with the
development of the gaze-contingent moving-window paradigm (McConkie & Rayner,
1975, 1976a, 1976b). In gaze-contingent moving-window studies, the visual input
changes depending on the location of an observer’s eye gaze (see Figure 2.11).
This enables researchers to withhold or manipulate parafoveal information and
see how it affects task performance. In reading research, studies with gaze-con-
tingent display changes have yielded estimates of readers’ perceptual span sizes
for different scripts and languages with different reading directions. They have
also shown the perceptual span is shaped by attentional processes (Henderson
& Ferreira, 1990). It is longer in the direction of reading (e.g., 3–4 letters to the
left but up to 14 or 15 letters to the right in English) and narrower for more dif-
ficult text (Henderson & Ferreira, 1990). Span sizes also vary within and between
individuals as a function of current language exposure (Whitford & Titone, 2015)
and language proficiency (Häikiö et al., 2009; Rayner, 1986). Research on the
perceptual span in adult L2 speakers is still in its infancy (Leung, Sugiura, Abe, &
Yoshikawa, 2014), despite the measure’s obvious potential to enrich L2 reading
research by providing an independent measure of L2 reading proficiency.
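As a rough illustration of the manipulation itself, the sketch below masks every letter outside an asymmetric window around a (simulated) fixation position. A real gaze-contingent system redraws this display on every sample from the eye tracker, and implementations differ in whether spaces are preserved; the window sizes mirror the 3–4 letters to the left and 14–15 letters to the right reported for English.

# Sketch of a moving-window mask: letters outside the window around the
# fixated character are replaced with 'x'. Window sizes are illustrative.
def moving_window(text, fixation_index, left=4, right=14):
    out = []
    for i, ch in enumerate(text):
        inside = fixation_index - left <= i <= fixation_index + right
        out.append(ch if inside or ch == " " else "x")  # spaces kept in this version
    return "".join(out)

sentence = "Sometimes victims do not tell the complete truth in court."
print(moving_window(sentence, fixation_index=19))  # fixating the 'o' in 'do'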
Another way in which the body of work from psychology can be profitably
expanded into SLA and bilingualism is by establishing field-specific benchmarks
of eye-movement behavior for L2 speakers and bilinguals (see Cop, Drieghe, &
Duyck, 2015; Cop, Keuleers, Drieghe, & Duyck, 2015, for two exemplary studies
with bilinguals). Forty years of eye-tracking research has resulted in an excellent
understanding of representative eye behavior in terms of fixations and saccades
(see Table 2.2 and Rayner, 1998, 2009). It is also well established that fixation and
saccade characteristics are a reflection partly of motor activity and partly of the
specific demands of the task and of participants’ characteristics. For instance, the
task of silent reading—a highly constrained task with a clear semantic compo-
nent—typically involves shorter fixation durations than other information pro-
cessing tasks (Reichle et al., 2012). Individual differences also play a role (e.g.,
Henderson & Luke, 2014; Hyönä, Lorch, & Kaakinen, 2002; Hyönä & Nurminen,
2006; Luke et al., 2015; Taylor & Perfetti, 2016; Veldre & Andrews, 2014, 2015)
but our knowledge base in this area is overwhelmingly L1-focused. In practice,
individual differences researchers who work with linguistic tasks in the partici-
pants’ L1 tend not to report their participants’ knowledge of other languages, thus
treating them as de facto monolinguals even though language status is a poten-
tial confounding factor (Cop, Drieghe, & Duyck, 2015; Cop, Keuleers, et al.,
2015; Whitford & Titone, 2015). Furthermore, there is a scarcity of studies dealing
with how reading behavior changes across L2 proficiency levels (Dolgunsöz &
Sarıçoban, 2016), so that our best guesses are currently based on findings from the
child L1 reading literature. Going forward, it will be important to assess the valid-
ity of this assumption by determining the similarities and differences between
child L1 and adult L2 reading more formally. Doing so will be essential to estab-
lish the field of L2 eye-tracking research on firm footing by adding to the existing
knowledge base a set of L2-specific parameters.
Although applied eye-tracking researchers record eye movements in order
to study their participants’ cognitive processes, they cannot ignore the potential
influence of lower-level, visual and oculomotor, factors in their data. Low-level
information about word boundaries (signaled by word spacing) is processed para-
foveally (see Section 2.3) and guides the where of eye movements; oftentimes the
eyes gravitate toward the PVL or OVP in a word (see Section 2.4). Cognition
influences the time the eyes dwell on a given word or picture, as shown primar-
ily by word frequency and predictability effects (see Section 2.5). Table 2.3 listed
a number of additional cognitive factors—word familiarity, AoA, part of speech,
concreteness, and position in the sentence—and an important visual factor—word
length—that likewise influence fixation durations. These stimulus properties will
be important to control for, either experimentally or statistically, when designing
an eye-tracking study (see Chapters 3 and 4). The details of how cognition, motor
behavior, and visual constraints interact, however, are still under debate. There is a
mounting consensus that cognition (and more specifically for our fields, linguistic
processing) influences eye movements, which is good news for the readers of
this book. At the same time, leading scientists do not agree on whether atten-
tion is allocated serially or in parallel in reading, which suggests that the question
has not been settled for other visual tasks either (see Findlay & Gilchrist, 2003).
Considering the number of studies that use eye movements as a measure of (overt)
attention, it will be important to keep up with the empirical literature on serial
vs. parallel processing and follow concomitant developments in computational
modeling. Regardless of the outcomes of this debate, eye-movement recordings
have found their way into language researchers’ methodological toolkits. The best
applied eye-tracking researchers can do, therefore, is to design sound studies of
which the results are minimally affected by differing conceptualizations of atten-
tion. This means adopting a careful experimental design with adequate control
conditions. Chapters 3 and 4 present an overview of the factors that go into
designing a sound eye-tracking study.

Notes
1 Different authors provide somewhat different estimates of saccade characteristics.
Other proposed ranges of saccade velocity are 30–500°/s (Holmqvist et al., 2011),
130–750°/s (Duchowski, 2007) and, for peak velocity, 400–600°/s (Young & Sheena,
1975). Holmqvist et al. (2011) defined saccades as movements spanning 4–20° distance.
These authors recognized that smaller eye movements also occur, but categorized them
differently, namely as microsaccades and glissades.
2 The present discussion may remind the reader of the self-paced reading moving-win-
dow procedure, which was introduced in Section 1.1.2. Although moving-window
techniques in eye tracking and self-paced reading are similar, they differ in what causes
the window to move. As mentioned in Chapter 1, self-paced reading relies on read-
ers’ responses (button presses) for the display to change. In eye tracking, however, the
window moves along with the reader’s point of gaze on the screen.
3 As a reminder, the letter identity span is a subregion within the perceptual span where
the reader can identify specific letters.
4 ¹McDonald, Carpenter, and Shillcock (2005) for Serif; ²Feng (2006) for SHARE; ³Yang (2006) for Competition/Interaction; ⁴Engbert et al. (2005) for SWIFT; ⁵Reilly and Radach (2006) for Glenmore; ⁶Legge, Klitz, and Tjan (1997) for Mr. Chips; ⁷Reichle, Pollatsek, and Rayner (2006) for E-Z Reader; ⁸Salvucci (2001) for EMMA.
5 One exception is the analysis of reading time data for a spill-over region, defined as the
word or words that follow a critical region. Researchers commonly analyze spill-over
regions to test for spill-over effects; that is, delayed effects from the critical region (see
Figure 1.5d for an example).
3
WHAT TOPICS CAN BE STUDIED
USING TEXT-BASED EYE TRACKING?
A SYNTHETIC REVIEW

Researchers are, by nature, curious. They tend to want to answer questions. Eye-
tracking researchers are no different, in that the tool they use—eye-movement
recordings—is a means that helps them answer research questions. This chapter is
about what types of questions have been successfully addressed using eye-tracking
methodology in SLA and bilingualism. Although questions can differ in their
level of granularity (see Bachman, 2005), I have attempted to cast them in gen-
eral terms, the goal being to render the breadth and diversity of contemporary
eye-tracking research. This will serve as a springboard for readers to formulate
their own research questions and kickstart their own research projects with eye
tracking.
In Section 3.1, I offer general advice on how to find a research topic. Section
3.2 represents the bulk of this chapter. It is a synthetic review of L2 eye-tracking
research with text that is organized thematically, by research strand. (A similar
synthetic review for the visual world paradigm will be presented in Section 4.2.)
I survey five research strands within the body of eye-tracking research that are
primarily text-based. A different subsection is devoted to each strand (Sections
3.2.1–3.2.5) and each subsection concludes with a list of key questions. Thus, at
the end of this chapter, readers will have a much clearer view of how their work
could fit into the current landscape of L2 eye-tracking research.

3.1 Finding a Research Topic


Although many readers will have a general idea of the type of eye-tracking
research they would like to conduct (often a continuation of their current research
agenda), there are cases where the methodology precedes the idea. This can hap-
pen when you have access to an eye tracker, but you have not really thought yet
about how you could use it. Perhaps one of your professors or colleagues has an
eye tracker or someone in charge of a financial account gave you the welcome
news they are getting one for your program (stranger things have happened!).
Perhaps you are reading this chapter because you are wondering if eye tracking
would be worth your time and money, a good addition to your research arsenal
(for more on practical considerations regarding eye trackers, see Section 9.2.1).
Regardless of how you became interested in eye-tracking methodology, it is a
good idea to remind yourself that eye-tracking research shares with most other
types of research activity a goal to advance our knowledge of the world. This
means eye-tracking researchers build on their own and other researchers’ work
to produce knowledge that is both reproducible and generalizable. To contribute
to the field in this way, it is important to know the existing literature, because
this will help you identify interesting questions and potential topics for research.
I want to be clear that previous studies do not have to be eye-tracking studies
to be of interest to eye-tracking researchers. Ideas can come from any type of
behavioral research (e.g., experimental studies that yield accuracy data, reaction
times, or some other type of quantitative variable) and potentially observational
studies. Eye tracking can also offer a valuable perspective on many of the questions
addressed in ERP research (see Section 1.1.4). Therefore, a productive approach
for first-time eye-tracking researchers is to read existing literature “through eye-
tracking goggles”. By this I mean it can be a productive strategy to ask yourself
what, if anything, a study would gain from a replication with eye tracking. Not
all studies will gain from having eye-tracking data, but some will. And when you
find a study that does, you have a topic for your first eye-tracking project. In
Section 9.3.1, I present ten ideas for research to help you on your way. Ideas #8
and #9 are two examples from assessment (Cubilo & Winke, 2013) and vocabu-
lary learning (Lee & Kalyuga, 2011), respectively. These original studies did not
include eye-movement recordings, but they would benefit from a replication with
eye tracking. Examples from other areas of SLA research are available as well, so
the message is to read widely and keep an open mindset (put your goggles on).
Reading existing literature in this manner will train you to think deeply about
how eye tracking could enrich your research program and what questions eye
tracking can address for you.
In the remainder of this chapter I will take an approach that is complementary
to what I have described so far. I will focus on existing eye-tracking studies, not
because these are the only sources of inspiration, but because they represent well-
established areas of research where eye-movement recordings have already proven
their value. By identifying recurring research themes, I will offer one answer to
the question of what topics researchers can study using eye-movement record-
ings. The goal is to provide a list of research strands that have generated a good
amount of eye-tracking research, with the understanding that it is certainly pos-
sible to venture outside these strands based on your own reading of the literature.
Furthermore, the research methodology in these published papers is a good gauge
of current practices in the field. Assuming eye tracking is like other areas of quan-
titative research, these practices will likely still evolve (e.g., Plonsky, 2013, 2014)
as the field of eye tracking in SLA and bilingualism continues to grow. Therefore,
I will draw on published literature in leading journals, but also offer my own
insights as I lay out methodological guidelines for doing eye-tracking research in
subsequent chapters.

3.2 Research Strands within Text-Based Eye Tracking


To gauge the state of the art in eye-tracking research with L2 speakers and bilin-
guals, I reviewed all eye-tracking studies that had been published in print or
online, as an advance online publication, in well-known SLA journals by June
2017.1 My search targeted 15 journals that regularly publish SLA research
(VanPatten & Williams, 2002) and one journal on L2 assessment, Language Testing.
The same list of SLA journals also served as a starting point for Plonsky’s (2013,
2014) methodological review papers. The current review therefore encom-
passed all eye-tracking research from the following 16 journals: Applied Language
Learning, Applied Linguistics, Applied Psycholinguistics, Bilingualism: Language and
Cognition, Canadian Modern Language Review, Foreign Language Annals, Journal of
Second Language Writing, Language Awareness, Language Learning, Language Teaching
Research, Language Testing, The Modern Language Journal, Second Language Research,
Studies in Second Language Acquisition, System, and TESOL Quarterly.
I drew up a list of keywords for the search to circumscribe the domain of
eye-tracking research in SLA or bilingualism. The keywords for text-based stud-
ies were a combination of (1) eye tracking, eye-tracking, eye movement, eye-movement,
eye gaze, or eye fixation, paired with (2) second language, foreign language, L2, adult,
bilingual, second, language, or learners. These keywords formed the basis of an online
search in four academic databases, namely Linguistics and Language Behavior
Abstracts (LLBA), PsycInfo, Education Resources Information Center (ERIC),
and Google Scholar. The initial search yielded 84 articles published in the previ-
ously mentioned journals. After confirming the eligibility of individual articles for
inclusion in this review, I assigned each article to the broad categories of either
text-based eye tracking or visual world eye tracking.
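Rendered as a single Boolean string, the keyword pairings above take roughly the following form. The snippet below is only one possible rendering, since the exact query syntax and field codes differ across LLBA, PsycInfo, ERIC, and Google Scholar:

# One way to assemble the keyword pairings into a Boolean search string.
# Database-specific syntax (field codes, truncation) is omitted here.
eye_terms = ['"eye tracking"', '"eye-tracking"', '"eye movement"',
             '"eye-movement"', '"eye gaze"', '"eye fixation"']
learner_terms = ['"second language"', '"foreign language"', 'L2', 'adult',
                 'bilingual', 'second', 'language', 'learners']
query = f'({" OR ".join(eye_terms)}) AND ({" OR ".join(learner_terms)})'
print(query)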
To qualify as a text-based study, a study had to have a written-language com-
ponent—words, sentences, or larger units of text on the screen. A number of stud-
ies contained both written and spoken language, thus straddling the traditional
divide between text-based research and visual world research. (A review of the
visual world paradigm will follow in Chapter 4.) I assigned all subtitle studies to
the text-based category (Bisson, Van Heuven, Conklin, & Tunney, 2014; Montero
Perez, Peters, & DeSmet, 2015; Muñoz, 2017; Winke, Gass, & Sydorenko, 2013)
because eye-movement analyses in subtitle studies have focused primarily on the
reading of the (written) subtitles or captions. I extended the list with Suvorov
(2015), a study on L2 listening assessment which, similarly to subtitles research,
involved watching a video with narrated audio (but no subtitles in Suvorov’s case).
Extensions of visual world experiments with printed words, rather than images,
on the screen were still considered visual world, in line with these studies’ self-
labeling (e.g., Tremblay, 2011). Two studies required further scrutiny. Bolger and
Zapata (2011) combined elements of text-based eye tracking and visual world
eye tracking in different parts of their vocabulary instruction research. Conversely,
Kaushanskaya and Marian’s (2007) study did not include any audio but showed all
the other characteristics of a visual world study. I categorized both studies as visual
world research (i) because of the similarities in visual display (a few large elements
on the screen) and (ii) because of the conceptual focus on interference effects (a
recurring theme in visual world studies). In sum, although the basic distinction
between text-based eye tracking and visual world eye tracking subsumed a variety
of studies in each category, it was possible to assign each study to a single category
using a relatively small set of decision criteria.
The literature search revealed there are about twice as many text-based stud-
ies (k = 52) as visual world studies (k = 32) in SLA to date. As can be seen
in Figure 3.1, the journal representation is skewed. Most eye-tracking stud-
ies with L2 speakers or bilinguals have appeared in a handful of journals, and
primarily in Bilingualism: Language and Cognition and Studies in Second Language
Acquisition. Bilingualism: Language and Cognition accounts for almost half of all
the visual world studies published in the field to date. To some extent, this may
reflect a shared interest in the bilingual lexicon by visual world researchers and
the journal’s readership; that is, many of the visual world papers that appear in
Bilingualism: Language and Cognition deal with questions of joint lexical activa-
tion of a bilingual’s two or more languages. Eye-movement research has also

established itself in respected journals such as Applied Psycholinguistics, Second Language Research, Language Learning, and The Modern Language Journal, which have published between four and 10 eye-tracking papers each. Finally, a growing number of journals have published at least one eye-tracking study. We may see in this diversification of publishing outlets a diversification of the topics investigated with eye tracking.

FIGURE 3.1 Distribution of eye-tracking studies across 16 SLA journals. Note: SSLA = Studies in Second Language Acquisition; VWP = visual world paradigm.
We now turn to an overview of the major research strands in text-based eye
tracking. Together with my research assistant, who coded all the studies, I identified
five broad research strands in contemporary text-based eye tracking. These strands
are (1) grammar (see Section 3.2.1), (2) vocabulary and the bilingual lexicon (see
Section 3.2.2), (3) instructed SLA (see Section 3.2.3), (4) subtitle processing (see
Section 3.2.4), and (5) language assessment (see Section 3.2.5). Some studies fell
under more than one category; however, for the present purposes each study was
assigned to one category only, which was the primary research area. In what fol-
lows, I will provide an overview of the types of questions investigated in each
strand, organized by amount of research output, and starting with the largest strand.

3.2.1 Grammar
The online search revealed a total of 19 text-based eye-tracking studies that
addressed topics related to the representation, processing, and acquisition of gram-
mar (for a summary table, see Table S3.1 in online supplementary materials). In
many cases, the authors of these studies took a formal-linguistic perspective on
L2 acquisition and processing, although psychologically inclined research studies
also exist. Together with vocabulary research, which is the second largest strand,
grammar studies account for most of the sentence-processing literature, because
trials in grammar studies tend to consist of single sentences or collections of just a
few sentences, rather than longer texts.2 There are at least four different approaches
to studying grammar. These approaches differ in what manipulation is embedded
in the critical sentences (i.e., the sentences that are of interest, as opposed to filler
sentences). (i) Studies in an anomaly detection or violation paradigm rely
on sentences that contain a grammatical, semantic, pragmatic, or discourse-level
anomaly (e.g., Clahsen et al., 2013; Ellis et al., 2014; Godfroid et al., 2015; Hopp &
León Arriaga, 2016; Keating, 2009; Lim & Christianson, 2015; Sagarra & Ellis, 2013;
Zufferey et al., 2015). (ii) Research in an ambiguity resolution paradigm uses
sentences that contain a syntactic ambiguity (e.g., Chamorro et al., 2016; Dussias
& Sagarra, 2007; Roberts et al., 2008). (iii) Researchers working in a depend-
ency paradigm create sentences that have long-distance syntactic dependen-
cies, for instance wh-questions or relative clauses in English (e.g., Boxell & Felser,
2017; Felser & Cunnings, 2012; Felser et al., 2009, 2012). Selective reviews of the
work in these paradigms can be found in Dussias (2010), Jegerski (2014), Keating
and Jegerski (2015), and Roberts and Siyanova-Chanturia (2013). (iv) A fourth
approach to studying grammar does not involve any of the above manipulations;
it is the non-violation paradigm (e.g., Godfroid & Uggen, 2013; Spinner et al.,
2013; Vainio et al., 2016). Incidentally, the non-violation paradigm is the default in
visual world research, where studies do not normally contain a grammatical anom-
aly, even when these studies aim to measure grammar knowledge (see Chapter 4).
So, what topics do researchers investigate using all these different paradigms? Not
surprisingly, there is quite a bit of variation. A major question is whether L2 speak-
ers have acquired a particular aspect of the grammar, whereby the focus is often
on morphosyntax (e.g., tense, person, number, gender, or case markings). In studies
that aim to measure learners’ internal grammar, acquisition (understood as whether
a given function is a part of the grammar or not) is operationalized in terms of
grammatical sensitivity (see Godfroid & Winke, 2015, for discussion). The assump-
tion is that when participants have knowledge of a grammatical function, they will
slow down (or react in some other manner) to forms that violate their internal
grammar. Consequently, questions of acquisition are typically addressed using an
anomaly paradigm, whereby the researcher compares processing of grammatical and
ungrammatical sentences.3 In an eye-tracking study, this means the researcher cre-
ates a grammatical and an ungrammatical version of the same item and compares
reading times and other types of eye-movement behavior for the two sentences. (In
most cases, participants will read only one of the two versions of the item, because
the design is counterbalanced; see Sections 5.2 and 6.3.1.1, for more information.)
Here are two examples from Hopp and León Arriaga (2016), who studied the pro-
cessing of case in L1 and L2 Spanish using an anomaly paradigm. In this and the
following examples, words in boldface represent the critical area(s) in the sentence.
Unless otherwise noted, participants read the sentences in regular print.

(1) (a) Federico prometió al vecino una revista sobre barcos.
    (b) *Federico prometió el vecino una revista sobre barcos.
“Federico promised the neighbor a magazine about ships.”

Example (1) is a sentence pair, or doublet, with a ditransitive verb, prometer, “to
promise”. Indirect objects in Spanish are marked with a (and a + el becomes al),
so the unmarked object el vecino (“the neighbor”) in 1b is ungrammatical.
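Counterbalancing doublets like (1a)/(1b) across presentation lists is usually scripted. The sketch below builds a two-list Latin square in which every participant sees each item once and each version appears equally often across lists; the second item pair is invented purely to fill out the example.

# Illustrative Latin-square assignment for grammatical/ungrammatical doublets.
# Item 2 is an invented filler pair; real lists also interleave filler trials.
items = {
    1: ("Federico prometió al vecino una revista sobre barcos.",    # grammatical
        "*Federico prometió el vecino una revista sobre barcos."),  # ungrammatical
    2: ("La profesora dio al alumno un libro.",
        "*La profesora dio el alumno un libro."),
}

def build_lists(items, n_lists=2):
    lists = [[] for _ in range(n_lists)]
    for item_id, versions in sorted(items.items()):
        for list_idx in range(n_lists):
            lists[list_idx].append(versions[(item_id + list_idx) % len(versions)])
    return lists

for i, stimuli in enumerate(build_lists(items), start=1):
    print(f"List {i}:", stimuli)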
A second line of inquiry concerns the nature of the mechanisms that interface
between people’s grammar knowledge and their overt linguistic behavior. The
technical term for this is parsing (also see Textbox 1.1). Juffs and Rodríguez
(2015) likened the relationship between the grammar and the parser to two states
of a combustion engine.

The grammar is the engine at rest, not driving the vehicle, but with the
potential to do so. Parsing is the engine in motion, subject to stresses and
possible breakdowns allowable by the system, and driving production or
comprehension in real time.
(Juffs & Rodríguez, 2015, p. 15)
Researchers are interested to know, then, how L1 and L2 speakers of different
languages parse sentences, whether parsing routines transfer between a bilingual’s
two (or more) languages, and, more generally, whether L2 speakers rely on syn-
tactic information during processing as much as L1 speakers do.
Continuing Juffs and Rodríguez’s combustion engine metaphor, it is impor-
tant to note that when researchers record eye movements, they observe the
engine in motion. Never the engine at rest. This is because eye-movement data
are processing data (Godfroid & Winke, 2015) or, in generative terms, perfor-
mance data. Juffs and Rodríguez noted that “the operation of the grammar
during processing may be affected by the quality of the input, memory limita-
tions, and interference from outside influences not related to the architecture
of the grammar itself” (p. 15). In the case of eye-movement recordings, the data
will show influences of participants’ lexical knowledge, reading skills, general
language proficiency, and working memory capacity, among other factors. The
upshot is that if the researcher finds no evidence of sensitivity, or no evidence
of structure-based processing, it may not be warranted to conclude that partici-
pants are lacking the hypothesized grammatical knowledge or parsing routines.
Absence of evidence is not evidence of absence. Rather, the difficulty of the task
may have exceeded L2 speakers’ general language abilities, such that the par-
ticipants were no longer able to display the knowledge or processing heuristics
that they possess. The findings from Hopp’s (2014) individual differences study
speak to this point (see Fender, 2003, and Hopp, 2013, for similar findings in
self-paced reading).
Hopp (2014) studied the role of working memory, L2 proficiency, and lexical
decoding in sentence processing.4 Native and L2 English participants read relative
clauses in English that contained a local ambiguity, as in (3) and (4). There were
two nouns that could potentially be the doer of the action (in [3]) or the under-
goer of the action (in [4]). Which reading is correct depends on the number of the
auxiliary verb (was/were in [3]) or the gender of the reflexive pronoun (himself/
herself in [4]). This is an ambiguity paradigm. It is used with other real-time
methodologies as well, including self-paced reading and listening, and ERPs (see
Section 1.1.4).

(3) Local disambiguation
    (a) The director congratulated the instructor of the schoolboys who was
writing reports. (high attachment)
(b) The director congratulated the instructor of the schoolboys who were
writing reports. (low attachment)
(4) Nonlocal disambiguation
(a) The student had liked the secretary of the professor who had almost killed
herself in the office. (high attachment)
(b) The student had liked the secretary of the professor who had almost
killed himself in the office. (low attachment)
In an ambiguity paradigm with eye tracking, researchers compare reading times
on paired sentences that have slightly different structural properties. When sen-
tences have a clear resolution, as in (3) and (4), the researcher’s goal is to deter-
mine whether readers favor one of the two interpretations at a structural level.
Whichever sentence a reader reads faster is the one that reflects his or her internal
parsing preferences. Using this approach, Hopp sought to determine whether
L1 and L2 English readers prefer relative clauses to refer to the first noun (high
attachment) or the second noun (low attachment), or whether they show no pref-
erence. He found that L2 English speakers’ lexical decoding skills (automaticity of
lexical knowledge) interacted with how the participants parsed the sentences. All
participants favored low attachment (i.e., [3b] over [3a]) in the easier sentences,
consistent with the L1 English controls’ reading data. However, in the more dif-
ficult sentences, only the fast lexical decoders in the L2 group still displayed the
same preference for low attachment (i.e., [4b] over [4a]). Slow and mid-level
L2 decoders no longer showed a preference between the two sentence readings,
which suggested their syntactic analysis had broken down. Lexical processing,
then, is an often neglected but essential factor in sentence processing research. As
Hopp noted, “L2 readers with less automatic lexical access … do not reach the
syntactic structure building stage in online comprehension” (p. 272). Therefore,
when studying L2 syntactic processing, it is a good idea to measure participants’
vocabulary size and word retrieval fluency, so the role of these variables can be
factored into the analyses.
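One way to implement this recommendation is to enter lexical measures as covariates in the reading-time analysis. The sketch below, with invented data and a plain regression standing in for the crossed random-effects models that such studies typically require, shows the general shape of the analysis:

# Sketch: reading time as a function of attachment condition and a lexical
# decoding covariate. Data are invented; published analyses typically add
# crossed random effects for participants and items.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "rt":        [620, 710, 650, 840, 590, 905],   # ms on the critical region
    "condition": ["low", "high", "low", "high", "low", "high"],
    "decoding":  [1.2, 1.2, 0.8, 0.8, 1.5, 1.5],   # z-scored decoding speed
})
model = smf.ols("rt ~ condition * decoding", data=data).fit()
print(model.params)  # the interaction term captures decoding-dependent parsing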
Lexical factors may also play a role in the finding that L2 speakers sometimes
do not display the same syntactic reflexes as L1 speakers do (e.g., Boxell & Felser,
2017; Felser & Cunnings, 2012; Felser et al., 2009, 2012). The claim that L1 and
L2 syntactic processing differ is at the heart of the shallow structure hypothesis.
This hypothesis states that “the syntactic representations adult L2 learners compute
for comprehension are shallower and less detailed than those of native speakers”
(Clahsen & Felser, 2006b, p. 32).The shallow structure hypothesis is the impetus for
most eye-tracking and self-paced reading research within a dependency para-
digm, the third paradigm for studying grammar. Similarly to the work on ambigu-
ity resolution, dependency studies use sentences of which the structural properties
have been manipulated.The sentences are often complex. Here is an example from
Boxell and Felser (2017), who complemented eye tracking during reading with an
acceptability judgment task. The following example is from the acceptability judg-
ment task. Gaps and brackets are shown for the reader’s information only.

(5) (a) Two gaps, infinitival
        It was not clear which animals [the plan to look after ___ ] would
protect ___.
(b) Two gaps, finite
It was not clear which animals [the plan that looked after ___ ] would
protect ___.
In Example (5), the noun phrase which animals moved out of its canonical direct
object position to form an indirect question. According to formal-syntactic the-
ory, the noun phrase left behind two traces or gaps in its movement up the syn-
tactic tree. The first trace is at its base-generated position (the plan would protect the
animals) and the second trace is at an intermediary gap site inside the complex
subject (the plan to look after/that looked after the animals). For the sentence in (5) to
be grammatical, the complex subject must contain an infinitive rather than a finite
verb; that is (5a) is grammatical but (5b) is not (Kurtzman, Crawford, & Nychis-
Florence, 1991; Phillips, 2006, as cited in Boxell & Felser, 2017).
The wh-phrase which animals is called a filler and the empty category at its base
extraction site is a gap. Of interest is whether L2 speakers will show a syntactic
reflex in their processing behavior known as filler-gap dependency processing.
Essentially, this means whether they will slow down at gap sites (to integrate the
filler with its original location) and at the same time avoid positing gaps in places
where they are illicit. Consequently, researchers who work in a dependency para-
digm will compare reading times for sentences where a slowdown is expected
(e.g., [5a]) with reading times for sentences where a slowdown is not expected
(e.g., [5b]).
Dussias (2010) highlighted the importance of data collection method in study-
ing the processing of syntactic dependencies. Given that the sentences in depend-
ency research tend to be complex, having participants read them word-by-word
or segment-by-segment (as is the case in self-paced reading) may further increase
cognitive load. This may be especially hard for L2 speakers, for whom processing
is generally more effortful. Indeed, it is noteworthy that self-paced reading studies
have generally supported the shallow structure hypothesis (e.g., Felser, Roberts,
Marinis, & Gross, 2003; Marinis, Roberts, Felser, & Clahsen, 2005; Papadopoulou
& Clahsen, 2003), whereas eye-tracking studies have found that L2 readers do
show sensitivity to gaps, only slightly later than L1 controls (Boxell & Felser, 2017;
Felser et al., 2012). This raises the issue of lexical processing speed again (Hopp,
2014). Could subtle processing differences between L1 and L2 speakers be due
to lexical processing differences rather than syntactic ones? To conclude that L2
speakers truly have difficulties with syntactic processing, it seems important to
account for lexical influences first.
Finally, a growing number of studies on grammar acquisition and processing
have adopted general-cognitive frameworks of learning, such as associative learn-
ing theory (Ellis, 2006), the noticing hypothesis (Schmidt, 1990), and the tuning
hypothesis (Cuetos & Mitchell, 1988). Studies by Ellis et al. (2014), Sagarra and
Ellis (2013), and Godfroid and Uggen (2013) explicitly draw on the idea of the
eye gaze as a marker of overt attention to study reliance on lexical or morpho-
logical cues (Ellis et al., 2014; Sagarra & Ellis, 2013) or noticing of morphology
(Godfroid & Uggen, 2013) during processing. Dussias and Sagarra (2007) offered
a frequency-based account of bilinguals’ parsing preferences, suggesting even
the L1 parser is not impervious to environmental influences. Of these studies,
Godfroid and Uggen (2013) did not involve any ungrammatical, incongruent,
or ambiguous sentences. It will be discussed here as an example from the fourth
paradigm, the non-violation paradigm (see also Vainio et al., 2016).
Godfroid and Uggen investigated whether beginning learners of German dis-
tinguished between, or “noticed” (Schmidt, 1990), German verb stem variants
(i.e., allomorphs). In critical trials both stem variants of a verb appeared together
in two stacked sentences (see Figure 3.2). Godfroid and Uggen compared looks
to the marked verb stems (verb stems that had undergone a vowel change) with
looks to matched, unmarked verb stems that appeared in control trials. The
researchers wanted to know whether the participants, who had only been taught
how to conjugate the unmarked verb forms, would learn the verb allomorphs
from meaning-focused exposure. In this regard, they were especially interested in whether participants’ allocation of attention during reading predicted their learning on a verb production post-test, as the noticing hypothesis would predict.

FIGURE 3.2 A critical trial in a grammar learning experiment. The second screen contained two stem variants of the verb sprechen, “to speak”, namely sprech- and sprich-. Sprich- was new for participants and served as the target for learning in the study. (Source: Godfroid & Uggen, 2013).
The researchers found increased attention to verb pairs with alternating stems,
suggestive of noticing. They also confirmed there was a positive relationship
between real-time processing of the verbs and gains on a production post-test.
By focusing on micro processes in L2 acquisition, Godfroid and Uggen, along
with colleagues working in vocabulary acquisition, illuminated how learners allo-
cate their attention when they encounter new forms in the input. Researchers
in instructed SLA, whose work will be reviewed in Section 3.2.3, additionally
consider how attentional allocation might change under the influence of differ-
ent types of instruction (Alsadoon & Heift, 2015; Cintrón-Valentín & Ellis, 2015;
Indrarathne & Kormos, 2017, 2018; Winke, 2013). Thus, the non-violation para-
digm can easily be extended to study task effects. Table 3.1 summarizes some of
the main questions that underlie eye-tracking research on grammar.

TABLE 3.1 Questions in eye-tracking research on grammar acquisition and processing

1. Have L2 speakers acquired a particular aspect of the grammar and can they put their knowledge to use in real time? (violation paradigm)
2. How do L1 and L2 speakers of different languages and language pairings parse sentences? Do L2 speakers rely on the same structure-based principles as L1 speakers do? (ambiguity paradigm, dependency paradigm)
3. Do parsing routines transfer between a bilingual’s two (or more) languages? (ambiguity paradigm, dependency paradigm)
4. What is the role of individual differences in syntactic processing? (any paradigm)
5. How do L2 learners engage with unfamiliar forms they encounter in the input? Is their online processing behavior related to their learning of these forms? (non-violation paradigm)

3.2.2 Vocabulary and the Bilingual Lexicon


The online search revealed a total of 17 text-based eye-tracking studies that
addressed topics related to the representation, processing, and acquisition of
vocabulary in a bilingual’s two (or more) languages (for a summary table, see
Table S3.2 in online supplementary materials). Following contemporary views
of the lexicon (e.g., Wray, 2002, 2008), vocabulary is understood to encompass
both words (e.g., spill, bean) and larger-than-word units, including idioms (e.g.,
spill the beans), figurative expressions (e.g., an early bird), partly fixed frames (e.g.,
as far as ___ is concerned), and collocations (e.g., a fatal flaw). In line with this view,
the vocabulary strand features a balanced mix of single-word processing studies
(Elgort et al., 2018; Godfroid et al., 2013; Godfroid & Spino, 2015; Mohamed,
2018; Pellicer-Sánchez, 2016) and multiword research (Carrol et al., 2016; Carrol
& Conklin, 2017; Siyanova-Chanturia et al., 2011; Yi et al., 2017).5 It further
includes research on how lexemes (in theory of any size, though in practice single
words) are represented and accessed in the bilingual lexicon (Balling, 2013; Cop,
Dirix, Van Assche, Drieghe, & Duyck, 2017; Hoversten & Traxler, 2016; Miwa
et al., 2014; Philipp & Huestegge, 2015; Van Assche et al., 2013). What connects
these three approaches is a shared interest in bilinguals’ and L2 speakers’ lexi-
cons. Beyond this thematic overlap, researchers in these different areas have pur-
sued somewhat different research questions so that it makes sense to discuss the
areas one by one. (i) Studies that look into single-word processing have used
unfamiliar or low-frequency words, sometimes even pseudowords, to ensure that
participants have little or no prior knowledge of the target words.The main ques-
tion has been whether L2 speakers can learn these words from reading—that is,
how processing (reading) and vocabulary acquisition (word learning) relate. (ii) In
research on multiword sequences, on the other hand, the idioms and colloca-
tions are typically known or familiar, although studies may also include unfamiliar,
non-idiomatic expressions in the control condition. The focus here is on how
multiword units are represented in L2 speakers’ lexicon. In other words, research-
ers with an interest in multiword units study learners’ mental representation of the
multiword sequences and use the learners’ processing data to do so. (iii) A focus on
representation also characterizes empirical research on the bilingual lexicon. In
many cases, researchers who specialize in the bilingual lexicon study the process-
ing of words that enjoy a special status in a bilingual’s two languages, for instance
cognates (animal in French and animal in English), homographs (pie in English
and pie, meaning “foot” in Spanish), and homophones (belebt in German, mean-
ing “lively”, and beleefd in Dutch, meaning “polite”). The overarching question is
whether bilinguals, whose two languages may share part of these words’ lexical
representation (as in the animal - animal example above), process the words differ-
ently than monolinguals, who by definition will know the word in one language
only, or differently than control words that do not have any cross-lingual overlap
(e.g., pantalon in French and trousers in English).
Eye-movement studies on (single-word) vocabulary acquisition are situated
within the theoretical framework of incidental vocabulary acquisition (some-
times also called contextual word learning; Elgort et al., 2018). It has long been
known that vocabulary gains may accrue incidentally, as a by-product of another
meaning-focused activity, such as reading, watching a movie, or talking with a friend
(for a review, see Schmitt, 2010). Such incidental exposure is an important source of
vocabulary growth. Eye-tracking researchers exploit the potential of the eye gaze as
a measure of overt attention to understand the finer details of how incidental word
learning might occur in real time. That is, unlike most incidental vocabulary acquisi-
tion research where the focus is on offline test performance, eye-tracking researchers
who work in this area are primarily concerned with their learners’ online process-
ing behavior; that is, how the learners engage with new words in real time. These
real-time processing data are then triangulated with participants’ performance on
offline vocabulary tests (e.g., meaning recognition or meaning recall). For instance,
Godfroid et al. (2013) studied how L2 learners allocated their attention during the
reading of short texts embedded with novel words that served as target words for
learning (see Figure 9.26 for an example). The authors linked target-word process-
ing and target-word learning, which was a novel finding and a novel application of
eye-tracking methodology at the time. More attention to (longer eye fixations on) the
target words during reading resulted in better recognition of the target words on the
vocabulary post-test. This finding has since been replicated several times in reading
research (Godfroid et al., 2018; Mohamed, 2018; Pellicer-Sánchez, 2016) and has
been extended to the learning of words from watching captioned videos (Montero
et al., 2015) and the acquisition of grammar under different types of instruction
(Cintrón-Valentín & Ellis, 2015; Godfroid & Uggen, 2013; Indrarathne & Kormos,
2017). Meanwhile, the field of (single-word) vocabulary acquisition research has
moved progressively toward the use of longer reading materials, including short sto-
ries (Pellicer-Sánchez, 2016), graded readers (Mohamed, 2018), and chapters from
a novel (Godfroid et al., 2018) or a general-academic textbook (Elgort et al., 2018).
When the reading materials are longer, words will naturally occur more often and
so a new question in this line of research is how processing of target words changes
over time as readers encounter the words repeatedly in the text (Elgort et al., 2018;
Godfroid et al., 2018; Mohamed, 2018; Pellicer-Sánchez, 2016).
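Analytically, triangulation of this kind often means regressing post-test performance on cumulative looking time. A minimal sketch with invented values follows; studies of this kind typically fit mixed-effects logistic regressions over many participants and items rather than a single simple model:

# Sketch: does total fixation time on a novel word predict later recognition?
# Values are invented for illustration (1 = word recognized on the post-test).
import numpy as np
from sklearn.linear_model import LogisticRegression

total_fixation_ms = np.array([[300], [550], [780], [1200], [1500], [2100]])
recognized = np.array([0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(total_fixation_ms, recognized)
print("change in log-odds per ms of fixation:", model.coef_[0][0])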
Eye-tracking research on multiword units is grounded in the view that language
is formulaic (e.g., Boers & Lindstromberg, 2009; Nesselhauf, 2005; Robinson &
Ellis, 2008; Wray, 2002, 2008). Corpus researchers have shown that the language
people produce is full of statistical regularities (Pawley & Syder, 1983; Sinclair,
1991) in the sense that words like to keep company with some words (e.g., strong
coffee, heavy drinker, avid reader) more than others (e.g., thick coffee, large drinker,
zealous reader). These conventionalized word sequences make up a large part of
people’s linguistic repertoires and help relieve some of the burden of word selec-
tion. However, it is unclear to what extent L2 users benefit from knowing for-
mulaic language as well because “it is only when a sequence is deeply entrenched
in a language user’s long-term memory that it qualifies as truly formulaic for that
user” (Boers & Lindstromberg, 2012, p. 85). Eye tracking can provide insight into
L2 learners’ depth of knowledge by revealing how the learners process formulaic
language in real time and specifically, whether they show a processing advantage
(faster reading times) for formulaic sequences.
Many studies in this area have focused on the processing of idioms as a pro-
totypical example of formulaic language. This line of work uses “a threshold
approach” (Yi et al., 2017, p. 4), in that an expression is either an idiom (e.g., spill
the beans) or it is not (e.g., spill the chips) and when it is an idiom, the phrase is
believed to be stored and retrieved holistically from the lexicon (e.g., Wray, 2002,
2008). More graded approaches to studying multiword units are found in col-
location studies (e.g., fatal mistake > awful mistake > extreme mistake, from Sonbul,
2015), where the association between words is strong but not absolute. Yi and col-
leagues (2017) termed the latter “a continuous approach” (p. 5) reflecting the view
that “[multiword sequences] exist as a continuum in terms of frequency and other
statistical properties” (ibid.). Interestingly, researchers who have studied a wider
range of multiword sequences (Sonbul, 2015; Yi et al., 2017) have found that L2
learners are sensitive to the statistical properties of the target language, whereas the
results from idiom-processing studies have been more mixed (Carrol & Conklin,
2017; Carrol et al., 2016; Siyanova-Chanturia et al., 2011).
Carrol and Conklin (2017) studied the processing of English and translated
Chinese idioms presented in short English sentence contexts (also see Carrol et
al., 2016, for a similar study with English and Swedish). The participants were
English monolinguals and Chinese intermediate learners of English who read
sentences like (6a)–(6d) in the first experiment. The Chinese idiom 画蛇添足,
“draw a snake and add feet”, means “to ruin something adding unnecessary detail”
(Carrol & Conklin, 2017, p. 300), but evidently this figurative meaning is only
available to people who know Chinese. Carrol and Conklin addressed the ques-
tion of whether Chinese-English bilinguals would activate their semantic or con-
ceptual knowledge of the Chinese idiom even when they were reading in English
in two eye-tracking experiments.

(6) (a) English idiom
        My wife is terrible at keeping secrets. She loves any opportunity she gets
to meet up with her friends and spill the beans about anything they
can think to gossip about.
(b) English control
My wife is terrible at keeping secrets. She loves any opportunity she gets
to meet up with her friends and spill the chips about anything they can
think to gossip about.
(c) Chinese idiom
I’ve been decorating my house and I want to keep the colours simple. I
don’t want to draw a snake and add feet so I’ve chosen a nice plain
colour of paint.
(d) Chinese control
I’ve been decorating my house and I want to keep the colours simple. I
don’t want to draw a snake and add hair so I’ve chosen a nice plain
colour of paint.

The researchers analyzed reading times for the final word in the idioms and
matched non-idiomatic control phrases. The assumption behind this is that the
final word is where most facilitation would occur, because readers are most likely
to have recognized an idiom by the time they encounter the final word. Carrol
and Conklin found that the English and Chinese native speakers showed com-
plementary processing patterns. The English speakers read the English idiom (6a)
faster than the non-idiom (6b) whereas the Chinese showed a facilitation effect
on the Chinese idiom (6c) compared to the non-idiom (6d). Crucially, no dif-
ferences were found in the Chinese reading times for English idioms ([6a] versus
[6b]), which replicates Siyanova-Chanturia et al.’s (2011) earlier findings with
English speakers of mixed L1 backgrounds, but not Carrol et al.’s results (2016)
for L1 Swedes. Simply knowing the meaning of an English idiom, as the partici-
pants in all of these studies do, may therefore not be enough for participants to
enjoy the processing advantages that come from having deeply entrenched phrasal
knowledge (Boers & Lindstromberg, 2012).
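Analyses of this kind hinge on extracting region-bound measures from the fixation record. The sketch below computes first-pass reading time (the summed fixations on a region before the eyes first leave it) for a final-word interest area; the fixation log and its field layout are invented for illustration:

# Sketch: first-pass reading time on an interest area, from a fixation log.
# Tuples are (trial, word_index, duration_ms), in chronological order.
fixations = [
    (1, 5, 210), (1, 6, 250), (1, 7, 190), (1, 6, 180),  # later regression to word 6
]

def first_pass_time(fixations, trial, region):
    """Sum fixations on `region` before the eyes first exit it."""
    total, entered = 0, False
    for t, w, dur in fixations:
        if t != trial:
            continue
        if w == region:
            entered = True
            total += dur
        elif entered:  # the first exit from the region ends first-pass reading
            break
    return total

print(first_pass_time(fixations, trial=1, region=6))  # 250, excluding the rereading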
In their studies, Carrol and colleagues also showed that the meanings of L1
Chinese or L1 Swedish idioms are activated during L2 reading. This aligns well
with contemporary views of how word retrieval in the bilingual lexicon works (i.e.,
in parallel for both languages), which is the third subarea of vocabulary research
to which we turn now. The dominant theoretical position about the bilingual
lexicon is that lexical access is nonselective with regard to language. This means
words from a bilingual’s two (or more) languages are jointly activated during
language processing, regardless of the words’ language membership and regardless
of the language of the task. In plain terms, bilinguals and L2 speakers can never
really “switch off” the language they are not using (also see Section 4.2.1). Much
of our understanding of the bilingual lexicon comes from single-word-processing
studies, for instance primed and unprimed lexical decision tasks, naming studies,
and ERP research (see Kroll & Bialystok, 2013; Kroll, Dussias, Bice, & Perrotti,
2015; Kroll & Ma, 2017; Van Hell & Tanner, 2012, for reviews). Eye tracking has
been used successfully in this context to monitor Japanese-English bilinguals’ eye
movements in an English lexical decision task (Miwa et al., 2014). Eye tracking
further offers the possibility of studying words in sentences (Hoversten & Traxler,
2016; Philipp & Huestegge, 2015;Van Assche et al., 2013) or longer texts (Balling,
2013; Cop et al., 2017), where contextual information might bias attention to the
language in use (i.e., away from parallel, cross-lingual activation). Then how do
researchers measure language co-activation in seemingly monolingual contexts
such as unilingual sentences? Often, they compare the processing of words that
share a formal and/or semantic overlap between a bilingual’s two languages with
language unique words that are matched on properties such as frequency and
word length (see Sections 2.5 and 6.2.3). The former category includes cognates
(words that overlap in meaning, spelling, and pronunciation, e.g., animal in French
and English), homographs (overlap in spelling and potentially pronunciation
but not meaning, e.g., pie in English and Spanish), and homophones (overlap
in pronunciation and potentially spelling but not meaning, e.g., belebt in German
and beleefd in Dutch). Here is an example with homographs from Hoversten and
Traxler (2016), who studied bilingual lexical activation by comparing Spanish-
English bilingual and English monolingual sentence reading.

(7) (a) Congruent
While eating dessert, the diner crushed his pie accidentally with his elbow.
(b) Incongruent
While carrying bricks, the mason crushed his pie accidentally with the load.

Pie is an interlingual homograph. The meaning of the word differs between English (“a kind of dessert”) and Spanish (“foot”), as does the pronunciation.
The English meaning “a kind of dessert” is congruent with the semantic context
of sentence (7a) only; conversely, the Spanish meaning “foot” fits in the con-
text of sentence (7b). Hoversten and Traxler used these sentences (and sentences
with language unique words, not shown here) to compare the predictions of two
contrasting hypotheses with regard to bilingual lexical access—nonselective and
selective access. Interestingly enough, the researchers found that Spanish-English
bilinguals read the sentences similarly to English monolinguals until late in the
process. Both monolinguals and bilinguals reread earlier parts of the sentence
when they encountered an incongruent interlingual homograph in (7b). Only
in the bilinguals’ case did this resolve the lexical ambiguity. (This is not sur-
prising considering the English monolinguals were unfamiliar with the Spanish
meanings of the homographs.) Importantly, there was no evidence of early inter-
ference in the bilingual group, contra the prevailing nonselective access hypoth-
esis.6 Hoversten and Traxler took this to mean that language co-activation can
change, or be modulated, under some circumstances. They argued bilinguals can shift between a nonselective (e.g., Spanish-English) and a selective (e.g., English) language mode depending on contextual factors, as well as individual differences in language proficiency, executive control, and a word's frequency in both languages (these last three factors were investigated in other studies).
lexical access, therefore, may not generalize to all contexts and tasks. A goal for
future research will be to determine how and in what stage of word retrieval these
intervening variables have an influence. Table 3.2 summarizes some of the main
questions that have guided eye-tracking research on vocabulary acquisition, the
representation and processing of multiword sequences, and the bilingual lexicon.

TABLE 3.2 Questions in eye-tracking research on vocabulary and the bilingual lexicon

1. What is the role of attention in incidental vocabulary acquisition?
Can L2 learners' word learning gains be traced to the learners' online processing behavior?
2. What are the roles of repetition and context in incidental vocabulary acquisition?
How does the processing of target words change over time, when L2 learners
encounter the same words repeatedly in context?
3. How deeply entrenched is L2 speakers’ knowledge of idioms, collocations, and other
multiword sequences?
Do L2 speakers enjoy the same processing advantages as L1 speakers that come from
having deep knowledge of what words tend to co-occur in a language?
4. Do bilinguals activate the meanings of idioms automatically and across different
languages?
5. How does knowing words in more than one language affect word representation and
processing in the bilingual lexicon? What factors influence whether bilinguals activate
word candidates in more than one language simultaneously?

3.2.3 Instructed Second Language Acquisition


Many adult L2 learners learn language in a classroom setting. Unlike children,
adults often make a conscious choice to learn a new language and the actions
they take, such as enrolling in a language course, reflect this. Questions about how
to tailor instruction for optimum L2 learning, therefore, are a central concern
for language instructors and L2 researchers alike. They lie at the heart of research
in instructed second language acquisition, or ISLA (Leow, 2015; Loewen, 2015).
The field of instructed SLA revolves around three key notions: instruction, sec-
ond language, and acquisition (for a comprehensive review, see Loewen, 2015).
Instruction refers to different approaches to presenting input, for instance, different instruction types or ways of presenting materials. Second language refers to any language besides one's first language. Finally, acquisition can be interpreted as the development of L2 knowledge or proficiency. To understand how these three key terms intersect, in particular how input presentations, or external modifications, enhance the process of L2 learning, researchers have investigated the roles of instruction type (e.g., explicit or implicit instruction) and materials (e.g., salience of the input, frequency of the input, complexity of the task) as variables that are highly relevant for instructional purposes.
In recent years, a number of researchers have discovered the value of eye-move-
ment recordings to inform ISLA-related questions (for a review, see Godfroid,
2019). The idea is to use the eye gaze as a measure of attention in order to com-
pare attentional allocation under different instructional conditions (Alsadoon &
Heift, 2015; Choi, 2017; Cintrón-Valentín & Ellis, 2015; Indrarathne & Kormos,
2017, 2018; Révész, Sachs, & Hama, 2014; Winke, 2013) or to different types
of written feedback (Shintani & Ellis, 2013): see online supplementary materi-
als for summary tables, Table S3.3. Much work in this area has dealt with input
enhancement—the bolding, underlining, coloring, or highlighting through some
other visual means of target forms in the input—perhaps because input enhance-
ment is a quintessentially visual manipulation and therefore lends itself well to
being studied with eye tracking. Eye-tracking research on input enhancement
has generally confirmed that input enhancement increases visual attention, in that
participants pay more attention to (i.e., look longer and/or more often at) visually enhanced than unenhanced forms (Alsadoon & Heift, 2015; Choi, 2017; Cintrón-
Valentín & Ellis, 2015; Winke, 2013; but see Indrarathne & Kormos, 2017, 2018).
Participants who received visually enhanced input typically also outperformed
control participants on grammar or vocabulary knowledge post-tests (Alsadoon &
Heift, 2015; Choi, 2017; Cintrón-Valentín & Ellis, 2015; Indrarathne & Kormos,
2017, 2018; but see Winke, 2013) and in these cases, learning gains could be
traced to increased attention to the (enhanced) target forms during the learn-
ing or exposure phase (Alsadoon & Heift, 2015; Cintrón-Valentín & Ellis, 2015;
Indrarathne & Kormos, 2017). These findings, then, suggest that participants’ eye
movement behavior during the instructional intervention can offer a window
into how much the participants are learning, at least as measured on immediate,
discrete post-tests of grammar or vocabulary knowledge.
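In practice, "looking longer and/or more often" is usually operationalized as total fixation duration and fixation count per interest area. Below is a minimal sketch of that aggregation step, assuming a hypothetical fixation report; the studies cited above each used their own software and measures.

```python
# Aggregate raw fixations into total fixation duration and fixation count per
# interest area (IA). Assumes hypothetical columns: participant, trial,
# ia_label ("enhanced"/"unenhanced"), and fixation_duration (ms).
import pandas as pd

fixations = pd.read_csv("fixation_report.csv")

measures = (fixations
            .groupby(["participant", "trial", "ia_label"])
            .agg(total_time=("fixation_duration", "sum"),
                 fixation_count=("fixation_duration", "size"))
            .reset_index())

# Per-participant condition means, ready for inferential analysis
print(measures.groupby(["participant", "ia_label"])["total_time"].mean())
```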
Parallel to the ongoing research on visual input enhancement (e.g., Issa &
Morgan-Short, 2019; Issa, Morgan-Short, Villegas, & Raney, 2015) researchers
are broadening the range of instructional conditions included in eye-movement
studies (Cintrón-Valentín & Ellis, 2015; Indrarathne & Kormos, 2017, 2018). By
expanding the domain of inquiry in this way, these researchers are able to con-
tribute important empirical data addressing the role of explicit instruction in L2
acquisition, which is known as the interface debate (also see Andringa & Curcic,
2015). As a case in point, Cintrón-Valentín and Ellis (2015, Experiments 1 and
2) compared three types of focus-on-form instruction to help L1 English speak-
ers overcome their attentional biases when learning a new, inflection-rich lan-
guage, namely Latin (also see Ellis et al., 2014; Sagarra & Ellis, 2013). The types of
instruction were verb pretraining (VP), verb grammar instruction (VG), and verb
salience with textual enhancement (VS). There was also a control group. The VP
and VG groups completed a pretraining phase, in which the VP group trained on
inflected verbs (as sole cues) and the VG group engaged in a brief grammar lesson
on verb tense morphology. After that, learners from all four groups marked the
temporal reference (past, present, future) of simple Latin sentences that consisted
of a temporal adverb and a verb form marked for tense (i.e., two cues to temporal
reference, of which the verb cue is the one that tends to be blocked, or ignored,
by L1 English speakers). Verb endings were bolded and printed in red for the VS
group, in an attempt to help them overcome their attentional biases toward the
adverb (see Figure 3.3). The eye-tracking data for phase 2, which were collected for a subset of participants, showed that all three treatment groups paid more sustained attention to the verb cues than the control group, who gradually lost interest in the verbs as training progressed. Moreover, the proportion of time participants fixated on either cue (i.e., the verb or the adverb) during training correlated with the participants' cue reliance during sentence interpretation and production. Although the three focus-on-form techniques were similarly effective in refocusing learners' attention in this study, Cintrón-Valentín and Ellis noted that the optimum levels of explicitness and explanation in ISLA will vary for different types of constructions.

FIGURE 3.3 An example trial from the exposure phase in a study on learned attention. Both the adverb heri, “yesterday”, and the verb cogitavi, “I thought”, denote past time.
(Source: Cintrón-Valentín & Ellis, 2016).
We have seen how the use of eye tracking in ISLA has spread from a direct,
visually oriented intervention (input enhancement) to other instructional tech-
niques. A further development in this line of research is the use of eye tracking
to validate features of instructional design. Within the field of task-based lan-
guage teaching, Révész (2014) advocated for the use of eye tracking and other
methodologies to validate task complexity manipulations, given the importance
of task characteristics, and specifically task complexity, in theoretical accounts of
task-based language teaching and learning (Robinson, 2001, 2011; Skehan, 1998,
2009). In a study on the acquisition of the English past counterfactual conditional,
Révész and her colleagues (2014) recorded participants’ eye movements as they
completed two oral production tasks (one simple, one complex). The tasks were
designed to differ in their reasoning demands; that is, how straightforward it was to
identify the likely cause, out of two, for a stated outcome (see Figure 3.4). Analyses of the eye-movement data revealed that L1 and L2 English participants made more and longer fixations in the picture area for the if clause when both response alternatives were plausible causes for the outcome (stated in the then clause) than when only one option was (see Figure 3.4). These eye-tracking results coincided with expert judgments and accuracy data on a secondary task, obtained from different groups of participants. Having successfully validated their task complexity manipulation, the authors then proceeded to study the role of recasts, input frequency distributions, and task complexity in a new study with a new group of participants. Table 3.3 summarizes some of the main questions that have guided eye-tracking research on ISLA.

FIGURE 3.4 Sample trial in an ISLA study. Participants were tasked with describing the causal relationship between two events based on a story they had just read. The picture prompt on the right was hypothesized to be more complex because both answer options were plausible.
(Source: Révész et al., 2014).

TABLE 3.3 Questions in eye-tracking research on ISLA

1. How do pedagogical techniques (e.g., input enhancement, rule-search instructions, metalinguistic information, skewed input) affect the allocation of attention during processing?
Can L2 learners' lexical or grammatical development be traced to their online processing behavior?
2. Do pedagogical techniques differ in their effectiveness?
3. Do tasks with different complexity levels impose different amounts of cognitive load on learners?
4. Do L2 learners process written feedback? Does their depth of processing differ depending on the type of feedback they receive?

3.2.4 Subtitles
Humans are surrounded by rich, multimodal input. They commonly experience
visual (pictorial) and verbal information at the same time, for instance when drink-
ing coffee with a friend in a coffee shop (sound and image), viewing advertise-
ments (text and image), or watching an opera performance with the soundtrack
displayed above the stage (sound, text, and image). Not only do humans have the
ability to decode information obtained from multiple input streams at the same
time, they can also integrate these sources into coherent, multimodal representa-
tions of the world. Eye-tracking researchers in SLA have recently turned to a
specific type of bimodal input, namely foreign films with subtitles (written L1
translations of the aural input) or captions (written L2 renderings of the aural
input). Subtitled and captioned videos are prime examples of multimodal input
because they combine visuals, spoken verbal, and written verbal input, all in one
viewing experience.
SLA researchers use eye tracking to examine how bimodal input conditions
influence reading behavior (Muñoz, 2017; Winke et al., 2013) or L2 learn-
ing (Bisson et al., 2014; Montero Perez et al., 2015): see online supplementary
materials for summary tables, Table S3.4. Their work builds on 35 years of subti-
tles research that has shown, among other things, that captions are beneficial for
listening comprehension and L2 vocabulary learning (see Montero Perez,Van den
Noortgate, & Desmet, 2013, for a meta-analysis). Although some of these earlier
studies also used eye tracking (e.g., d’Ydewalle & Gielen, 1992; d’Ydewalle &
De Bruycker, 2007; d’Ydewalle, Praet, Verfaillie, & Van Rensbergen, 1991), most
previous subtitles research relied on offline performance measures such as mul-
tiple-choice questions or free recall. The value of eye tracking, therefore, lies in
illuminating, in fine temporal detail, how viewers allocate, shift, or divide their
attention between the image and text, when the image and text provide partially
overlapping information and processing is further constrained by the soundtrack.
The theoretical advantages of bimodal or multimodal input are fairly well
established—multimodal input reduces cognitive burden, which leads to bet-
ter processing and intake (Gass, 1997). Multimodal input also increases recall as
learners make use of both aural and visual working memory, thus expanding on
their limited capacity for storing information in either memory system (Baddeley,
1986). In SLA, the focus has been on whether the advantages of multimodal input
hold across different linguistic domains, starting with vocabulary (Bisson et al.,
2014; Montero Perez et al., 2015), and whether subtitles or captions are equally
effective for different learner profiles (Muñoz, 2017; Winke et al., 2013).
Of the four identified subtitles studies (see Table S3.4), two studies (Bisson et al.,
2014; Montero Perez et al. 2015) have linked bimodal input with vocabulary
acquisition. Montero Perez and her colleagues (2015) examined the role of cap-
tion types (full captions or keywords) on vocabulary acquisition in L2 French
learners who either were or were not informed they would be tested on vocabu-
lary afterward. The researchers used keyword captioning, as an alternative to full
captioning, in an attempt to increase the words’ visual salience and enhance learn-
ing. The amount of visual attention to the target words (i.e., viewers’ eye fixation
durations) showed an interesting relationship with word learning (test scores), which differed for the keyword and full captioning groups and for the groups
that did, versus did not, receive a test announcement. Keyword captioning elic-
ited longer fixations on the target words than full captions if participants knew
about the upcoming test. Keyword captioning also led to higher form recognition
scores. However, a direct relationship between fixation duration and word recog-
nition could not be established for the keyword groups, only for the full-caption
group that received a test announcement. The relationship between attention and
learning, therefore, held in only one of the four conditions, although the results
generally confirmed the benefits of isolating key linguistic information in cap-
tions for better learning outcomes.
Researchers have also looked into how age and proficiency (Muñoz, 2017)
and content familiarity and target language (Winke et al., 2013) relate to caption-
reading behavior. Processing text paired with aural input and a changing visual
background is a complex task. This is particularly true for young learners, who are
still developing cognitively, and for beginning L2 learners (including young learn-
ers), whose language skills may hamper adequate comprehension (Vanderplank,
2016). In response to these concerns, Muñoz (2017) examined L2 learners’ read-
ing behavior across two captioned Simpsons clips as a function of the participants'
age (children, adolescents, and adults) and L2 proficiency (beginner, intermedi-
ate, advanced). Because all the children were beginning learners of English, most
adolescents were at an intermediate level, and most adults were advanced, the
results for the two sets of analyses largely coincided. Muñoz found that beginners/
children experienced more processing difficulty reading L2 English captions than
L1 Spanish subtitles and that advanced speakers/adults tended to skip L1 subtitles
more often. The findings are interesting in light of a global trend toward starting
foreign language instruction at a younger age, often in primary school. As Muñoz
explained, captioned or subtitled video can be a valuable pedagogical tool for
child L2 learning, provided the children are able to read the captions or subti-
tles. Future researchers could disentangle proficiency and age-related factors more
clearly. Analyses could also focus on how the eye moves relative to what the ear
hears and how the line of sight travels between the text and the video image (see
Bisson et al., 2014, for analyses of the image area). Such analyses would capture the
multimodal experience of watching subtitled or captioned videos more fully and
advance our understanding of what makes reading captions or subtitles different
from reading static text. Table 3.4 summarizes some current and potential future questions in eye-tracking research on multimodal input.

TABLE 3.4 Questions in eye-tracking research on multimodal input

1. How does caption-reading behavior relate to L2 listening comprehension, grammar, and vocabulary acquisition?
2. Are captions and subtitles equally effective for different types of L2 learners and different L1-L2 pairings?
3. How can we make captions or subtitles better pedagogical tools for L2 acquisition?
Suggested avenues for future research
4. How does subtitles or caption reading differ from the reading of static text?
5. How does the soundtrack guide the eyes during subtitles or caption reading; in other words, what is the ear-eye relationship?
6. How do viewers divide their attention between the text and image area when watching subtitled or captioned videos?

3.2.5 Assessment
Language assessment researchers have turned to eye-movement recordings as a
tool to provide insights into how test takers interact with test items. Research
to date has evaluated L2 reading assessment (Bax, 2013; McCray & Brunfaut,
2018), L2 listening assessment (Suvorov, 2015), and speaking assessment for L1
and L2 English-speaking children (Lee & Winke, 2018): see online supplementary
materials for summary tables, Table S3.5. Although the research questions in each
study differ, in part because of the focus on different language skills, assessment
researchers who employ eye tracking are generally interested in test validity.
The overarching question, therefore, is whether language tests assess what they are
intended to measure, which is language proficiency.
A way to investigate test validity is by examining online behaviors of differ-
ent groups of test takers (e.g., successful vs. unsuccessful, high vs. low proficiency,
native vs. non-native) responding to the same test items. Such comparisons can
reveal whether a test discriminates between test takers with different linguistic
profiles, which is usually considered a positive sign of the validity of a test. The
exception is the native–non-native speaker comparison in Lee and Winke’s (2018)
study, where the goal was to investigate whether child English language learners
(the non-native speaking population) felt psychologically safe taking the English
language test and therefore, native–non-native speaker differences in eye move-
ments were not considered to be a good thing. Other questions addressed in the
assessment strand of eye-tracking research include the role of visual information
in L2 listening assessment (Suvorov, 2015) and the distribution of global and local
reading processes in banked gap-fill items (McCray & Brunfaut, 2018). To better
understand the construct of banked gap-fill items, McCray and Brunfaut (2018)
selected 24 test items from the Pearson Test of English Academic (see Figure 3.5
for a publicly available example, not used in the study). The researchers wanted
to know to what extent test takers' use of global (higher-level, text-based) reading processes vs. their local (word-level) reading processes related to their test performance, considering that this type of reading test is designed to measure the whole spectrum of reading processes.

FIGURE 3.5 A gap-fill task in assessment research.
(Source: McCray & Brunfaut, 2018).
Based upon Khalifa and Weir’s (2009) cognitive processing model of reading,
the authors proposed seven hypotheses of how different aspects of lower- and
higher-level processing relate to test performance. They examined eye-movement behavior in three categories of task performance: (i) overall processing of the gap-fill items, (ii) text processing (e.g., reading the text), and (iii) task processing (e.g., engaging with the word bank). The authors found that higher-performing test takers completed the test faster than lower-performing test takers, whereas the lower-performing group visited the word banks more often. Spending more time on the words surrounding the gap (a local reading strategy) was associated with lower test scores. These findings show that less successful test takers evidenced more lower-level text processing. It follows that successful engagement with gap-fill tasks may require higher-level reading skills, in line with the stated objectives for banked gap-fill test items.
Another way to ensure test validity is by exploring the role of test-irrelevant features and their impact on test performance. For a language proficiency test to be valid, test scores should not reflect test-irrelevant features, or construct-irrelevant variance, because this would muddle the construct being measured. Test validity
is of special concern for cognitively developing populations such as young test
takers, since these groups of test takers are unfamiliar with highly restricted test
environments (e.g., performing under time pressure) and are thus more likely to be
influenced by test conditions. A recent study conducted by Lee and Winke (2018)
attempted to address these concerns by exploring response behaviors of young test
takers of the TOEFL® Primary™ speaking test (sample task shown in Figure 3.6).
Lee and Winke recruited native and non-native test takers aged eight, nine, or ten
years old to find out if developmental differences or issues in task design might
account for performance accuracy and response patterns, as seen in the children’s
eye-movement data and spoken output.
In an effort to measure their young learners’ test experience comprehen-
sively, the authors triangulated multiple data sources in their analyses, namely
drawings, interview data, test performance scores, and eye-movement measures.
Lee and Winke found the English language learners scored lower on two dif-
ficult items than their native peers. Eye gaze behavior during speech aligned
with speaking performance on these items. In particular, the English language
learners had a stronger tendency to look at the timer on the screen (see Figure
3.6) and this was associated with poor speech production (e.g., hesitations or
silence). Although the causality of this finding is unknown (i.e., did lower proficiency cause fixations on the timer, or did the timer interfere with less proficient speakers' test performance?), the findings illustrate the importance of
understanding test takers, test conditions, and their characteristics. Table 3.5
summarizes some of the main questions in contemporary eye-tracking research
on language assessment.
FIGURE 3.6 Sample picture description task.
(Source: The TOEFL® Primary™ speaking test for young test takers used in Lee and Winke, 2018.
Copyright © 2013 Educational Testing Service. Used with permission).

TABLE 3.5 Questions in eye-tracking research on language assessment

1. What cognitive construct does a certain test or test item measure? Do test items
successfully discriminate test takers’ language abilities?
2. How do test takers derive the right answer on a test? Does test takers’ response
behavior solely reflect their language abilities?
3. How does video function in a multimedia listening test environment? How do audio-
only listening tests differ from video-based listening tests? Is it a good idea (from a test
validity perspective) to include visual support in listening assessment?
4. Are the features of a test (e.g., directions, prompts, time pressure) appropriate for more
vulnerable test populations (e.g., young children)?
5. How do raters interact with rubrics and rubric categories when rating speech samples
or essays? How do raters interact with essays or other writing samples they are
evaluating?

3.3 Conclusion
This chapter has given readers a tour of text-based eye-tracking research in SLA
and bilingualism. The itinerary was determined by a synthetic review of the eye-
tracking literature which encompassed 16 discipline-specific journals. That synthetic review revealed a total of 82 eye-tracking publications that appeared between 2003 and 2017. Fifty-two studies involved some type of written input, making
text-based eye tracking the largest category of eye-tracking research.The other 32
studies, which collectively make up visual world research and production studies,
will be reviewed in Chapter 4.
The growing body of eye-tracking research confirms the trend toward
increased use of real-time methodologies in SLA and bilingualism (see Chapter
1). Researchers across different paradigms are increasingly recognizing the value of
measuring language knowledge and processing as it occurs. At the same time, L2/
bilingual eye-tracking research still has a lot of potential for growth, as about half of the surveyed journals had published at most one eye-tracking study at the time the search was concluded. This situation will likely change soon because the use of eye-tracking methodology in the field is rapidly diversifying. Indeed, for the present synthetic review, I identified a total
of five major eye-tracking research strands, four strands of which emerged after
2010. The five strands are grammar (the oldest and largest strand), vocabulary and
the bilingual lexicon, instructed second language acquisition, captions and sub-
titles processing, and assessment. The bulk of work in each strand—and in some
cases all of it—has appeared since 2010. Given the goals of this methodology
book, my review focused on general questions and strand-specific approaches that
may inspire readers’ own research projects. I gave an overview of the breadth and
depth of contemporary L2 eye-tracking research. One thing that became clear
is that, despite some similarities between research strands, eye tracking cannot
offer a one-size-fits-all solution. As L2 and bilingualism researchers’ needs and
research interests differ, so will the applications of eye-tracking methodology. This underscores the importance, mentioned at the beginning of this chapter, of reading widely in eye-tracking and non-eye-tracking literatures and staying informed of
cross-disciplinary trends in eye-tracking methodology. Together with one’s own
expertise, this confluence of ideas may spark creativity in the research process and
result in more robust and innovative studies.

Notes
1 Studies that were available online first in 2017 had a 2018 publication date.
2 When describing eye-tracking studies, I will often use the terms trial, item, and sentence.
A trial is a sequence of events that represents one basic unit in an experiment. For
instance, a trial can be one sentence followed by a comprehension question, a sentence
paired with four pictures, a prime word followed by a target word, one screen in a
video, or a question on a listening test. An item is the central element inside a trial. It is
typically what researchers will focus on most in their analyses. In sentence-processing research, an item subsumes all the different versions of a sentence (e.g., grammatical and
ungrammatical; enhanced and unenhanced; frequent and infrequent). See Section
6.2.3, for more information.
3 The anomaly paradigm is a widely used paradigm that cuts across different real-time
methodologies. We find similarly designed studies that include sentences with gram-
matical violations in reaction time, self-paced reading or listening, and ERP research
(see Section 1.1).
4 Lexical decoding is the speed and consistency with which people can retrieve the
meanings of words from their mental lexicon. Participants who are fast, consistent, and
accurate in retrieving word meanings have a high command of their vocabulary. This is
known as fluency or automaticity of word knowledge (Segalowitz, 2010).
5 Another study by Choi (2017) looked at the acquisition of collocations under visually
enhanced and unenhanced input conditions. Because of the author’s focus on input
enhancement, I categorized this study along with other input enhancement research
under instructed SLA, in this case instructed SLA with a focus on vocabulary (see
Section 3.2.3).
6 The nonselective access hypothesis predicts that both meanings of a homograph are
activated simultaneously (Hoversten & Traxler, 2016); because this causes interference,
the hypothesis predicts that bilinguals would initially slow down.
4
WHAT TOPICS CAN BE STUDIED
USING THE VISUAL WORLD
PARADIGM? A SYNTHETIC REVIEW

This chapter and its companion, Chapter 3, present the results of a synthetic review of eye-tracking studies in SLA and bilingualism. The present chapter focuses on eye-tracking studies conducted with spoken language, using the visual world paradigm, an approach for studying spoken language comprehension. Although visual world eye tracking and eye tracking with text differ in many
ways, researchers working in the two paradigms still pursue similar questions
regarding the processing, acquisition, and representation of language. The present
chapter shares with Chapter 3 a goal to survey the field of eye-tracking research in
all its breadth and diversity. By surveying what types of questions have been suc-
cessfully addressed using eye-tracking methodology, this overview can serve as a
springboard for readers to formulate their own research questions and to kickstart
their own research projects with eye tracking.

4.1 Foundations of the Visual World Paradigm


In eye-tracking studies with text, researchers assume an eye-mind link that
relates the eye gaze (an index of overt attention) to ongoing processing in the
mind (see Sections 1.2 and 2.6). To exploit this link, the majority of reading
researchers include one or more durational measures in their eye-tracking stud-
ies (see Section 7.2.1.2). The idea is that, relative to appropriate control conditions, longer eye fixation durations reflect either more processing (as a result of an ungrammatical or unexpected lexical or grammatical form, for instance) or higher task demands. Because of this, eye movements during the processing of text have been termed a processing load measure (Tanenhaus & Trueswell, 2006).
Eye fixation durations can tap into participants’ processing difficulty, but at the
same time, inferences about time course may be less straightforward (compared
to visual world studies) because the spatial layout of print text does not neces-
sarily correspond to the order in which readers process the words (Tanenhaus &
Trueswell, 2006).1 Thus, the primary strength of processing load measures is that
they signal “transient changes in process complexity” (Tanenhaus & Trueswell,
2006, p. 874) which researchers can use to “make inferences about the underlying
processes and representations” (ibid.).
The visual world paradigm, on the other hand, presents researchers with a
set of strengths that complement eye tracking with text. A method for studying
spoken language processing, the visual world paradigm is founded on a link-
ing hypothesis that maps auditory-linguistic processing onto visual processing
(see below). Visual world researchers use eye movements as a representa-
tional measure (Tanenhaus & Trueswell, 2006). This means that eye fixations
in the visual world can reveal what linguistic representations become activated
in listeners’ minds at any given time. For example, eye fixations can reveal how
and when the phonology and meaning of a word are retrieved from the mental
lexicon during listening (e.g., Allopenna, Magnuson, & Tanenhaus, 1998; Dahan,
Magnuson, & Tanenhaus, 2001; Dahan, Swingley, Tanenhaus, & Magnuson, 2000;
Marian & Spivey, 2003a, 2003b; Spivey & Marian, 1999). Eye fixations also show
how personal and reflexive pronouns are interpreted in relation to their possible
linguistic antecedents, which are depicted on the screen (e.g., Cunnings, Fotiadou,
& Tsimpli, 2017; Kim, Montrul, & Yoon, 2015; Runner, Sussman, & Tanenhaus,
2003, 2006). Because eye movements are time-locked to the auditory signal (there
is no skipping or regressing during listening), visual world data provide fine-
grained information about the time course of processing; however, they do not,
in any straightforward manner, address questions of processing load or processing
difficulty (Tanenhaus & Trueswell, 2006).
Eye tracking with text and visual world eye tracking thus appear to provide
complementary perspectives on language processing and representation. Given
the complementary nature of the two paradigms, eye tracking turns out to be a
useful research tool because eye movements are not only a processing load meas-
ure (as in text-based eye-tracking studies), but also a representational measure (as
in visual world studies), although such a distinction “is more of a heuristic than a
categorical” one (Tanenhaus & Trueswell, 2006, p. 874). This means that eye fixa-
tions, regardless of paradigm, are able to reveal what is represented and activated in
the mind because processing and representations are inextricably linked.
In this chapter, I will present the findings from part two of the synthetic review:
the visual world paradigm. Before this, however, it is crucial to understand the theo-
retical foundations of this body of work. Previously, I mentioned that visual world
research is founded on a linking hypothesis—a formal account of how auditory-
linguistic and visual processing come together and manifest themselves in a partici-
pant’s eye gaze. It is to different versions of this hypothesis that we now turn.
Early work using eye tracking and audiovisual materials uncovered a link
between audio materials, eye-fixation behavior, and visual information, even
though the author, Roger Cooper, did not theorize about their relationship.
Cooper (1974) recorded eye movements of participants who listened to a story
while simultaneously looking at a 3 × 3 picture display (see Figure 4.1). He found
that participants looked at the pictures named in the story, suggestive of an audio-
eye-image link. He also found the link extended to semantically related items
(e.g., the word Africa elicited looks to pictures of a lion and a zebra). Although the
idea that language could direct attention in similar ways to pointing was thought
to be “unsurprising” (Altmann, 2011b, p. 979), Cooper’s findings revealed two
fundamental facts upon which contemporary visual world research is built: first,
when humans see an object, they activate the concept in memory and register the
object’s spatial location in the visual scene; and second, with the spatial location registered, people look at what they hear (the linguistic input), as well as whatever is phonologically or semantically related to what they hear.

FIGURE 4.1 Display used in Cooper (1974) while participants listened to a story.
(Source: Reprinted from Cooper, R. M., 1974. The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6(1), 84–107, with permission from Elsevier. © 1974 Cognitive Psychology).
Although very innovative, Cooper’s methodology did not attract much atten-
tion until researchers in the mid to late 1990s started applying it to the process-
ing of phonology (Allopenna et al., 1998; Eberhard, Spivey-Knowlton, Sedivy,
& Tanenhaus, 1995), semantics (Altmann & Kamide, 1999), syntax (Eberhard et
al., 1995; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995), and pragmatic
information (Eberhard et al., 1995). Michael Tanenhaus and his colleagues (1995)
initiated a new era of visual world research with a paper on the role of visual
context in syntactic processing, which was published in Science. Participants’ eye
movements were tracked as they performed simple tasks with real objects. The
researchers showed that visual context had an immediate effect on listeners’ sen-
tence interpretation. Specifically, whether participants saw one or two potential
referents (e.g., one apple or two apples) influenced how they initially interpreted
Put the apple on the towel … in the box: as a goal (put the apple on the towel,
when there was only one apple) or a noun modifier (the apple that is on the
towel, when there were two apples). As Tanenhaus and Trueswell (2006) noted,
the presence of concrete objects—a “visual world”—alongside spoken language
makes the paradigm particularly well suited for studying questions of referential
processing (i.e., how people relate language to external referents). Indeed, the
original Tanenhaus et al. (1995) study has now been extended to child L1 speak-
ers (Trueswell, Sekerina, Hill, & Logrip, 1999) and adult L2 speakers (Pozzan &
Trueswell, 2016) as well (see Section 4.2.3).
As researchers embarked on their new research programs on spoken language
processing, the deeper question of how and why eye movements might reflect
linguistic processing became key. The first to propose a simple linking hypothesis
were Paul Allopenna and his colleagues. Allopenna et al. (1998) used a visual world
paradigm to investigate spoken word recognition. Participants saw a visual display
(see Figure 4.2) and listened to instructions such as “Pick up the beaker; now put
it below the diamond” (p. 419). Here the target was beaker. The researchers found
that participants looked at images of phonological onset competitors (i.e., beetle)
and rhyme competitors (i.e., speaker) more than they looked at an unrelated control
object (i.e., carriage). The findings thus showed that words with similar names com-
peted with the target for word recognition, as seen in the listeners’ eye fixation data.
The use of competitors (e.g., phonologically, visually, or semantically similar
words) in visual displays has become a key technique for studying different kinds
of activation and competition effects in the visual world paradigm (see Textbox
6.1). Work with bilinguals has been particularly interesting in this regard (e.g.,
Blumenfeld & Marian, 2007; Marian & Spivey 2003a, 2003b; Mercier, Pivneva,
& Titone, 2014, 2016; Spivey & Marian, 1999), as researchers have shown that
competition effects are not language specific, but occur within and across a
person’s two or more languages. In the following, we will see several more exam-
ples of lexical competition effects with bilinguals (see Section 4.2.1).
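Such competition effects are conventionally quantified as the proportion of looks to each display object in successive time bins after target-word onset. A minimal, illustrative sketch of that computation (the file and column names are hypothetical, not taken from any one study):

```python
# Standard visual world analysis: proportion of fixation samples on each
# display object (target, competitor, distractor) per time bin after word
# onset. Assumes hypothetical columns: participant, trial, time_ms (relative
# to target-word onset), and roi (the object currently fixated).
import pandas as pd

samples = pd.read_csv("vwp_samples.csv")
samples["bin"] = (samples["time_ms"] // 50) * 50   # 50-ms time bins

# Proportion of samples on each object per bin, aggregated over trials
props = (samples.groupby(["bin", "roi"]).size()
         .unstack(fill_value=0)
         .pipe(lambda d: d.div(d.sum(axis=1), axis=0)))

print(props.head())  # e.g., looks to a phonological onset competitor should
                     # rise early, then fall once the target is disambiguated
```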

FIGURE 4.2 Display used in Allopenna et al. (1998).
(Source: Reprinted from Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K., 1998. Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38(4), 419–439, with permission from Elsevier. © 1998 Journal of Memory and Language).

A second, important finding of Allopenna et al.’s study was that the empirical
eye-movement data closely followed theoretical predictions of a spoken word
recognition model, TRACE (McClelland & Elman, 1986). Using computer sim-
ulations, Allopenna and his colleagues generated predicted activation levels for
the different word candidates (e.g., beaker, beetle, speaker, carriage) as the spoken
input unfolded over time. The model predictions showed a striking similarity to
the eye fixation data obtained from the participants (see Figure 4.3). This sup-
ported “a clear linking hypothesis between lexical activation and eye movements”
(Allopenna et al., 1998, p. 438), whereby the likelihood of a participant fixating
on a word corresponds to the word’s lexical activation level as predicted by the
model (also see Tanenhaus, Magnuson, Dahan, & Chambers, 2000). At the same
time, the authors already recognized that their linking hypothesis was a simple one and that a more comprehensive model of visual world processing might be needed down the road.

FIGURE 4.3 Linking eye movement data and activation levels of lexical representations as predicted by the TRACE model. The empirical data follow theoretical predictions very closely and lend support to a simple linking hypothesis.
(Source: Reprinted from Tanenhaus, M. J. & Trueswell, J. C., 2006. Eye movements and spoken language comprehension. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics (2nd edition) (pp. 863–900). London: Academic Press, with permission from Elsevier. © 2006 Elsevier).
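To link model and data, Allopenna et al. converted the simulated activations into predicted fixation probabilities with a variant of the Luce choice rule. The sketch below illustrates the general shape of such a mapping; the scaling constant and activation values are invented for illustration.

```python
# Sketch of an activation-to-fixation-probability mapping of the kind used in
# simple linking hypotheses: a Luce choice rule over the displayed items'
# activation levels (k and the activations below are illustrative only).
import math

def fixation_probabilities(activations, k=7.0):
    """Map lexical activation levels to predicted fixation probabilities."""
    scaled = {word: math.exp(k * a) for word, a in activations.items()}
    total = sum(scaled.values())
    return {word: s / total for word, s in scaled.items()}

# Hypothetical activations shortly after hearing "bea...": the onset
# competitor (beetle) is still active; the rhyme (speaker) and control are low.
print(fixation_probabilities(
    {"beaker": 0.60, "beetle": 0.55, "speaker": 0.20, "carriage": 0.05}))
```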
The nature of language-mediated eye movements was the topic of a dec-
ade-long collaboration between Gerry Altmann and Yuki Kamide. In a landmark
study, Altmann and Kamide (1999) showed that participants also look at objects
that have yet to be named in the linguistic input, which revealed that the listeners
were anticipating the upcoming input. Altmann and Kamide used a visual dis-
play of a boy, a cake, a ball, a toy car, and a toy train, and other similarly composed
displays (see Figure 4.4). They found that native speakers were faster to launch
their eyes toward the cake (the only edible object in the display), when hearing The
boy will eat … than The boy will move … . In two additional experiments, Kamide,
Altmann, and Haywood (2003) extended these findings to three-place verbs (e.g.,
The woman will spread the butter on the bread versus The woman will slide the butter to
the man) and to sentence contexts where the subject and verb jointly determine
the most likely object (e.g., the man will ride the motorbike versus the girl will ride the
carousel).2 These results evidenced the anticipation or prediction of upcoming
information in different syntactic structures (e.g., two- versus three-place verbs)
and across different-size linguistic units (e.g., verb versus subject + verb).
The ability of the visual world paradigm to capture anticipatory or predictive process-
ing, first uncovered by Altmann and Kamide, has been important to the further
development of the paradigm. Prediction requires certain structural representa-
tions to be in place; that is, to predict, one must know what is likely to occur in
a given context based on previous experience with the language and the world.

FIGURE 4.4 Display used in Altmann and Kamide (1999).
(Source: Reprinted from Altmann, G. T. M. & Kamide, Y., 1999. Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264, with permission from Elsevier. © 1999 Cognition).

A key strength of the visual world paradigm, therefore, is that it can reveal L1
and L2 speakers’ linguistic knowledge representations during real-time spoken-
language processing, in the form of participants’ anticipatory eye movements (also
see Section 4.2.2.1).
Meanwhile, the new evidence for anticipatory eye movements suggested a
need to revise the linking hypothesis. Research had shown that eye movements in
the visual world could be either referential—coinciding with what was named (e.g., Allopenna et al., 1998; Tanenhaus et al., 1995)—or anticipatory—ahead of what would be named (e.g., Altmann & Kamide, 1999; Kamide et al., 2003). In the latter case, an account in terms of lexical activation was clearly insufficient because if an object has not been named yet, its spoken form cannot activate the word representation. A third, theoretically important kind of eye movement was directed at objects in the display that were neither named nor going to be named in the linguistic input. This was the focus of a new study by Altmann and Kamide (2007), which involved a tense manipulation (also see Section 9.3.1, research idea #6).
In Altmann and Kamide (2007), native English speakers listened to sentences
such as The man will drink the beer and The man has drunk the wine, while they
viewed a display with a full glass of beer and an empty wine glass (see Figure 4.5).

FIGURE 4.5 Display used in Altmann and Kamide (2007).
(Source: Reprinted from Altmann, G. T. M. & Kamide, Y., 2007. The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57, 502–518, with permission from Elsevier. © 2007 Journal of Memory and Language).

As the authors noted, visual displays are static but “events, like sentences, unfold
in time; they have a beginning and an end state” (p. 504). Thus, interpreting a
dynamically unfolding event such as The man has drunk the wine in the presence of
a static visual scene requires listeners to establish a temporal relationship between
the scene and the event described. Altmann and Kamide found that listeners did
exactly that: they looked to the empty wine glass more often in the past-tense
condition has drunk (in which case they interpreted the scene as showing the end
state of the drinking event) whereas they tended to look at the full beer glass more
often in the future-tense condition will drink (suggesting they now interpreted the
scene as showing the initial state of the event).3 These changing event represen-
tations highlight the role of the visual scene and specifically, the idea that one and
the same visual representation can receive different interpretations depending on
the accompanying linguistic input.
Together with other studies (Dahan & Tanenhaus, 2005; Huettig & Altmann,
2004, 2005, 2011), the findings from Altmann and Kamide’s tense study pro-
vided the foundation for a revised version of the linking hypothesis. According
to their hypothesis, the interpretation of the linguistic input takes place against
the background of object affordances and real-world knowledge. Affordances of an object are essentially “what kinds of events the object can participate in”
(Altmann & Kamide, 2007, p. 510). For example, a full glass affords drinking, but
an empty glass does not; however, an empty glass could potentially hold a liquid
(i.e., this is one of its affordances) and, furthermore, real-world knowledge can
inform listeners of how full glasses may come to be empty (thus leading to the
interpretation of an empty glass as the end state of a drinking event). The infer-
ence chain linking language processing to eye movements in the visual world is
then as follows (Altmann & Kamide, 2007; Altmann & Mirković, 2009):

1. If an object’s affordances fulfill the conceptual requirements imposed by the language (e.g., through verb tense morphology), the mental representation of the object will increase in activation because it is now primed through two channels, linguistic and nonlinguistic.
2. This increase in activation either constitutes or causes a shift in covert attention.
3. It may then engender an eye movement to the location of the object on
screen, so listeners realign their overt and covert attention.

Thus, linking happens in the listener’s mental world—eye movements to objects in the display are simply an expression of what is conceptually salient (i.e., highly
activated) in a listener’s internal event representation at any given moment in
time, due to the overlap in featural representations between the language and the
visual context.
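As a toy illustration of this three-step chain (all objects, features, and values below are invented for the Altmann and Kamide scenario), activation can be modeled as featural overlap between the conceptual requirements of the unfolding language and an object's affordances, with the most active object being the most likely saccade target:

```python
# Toy sketch of the inference chain above: (1) featural overlap boosts an
# object's activation, (2) activation shifts covert attention, (3) attention
# tends to engender an overt eye movement to the object's location.
AFFORDANCES = {
    "full beer glass":  {"drinkable"},
    "empty wine glass": {"fillable", "end_state_of_drinking"},
}

def activation(required_features, obj):
    """Step 1: overlap between linguistic requirements and affordances."""
    return len(required_features & AFFORDANCES[obj])

def likely_saccade_target(required_features):
    """Steps 2-3: the most active object is the most likely fixation target."""
    return max(AFFORDANCES, key=lambda obj: activation(required_features, obj))

# "The man will drink ..." imposes {"drinkable"}; "has drunk ..." cues an end state.
print(likely_saccade_target({"drinkable"}))               # full beer glass
print(likely_saccade_target({"end_state_of_drinking"}))   # empty wine glass
```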
To summarize, the linking hypothesis has undergone different stages of
development (e.g., Allopenna et al., 1998; Altmann & Kamide, 2007; Dahan &
Tanenhaus, 2005; Kamide et al., 2003), reflecting the empirical data that were
available at a given time. The contemporary version of the linking hypothesis
highlights the role of object affordances, event representations, and shared featural
representations—which could be visual, phonological, semantic, (morpho-)syn-
tactic, or otherwise—as key components of language-mediated eye movements
(Altmann & Kamide, 2007; Altmann & Mirković, 2009). Changing event repre-
sentations have not been a focus in L2 eye tracking yet; however, many L2 and
bilingualism researchers are interested in predictive processing of various kinds
(see Section 4.2.2). Therefore, an account of language-mediated eye movements
in terms of lexical activation is unlikely to suffice because, by definition, predic-
tion entails there is no mention of the target referent’s name in the language
yet. A more elaborate linking hypothesis (Altmann & Kamide, 2007; Altmann &
Mirković, 2009) may therefore be necessary for SLA and bilingualism as well, even
if the consequences of this hypothesis (e.g., in terms of updating event representa-
tions) may not have been explored to their full potential yet. As it currently stands,
it is the role of object affordances and featural overlap, more so than the dynamic
interpretation of the visual scene, that seem key to understanding many of the
phenomena of interest in SLA and bilingualism.
Research Topics in the Visual World Paradigm  97

4.2 Research Strands within Visual World Eye Tracking


To gauge the state of the art in eye-tracking research with L2 speakers and bilin-
guals, I reviewed all eye-tracking studies that had been published in print or online,
as an advance online publication, in well-known SLA and bilingualism journals
by June 2017.4 Here, I summarize the key parameters of the literature search (for
a detailed description, see Section 3.2). With the help of a research assistant, I performed a search in 15 journals from the fields of SLA and bilingualism (VanPatten & Williams, 2002) and one L2 assessment journal. The complete list of journals is available in Section 3.2 and is reproduced in Figure 4.6. We searched for a combination of keywords in four academic databases, namely Linguistics and Language Behavior Abstracts (LLBA), PsycInfo, Education Resources Information Center (ERIC), and Google Scholar. The keywords used were a combination of (1) eye tracking, eye-tracking, eye movement, eye-movement, eye gaze, or eye fixation, paired with (2) second language, foreign language, L2, adult, bilingual, second, language, or learners. This search yielded a total of 85 eye-tracking studies that had appeared in
one of the 16 targeted journals. Before coding the substantive and methodological
features of each study, I first assigned each article to the broad categories of either
text-based eye tracking (see Chapter 3) or visual world eye tracking (this chapter).
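For replication purposes, the keyword pairing above translates into a boolean query of the kind accepted by such databases. A sketch of the construction (exact field syntax varies by database; this is illustrative only):

```python
# Build one boolean query string from the two keyword sets described above.
eye_terms = ["eye tracking", "eye-tracking", "eye movement",
             "eye-movement", "eye gaze", "eye fixation"]
population_terms = ["second language", "foreign language", "L2", "adult",
                    "bilingual", "second", "language", "learners"]

def or_group(terms):
    """Join quoted terms into one parenthesized OR group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = or_group(eye_terms) + " AND " + or_group(population_terms)
print(query)
```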
For the present review, a visual world study was defined as any eye-tracking
study that combined bimodal input (audio + visuals) with a simple visual display.
The prototypical visual display consisted of only a few elements on screen or in a
physical workspace—either images, objects, or (more rarely) printed word forms.
Following Huettig, Rommers, and Meyer (2011), I also included eye-tracking studies on language production as a part of the current review (i.e., Flecken, 2011;
Flecken, Carroll, Weimar, & Von Stutterheim, 2015; Kaushanskaya & Marian,
2007; Lee & Winke, 2018; McDonough, Crowther, Kielstra, & Trofimovich, 2015;
McDonough, Trofimovich, Dao, & Dion, 2017). Because of its focus on speaking
assessment, Lee and Winke (2018) was also discussed together with other assess-
ment research, in Section 3.2.5, and was therefore not included in the total tally of 32 visual world studies.

FIGURE 4.6 Distribution of eye-tracking studies across 16 SLA and bilingualism journals. Note: SSLA = Studies in Second Language Acquisition; VWP = visual world paradigm.
The previously mentioned characteristics (bimodal input, simple display)
defined prototypical visual world research. A small number of studies did not fall
neatly into this category and thus required closer scrutiny. For one, I subsumed
subtitles and captions research under the broad umbrella of text-based research (see
Section 3.2.4) because the visual displays of captioned materials are much more
complex than what is common in visual world research. Furthermore, although
captioned or subtitled videos are multimodal, analyses to date have focused pri-
marily on how the captions or subtitles are read (i.e., text processing). Second,
Bolger and Zapata (2011), whose study was mentioned in Section 3.2, combined
elements of text-based eye tracking and visual world eye tracking in different parts
of their vocabulary instruction study. Even though the study was not bimodal, I
will review it as a part of visual world research (i) because of the similarities in
visual display (a few large elements on the screen) and (ii) because of the con-
ceptual focus on interference or competition effects (a recurring theme in visual
world studies). The outcome of the categorization process is shown in Figure 4.6,
which represents the distribution of text-based and visual world research across 16
SLA and bilingualism journals.
The literature search revealed there have been just over half as many visual
world studies (k = 32) as text-based studies (k = 52) in SLA to date. Dissemination
of eye-tracking research has been concentrated in a fairly small number of jour-
nals; this was true for text-based eye tracking and even more so for visual world research. Bilingualism: Language and Cognition accounts for nearly half of
the visual world studies published up to 2017. The studies published in this jour-
nal show a large thematic overlap with studies published in psychology journals,
including the Journal of Memory and Language, Cognition, and Language, Cognition,
and Neuroscience, with several authors actively engaging with the two research
communities.5 Visual world research has also been reported in Second Language
Research, Studies in Second Language Acquisition, Applied Psycholinguistics, and
Language Learning. Some more recent developments, such as the use of the visual
world paradigm to study effects of instruction (Andringa & Curcic, 2015; Bolger
& Zapata, 2011; Hopp, 2016), implicit and explicit knowledge (Suzuki, 2017;
Suzuki & DeKeyser, 2017), and the proficiency correlates of prediction (Hopp,
2013, 2016) have appeared in these journals. Finally, production studies, by their
varied nature, have appeared in a range of different journals, including The Modern
Language Journal, reflecting their more theoretical or applied research objectives.
My prediction is that the newer applications of the visual world paradigm, as well
as production studies, will play an important role in introducing the visual world
paradigm to SLA more broadly, because of the new research avenues these appli-
cations open up for the field.
Within this sample of 32 visual world studies, I identified four broad research
strands with the help of a research assistant, who coded all the studies. These
strands are (1) word recognition (see Section 4.2.1), (2) prediction (see Section
4.2.2), (3) referential processing (see Section 4.2.3), and (4) production (see
Section 4.2.4). With half of all studies (16 out of 32), the prediction strand is by far the largest area of L2 and bilingual visual world research. To capture the diversity of this strand and adequately represent the multiple
levels at which listeners can predict, I further divided prediction research by
level of the linguistic hierarchy: semantic prediction, morphosyntactic predic-
tion, and discourse-level prediction. Therefore, readers interested in grammar
research are invited to consult the section on morphosyntactic prediction (see
Section 4.2.2.3), as well as Sections 4.2.2.4 and 4.2.3. Instructed second lan-
guage acquisition researchers can find relevant research summarized under the
effects of instruction on prediction (see Section 4.2.2.5). Finally, vocabulary
researchers and researchers studying the bilingual lexicon may find the section
on word recognition particularly interesting (see Section 4.2.1). In what follows,
I will provide an overview of the types of questions investigated in each strand,
starting with the “lowest” level of the linguistic hierarchy, word recognition or
lexical processing, and gradually working my way toward the higher levels of
linguistic representation.

4.2.1 Word Recognition
The online search revealed a total of six visual world eye-tracking studies that
examined topics in word recognition (see online supplementary materials for
summary tables, Table S4.1). The majority of these studies deal with the nature of (i.e., the structure of and access to) the bilingual lexicon (Marian & Spivey, 2003a, 2003b; Mercier et al., 2014, 2016). The overarching question about the bilingual
lexicon is whether words are organized and accessed in separate lexicons, as if the
bilingual had two mental dictionaries, one for each language, or whether words
are accessed all together, regardless of language, and form one mental dictionary
or integrated lexicon (also see Section 3.2.2). Empirical data to date favor the
view of an integrated lexicon. The visual world paradigm has also been used
to investigate word segmentation in L2 French connected speech (Tremblay,
2011), where a misalignment of syllable boundaries and word boundaries in liai-
son (e.g., fameux élan, “famous swing”) could theoretically make word recogni-
tion more difficult for L2 French learners. Lastly, recognition of words has been
a tool for studying individual differences in bilinguals (e.g., inhibitory control)
in a Stroop task (Singh & Mishra, 2012) and in prototypical visual world studies
with four-image displays (Mercier et al., 2014, 2016).
Visual world researchers who study word-level phenomena will often account
for their data in terms of lexical activation and competition effects. The idea
is that before the meaning of a word is retrieved, the incoming input activates
multiple word candidates (e.g., can, candle, candid, candy) which compete for rec-
ognition in the listener’s lexicon. We already saw an example of lexical competi-
tion effects in Allopenna et al.’s (1998) study (see Section 4.1). In this study (see
Figure 4.2), as the target word beaker unfolded, native English speakers looked at
the image of a beetle (onset competitor) and a speaker (rhyme competitor) more
than the image of a carriage (unrelated distractor). Hence, their eye movements
revealed transient activation and competition effects during listening. Generalizing
from this study, visual world eye tracking can uncover subtle competition effects
in L1 and L2 spoken word recognition, revealed in participants’ looks to nontar-
get, competitor images on screen that are displayed alongside the target (also see
Textbox 6.1, for a summary of the different roles of images in visual world studies).
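
To make the logic of such competition analyses concrete, the sketch below (in Python, with hypothetical file and column names) shows how fixation proportions to the target, competitors, and distractor are typically computed in small time bins. It is a minimal illustration of the general approach, not the analysis pipeline of any particular study.

    import pandas as pd

    # Hypothetical long-format fixation report: one row per gaze sample,
    # with columns trial, time_ms (relative to target word onset), and
    # roi ("target", "onset_competitor", "rhyme_competitor", "distractor")
    fix = pd.read_csv("fixation_report.csv")

    BIN_MS = 20  # bin width; values of 20-50 ms are common
    fix["bin"] = (fix["time_ms"] // BIN_MS) * BIN_MS

    # Proportion of trials with at least one fixation on each image type,
    # per time bin; plotting these yields curves like those in Figure 4.2
    props = (
        fix.groupby(["bin", "roi"])["trial"]
        .nunique()
        .div(fix["trial"].nunique())
        .unstack(fill_value=0)
    )
    print(props.head())
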
By definition, L2 speakers and bilinguals know words in more than one lan-
guage. This makes them an interesting population in which to study lexical competition
effects. Will words from all languages compete for recognition even if the input
is solely in one language? This is a major question in visual world research on the
bilingual lexicon (also see Section 3.2.2). Like text-based eye-tracking research-
ers, visual world eye-tracking researchers have adduced important evidence that
bilingual speakers’ lexicons are, indeed, integrated and accessed non-selectively
(see Kroll & Bialystok, 2013; Kroll, Dussias, Bice, & Perrotti, 2015; Kroll & Ma,
2017; Van Hell & Tanner, 2012, for general reviews).
In two influential experiments, Marian and Spivey (2003a, 2003b) asked
Russian-English bilinguals to manipulate real objects that were laid out on a
workspace. Unbeknownst to the participants, the names of some objects over-
lapped phonologically in the language used in the experiment (e.g., English) or
between the bilinguals’ two languages (i.e., English and Russian). The display in
Figure 4.7 mirrors the real objects participants saw in the actual experiments.
Each display contained a target object (e.g., shovel) that participants were asked to
pick up. In the critical conditions, the display also contained a within-language
competitor (e.g., shark) and/or a between-language competitor (e.g., a balloon,
pronounced as /ʃαrik/ in Russian). When the experiment was in English, the
researchers found that the balloon, like the shark, exerted an influence on partici-
pants’ eye movements. This suggested both languages were activated at the same
time during listening.
Marian and Spivey’s studies (also Spivey & Marian, 1999) marked the begin-
ning of visual world eye tracking in bilingualism. Since then, researchers have
attempted to uncover the factors (e.g., age of L2 acquisition, L2 proficiency, lan-
guage of the experiment, phonological overlap) that influence the strength of
between-language competition, given that these effects are sometimes weak or
even absent. In two studies, Mercier and her colleagues (2014, 2016) focused on
the role of inhibitory control as a potential factor (also see Blumenfeld & Marian,
2011). They found that inhibitory control—the ability to suppress irrelevant
information—can modulate parallel language activation in bilinguals at both local
(word) and global (task) levels.

FIGURE 4.7  Display used in a word recognition experiment.
(Source: Modeled after Marian and Spivey, 2003a, 2003b, with images from the International Picture Naming Project: Bates et al., 2003; Szekely et al., 2003).

Mercier et al. (2014) used a study design similar to that of Marian and Spivey (2003a,
2003b), in that their displays included both within- and between-language com-
petitors. The researchers compared looks to competitors in English-French bilin-
guals with different levels of cognitive control. They found that those with higher
inhibitory control recognized spoken words more efficiently (reduced competi-
tion effects). Interestingly, this effect was most apparent for less proficient bilin-
guals who did the task under challenging conditions. Mercier et al. (2016) further
examined the issue of inhibition at the global (task) level. Bilingual English-French
participants first spoke in either French or English and then performed a visual
world experiment in English. The English visual world experiment entailed a lan-
guage switch for the group that previously spoke in French. The researchers were
primarily interested in how this language switch influenced the participants’ inhi-
bition of competitors. Interestingly, the group that had spoken in French showed
reduced effects of French competitors in the English listening task, suggesting they
had globally inhibited the task-irrelevant language (Mercier et al., 2016). It appears,
then, that language users can proactively inhibit all the words in a language, much as someone who speaks French in the workplace but a different language at home would "switch off" all French upon arriving home (cf. Mercier et al., 2016).
Looking at inhibitory control as a consequence, rather than a cause, Singh and
Mishra (2012) examined levels of inhibitory control in two groups of Hindi-English
bilingual speakers.The researchers used an oculomotor (eye-movement-based) ver-
sion of the Stroop task, a classic measure of inhibitory control (MacLeod, 1991;
Stroop, 1935), shown in Figure 4.8. They found that bilinguals with higher L2
English proficiency outperformed lower-proficiency bilinguals on the task. When
replicating this study, future researchers could include additional control variables
such as participants’ socioeconomic status, educational experience (schooling sys-
tem), language use during their leisure time, and nonverbal intelligence, to bolster
the case for the reported cognitive advantages of proficient bilinguals (for reviews,
see Bialystok, 2015; Paap, 2018; Valian, 2015).

FIGURE 4.8  Display used in the oculomotor Stroop task. Participants needed to make an eye movement to the color patch that matched the ink color of the color word (e.g., red), while ignoring the word's meaning (e.g., hara means "green").
(Source: Figure supplied by Dr. Ramesh Kumar Mishra, University of Hyderabad, India; Singh & Mishra, 2012).

In sum, these studies illustrate how eye tracking can capture moment-to-
moment processing at sublexical levels. As bottom-up linguistic input becomes
available, activation spreads to the phonemic and lexical representations, where
lexical candidates compete for recognition. Importantly, these activation and com-
petition effects are reflected in participants’ eye movements to different images
on the screen. Hence, the fine-grained temporal information in eye-movement
records can reveal the subtleties of word recognition. Table 4.1 summarizes some
of the main questions that have guided this line of work.

TABLE 4.1  Questions in visual world eye tracking on lexical processing and word recognition

1. How does knowing words in more than one language affect word representation and processing in the bilingual lexicon?
   1.1. To what extent does competition between languages depend on the bilingual's linguistic profile (e.g., language dominance, daily L1 and L2 use, L1 and L2 vocabulary size, L2 proficiency, age of L2 acquisition)?
   1.2. To what extent does competition between languages depend on the bilingual's cognitive profile (e.g., inhibitory control, nonverbal intelligence)?
   1.3. To what extent do task factors (e.g., language switch, language mode) and item-level variables (e.g., degree of phonological overlap between words) influence competition between languages in the bilingual lexicon?
2. To what extent does proficiency in multiple languages influence participants' inhibitory control in verbal and nonverbal tasks?
3. How do L2 learners at different proficiency levels parse connected speech that contains an ambiguous word boundary?

4.2.2 Prediction
4.2.2.1 What is Prediction?
The online search revealed a total of 16 out of 32 visual world studies that were
categorized as prediction research (see online supplementary materials for sum-
mary tables, Tables S4.1–S4.5). Prediction in language processing refers to the
“pre-activation/retrieval of linguistic input before it is encountered by the lan-
guage comprehender” (Huettig, 2015, p. 122, my emphasis). More generally, pre-
diction is the influence of the preceding context on the current state of the
language processing system (Kuperberg & Jaeger, 2016). In the visual world
paradigm, anticipatory eye movements (see Section 4.1) provide particularly
strong evidence for prediction: the behavioral response (the anticipatory look to
the target) happens before the predictable word appears in the input. The recording
of electrical brain activity in EEG/ERP research (see Section 1.1.4) can simi-
larly provide pure tests of anticipation (e.g., DeLong, Urbach, & Kutas, 2005;
Kuperberg & Jaeger, 2016; Wicha, Moreno, & Kutas, 2004). Many other methods,
however, such as reading and lexical decision, provide only indirect evidence for
prediction because the occurrence of prediction needs to be inferred from data
that are obtained during the processing of the predictable word.
Consider an idiom such as spill the beans, “to divulge a secret”, in which the
final word is fairly predictable (cf. Carrol & Conklin, 2017). Compelling evidence
for prediction in the visual world paradigm would come from looks to an image
of beans before the onset of the word “beans”. Likewise, reading researchers may
find that beans is processed faster than chips in spill the ____, because spill the chips
is not an idiom and, hence, the final word is less predictable (for reviews of pre-
dictability and idiom processing, see Section 2.5 and Section 3.2.2, respectively).
While the latter finding is still informative, the point is that faster reading times
are the consequences of prediction rather than prediction per se (Huettig, 2015).
The visual world paradigm, in contrast, can capture these true predictive effects
as they are happening. This relatively unique ability, shared only with EEG/ERP
research (see Section 1.1.4), makes visual world eye tracking particularly well
suited to study prediction (for recent reviews, see Huettig, 2015; Kuperberg &
Jaeger, 2016).
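
In practice, evidence of this kind is often quantified by checking, trial by trial, whether the eyes reached the target image before the acoustic onset of the predictable word. The sketch below illustrates one such check in Python; all file and column names are hypothetical.

    import pandas as pd

    fix = pd.read_csv("fixations.csv")       # trial, fix_start_ms, roi
    onsets = pd.read_csv("word_onsets.csv")  # trial, target_onset_ms

    # Earliest fixation on the target image within each trial
    first_target = (
        fix[fix["roi"] == "target"]
        .groupby("trial")["fix_start_ms"]
        .min()
        .rename("first_target_fix")
    )
    d = onsets.set_index("trial").join(first_target)

    # A trial counts as anticipatory if the first target look precedes the
    # word's onset; trials without any target look yield False here
    d["anticipatory"] = d["first_target_fix"] < d["target_onset_ms"]
    print(d["anticipatory"].mean())  # proportion of anticipatory trials
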
The role of prediction in contemporary cognitive theory is difficult to over-
state. Clark (2013), in a highly influential review, argued that action-oriented
prediction may provide a “unified theory of the mind” (p. 200), in which percep-
tion, action, and attention are all linked in a single theoretical account. Prediction
is everywhere. In daily life, drivers can often predict the traffic light patterns they
encounter on their daily commute. Music lovers can generally tell when a song
is about to end. If you are a dancer like me, you will not only predict the song’s
ending but also try to align your dance moves with the predicted ending.
In SLA, prediction can help explain the concept of “noticing the gap”
(Gass, 1997; Schmidt & Frota, 1986), a mechanism for language learning. Second-
language learners are said to notice a gap when they consciously register a mis-
match between what their interlocutor says and how they themselves would have
said it (see Godfroid, 2010, for discussion). In other words, for such noticing to
occur, a listener must have predicted what she would say next in the current
sentence context before the speaker actually says it, so the listener can compare
the spoken form to the form she predicted would come next. When the listener
notices a gap, and she deems her interlocutor to be more proficient, this will trig-
ger an adjustment of her internal prediction mechanisms, which amounts to a
form of L2 learning (e.g., Altmann & Mirković, 2009; Huettig, 2015). Importantly,
on this view, the listener is also a speaker, in the sense that she actively draws on
production processes to predict the upcoming input during comprehension (cf.
Pickering & Garrod, 2013). In Pickering and Garrod’s (2013) words, “producing
and understanding are tightly interwoven, and this interweaving underlies peo-
ple’s ability to predict themselves and each other” (p. 329).
Although prediction is fundamental to how humans act in the world (Clark,
2013; Friston, 2010), the mechanisms that underlie prediction are not yet fully
understood. Huettig (2015) proposed that at least four different mechanisms
can underlie prediction, which he jointly referred to as PACS—Production-,
Association-, Combinatorial-, and Simulation-based prediction. Of
these, association-based prediction may be of special interest to L2/bilingualism
researchers. It is argued that association-based prediction reflects the outcomes of
simple associative learning mechanisms (Altmann & Mirković, 2009; Bar, 2007,
2009; Huettig, 2015). For example, if language users make gender-based predic-
tions based on their associative knowledge (see Section 4.2.2.3),6 they will draw
on their knowledge of article-noun and adjective-noun associations (agreement),
which they have accumulated over years of exposure to the language. Thus, from a
second language acquisition perspective, the association-based route to prediction
could be quite revealing of L2 speakers’ knowledge, because it may reflect their
implicit-statistical knowledge or competence in the language.
The importance of association-based prediction notwithstanding, the take-
home message from Huettig’s review article is that prediction is not a unitary
process (Huettig, 2015). There are at least four kinds of prediction mechanisms
that may underlie the same predictive behavior. What this means is that antici-
patory eye movements could be traced back to multiple cognitive mechanisms.
Therefore, care is needed when interpreting findings of prediction (e.g., as show-
ing linguistic competence or implicit-statistical knowledge) because there may
be more than one possible account for the same behavioral data. To advance
the field’s understanding of the diverse set of mechanisms at work in predic-
tion, Huettig (2015) called for more individual differences research (for a recent
example with L1 speakers, see Hintz, Meyer, & Huettig, 2017). In the following
sections, I will highlight the L2 and bilingualism studies that have adopted an
individual differences approach.
Given the centrality of prediction in human behavior, it is natural to ask to
what extent L2 speakers and bilinguals, like L1 speakers and monolinguals, predict
during language processing. One possibility, advanced by Huettig (2015), is that
prediction heavily depends on people’s proficiency in the task at hand. In this
view, L2 speakers, especially less proficient L2 speakers, may show a reduction
in predictive behavior. This idea has become known in SLA and bilingualism as
the Reduced Ability to Generate Expectations (RAGE) hypothesis (Grüter,
Rohde, & Schafer, 2014, 2017). Grüter and her colleagues proposed that L2
speakers’ cognitive resources may be fully depleted by here-and-now processing
(e.g., word retrieval and integration), leaving “little or no resources (…) for taking
up non-essential cues to update expectations” (Grüter et al., 2014, p. 189). At the
same time, several research teams have now reported evidence of semantic predic-
tion (Dijkgraaf, Hartsuiker, & Duyck, 2017; Ito, Corley, & Pickering, 2018) and
morphosyntactic prediction (Dussias,Valdés Kroff, Guzzardo Tamargo, & Gerfen,
2013; Hopp, 2013; Hopp & Lemmerth, 2018; Trenkic, Mirković, & Altmann,
2014) in L2 speakers and bilinguals, so a strong version of the RAGE hypothesis
is unlikely to be correct. Kaan (2014) argued that the prediction mechanisms in
L1 and L2 speakers are “underlyingly the same” (p. 257) but differences in perfor-
mance occur due to knowledge and processing differences. Kaan listed differences
in frequency biases, the quality of lexical representations, competing information
from a person’s two+ languages, and task-induced processes and strategies (e.g.,
priming effects) as possible sources of divergence in L1/L2 performance. A more
productive approach to studying prediction, therefore, may be to step away from
a strict L1-L2 dichotomy and focus on individual differences instead (for a recent
example with bilinguals, see Peters, Grüter, & Borovsky, 2018). By including cog-
nitive (e.g., working memory) and linguistic factors (e.g., receptive and productive
vocabulary size, overall proficiency), visual world researchers can come to under-
stand the extent to which listeners with different cognitive and linguistic profiles
generate expectations during real-time processing.
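
One way to implement such an individual differences approach, sketched below with hypothetical file and variable names, is to let cognitive and linguistic measures moderate the condition effect in a mixed-effects model; logistic or growth curve analyses over time bins are common, more elaborate alternatives.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical per-trial data: target_looks is the proportion of looks
    # to the target in the predictive window; wm and vocab are individual
    # difference scores (working memory, vocabulary size)
    d = pd.read_csv("prediction_trials.csv")

    model = smf.mixedlm(
        "target_looks ~ condition * wm + condition * vocab",
        data=d,
        groups=d["participant"],  # random intercepts by participant
    ).fit()
    print(model.summary())
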
Likewise, testing bilinguals in both of their languages is a good way to evaluate
their overall linguistic abilities, including their prediction skills, in a non-deficit
approach (for examples, see Dijkgraaf et al., 2017, and Sekerina & Sauermann,
2015). If bilinguals are able to generate predictions in their L1 or dominant lan-
guage, this makes the important point that they are capable of making predictions
in principle. At the same time, the within-subject research, which involves testing
participants in all of their languages, will let researchers narrow down the factors
that contribute to a potential absence of predictive behavior in the L2 or non-
dominant language (also see Section 5.2).

TABLE 4.2  Questions in visual world eye tracking on prediction

1. To what extent does prediction occur across L1 and L2 processing as a domain-general cognitive mechanism? Do L2 learners show predictive processing similar to native speakers?
   1.1. To what extent do learner characteristics (e.g., L1 background, L2 proficiency, recency of use, aptitude) account for individual differences in L2 predictive processing?
   1.2. To what extent do linguistic factors (i.e., L1-L2 similarity at the lexical and syntactic level) account for individual differences in L2 predictive processing?
2. To what extent does prediction rely on the listener's cognitive resources? What are the cognitive mechanisms that underlie L2 prediction?
3. To what extent could prediction be used as a measure of linguistic knowledge? What type(s) of linguistic knowledge underlie prediction?
4. To what extent can bilinguals generate predictions based on multiple linguistic and non-linguistic cues?
5. To what extent could explicit instruction (e.g., grammar rule explanation) enhance implicit learning?
6. What is the relationship between knowledge used in language production and in comprehension?
7. How does the presentation format of new vocabulary (e.g., grouping in semantic or thematic sets) influence the retrieval of word meaning?
8. Can L1 and L2 speakers infer the meaning of new words from spoken story contexts? To what extent does richness of contextual information mediate the learning of new word meanings? How are novel lexical items processed in relation to contextual information that is concurrently available?
In what follows, I will provide an overview of the prediction literature in L2
and bilingualism eye-tracking research, with a special emphasis on the types of
questions researchers have addressed.

4.2.2.2 Semantic Prediction
Work on L2 semantic prediction is only now beginning to appear, with two recent
publications leading the way (see online supplementary materials, Table S4.2).
Dijkgraaf et al. (2017) and Ito et al. (2018) have used semantic prediction as a
tool to study more general questions about predictive language processing, as
described in the previous section. In this work, the fact that prediction is based
on semantic cues appears secondary to the larger theoretical goal of uncovering
whether and to what extent L2 speakers engage in prediction. Both Dijkgraaf and
colleagues (2017) and Ito and colleagues (2018) reported evidence that L2 speak-
ers can and do generate predictions during L2 processing, consistent with Kaan’s
(2014) theoretical account (see Section 4.2.2.1). Ito and her colleagues further
examined the cognitive mechanisms that underlie prediction in native and non-
native speakers. Together, these studies contribute important empirical data to
understand prediction and its mediating factors in L2 listening.
Research on L2 semantic prediction extends a long line of semantic predic-
tion research with L1 speakers that originated in psychology with Altmann and
Kamide’s (1999) study (see Section 4.1). Similarly to Altmann and Kamide’s study,
L2 participants listen to simple S-V-O sentences (e.g., Mary reads/steals a letter,
The lady will fold/find the scarf), in which the second noun is either predicted or
not by the semantic information in the verb. Researchers want to know whether
L2 listeners, like L1 listeners, can utilize these semantic restrictions in real time to
anticipate the upcoming noun (i.e., letter or scarf, shown as images on the screen).
Results from Dijkgraaf et al. (2017) and Ito et al. (2018) converged in showing
that unbalanced bilinguals predicted thematic roles to the same extent across their
two languages and/or similarly as monolingual L1 speakers.
Ito et al. (2018) further sought to establish a causal relationship between work-
ing memory resources and predictive processing. The authors showed that antici-
patory eye movements in L1 and L2 listening were similarly delayed when the
participants were concurrently performing a memory task (remembering a list
of words). The authors concluded that “predictive eye movements draw on some
of the cognitive resources that are used for remembering words” (p. 260). Hence,
making predictions—in either L1 or L2—is a process that demands cognitive
resources and may be most likely to occur when such resources are available.
Table 4.2 summarizes the main questions that have guided L2 semantic prediction
research with eye tracking.
4.2.2.3 Morphosyntactic Prediction
The previous section has shown that L2 speakers are capable of making lin-
guistic predictions in principle (cf. Kaan, 2014). While the meanings of words
may be shared across languages, which may facilitate L2 prediction, many
linguistic phenomena are specific to a given language. The question then
becomes what will happen if cues are not instantiated in a participant’s L1
or are instantiated differently in the L2 than the L1, and therefore cannot
be transferred. This question has attracted considerable interest from L2 and
bilingualism researchers, who have often chosen morphosyntactic (grammar)
phenomena to examine it.
Work on morphosyntactic prediction is the largest substrand of L2 prediction
research and, indeed, the entire visual world paradigm. A total of ten studies have
investigated L2 morphosyntactic prediction in its classic form (see online supple-
mentary materials, Table S4.3). Two additional studies have combined morpho-
syntactic prediction research with an instruction component; these studies will be
reviewed in what follows (see Section 4.2.2.5). Much research was conducted using
gender-based prediction as a test case (Dussias et al., 2013; Grüter, Lew-Williams,
& Fernald, 2012; Hopp, 2013; Hopp & Lemmerth, 2018; Morales, Paolieri, Dussias,
Valdés Kroff, Gerfen, & Teresa Bajo, 2016). Grammatical gender lends itself well
to studying prediction because it creates agreement relationships between nouns,
articles, and adjectives.7 Gender marking on the article (e.g., el zapato, “theMASC.
shoe”) or adjective (e.g., ein grosser Wecker, “a bigMASC. alarm clock”) provides a cue
to the upcoming noun. If listeners have the grammatical knowledge and cognitive
resources to use this cue, they could anticipate the noun; that is, make a gender-
based prediction. Through different L1-L2 pairings, researchers can further investi-
gate what happens when grammatical gender is absent from speakers’ L1, as is the
case in English, or represented very similarly, as in Spanish and Italian (Dussias et al.,
2013; Morales et al., 2016). They can also examine whether L1 gender is activated
during L2 gender processing (Hopp & Lemmerth, 2018; Morales et al., 2016).
Research on gender-based prediction came to SLA through a series of three
studies conducted in Anne Fernald’s lab. Lew-Williams and Fernald (2007) showed
that Spanish native speakers, even at a young age, are able to use the grammati-
cal information in Spanish articles to predict upcoming nouns. Lew-Williams and
Fernald (2010) extended their study to adult classroom-based learners of Spanish.
They found no evidence of prediction with familiar nouns in their learner data.
Grüter et al. (2012) similarly found limited evidence of L2 prediction with familiar
article-noun combinations, this time in a group of highly advanced to near-native
L2 Spanish speakers. In all three studies, participants heard instructions containing
nouns preceded by gender-marked articles (e.g., ¿Dónde está la pelota? “Where is
theFEM. ball?”). Of special interest were the trials where the display depicted two
objects of different genders (e.g., la pelota “theFEM. ball” and el zapato “theMASC. shoe”):
see Figure 4.9, for an example. These trials, which are referred to as different-
gender trials, allow for prediction, because the gender-marked article uniquely
identifies the following noun. If listeners are able to use grammatical gender as a cue,
they will look at the target image faster in different- compared to same-gender trials.

FIGURE 4.9  Display used in gender prediction experiments. Because el zapato, "theMASC. shoe", and la pelota, "theFEM. ball", differ in grammatical gender, the article el or la acts as a predictive cue for the following noun.
(Source: Figures supplied by Dr. Casey Lew-Williams, Princeton University).

A similar approach—comparing different- with same-gender trials—underlies
other gender prediction studies as well (Dussias et al., 2013; Hopp, 2013, 2016;
Hopp & Lemmerth, 2018; Morales et al., 2016). Overall, results for gender-based
prediction have been mixed, with researchers generally reporting evidence of
prediction in highly proficient speakers (Dussias et al., 2013; Hopp, 2013; Hopp
& Lemmerth, 2018; but see Grüter et al., 2012) but not in less proficient speakers
(Dussias et al., 2013; Hopp, 2013; Lew-Williams & Fernald, 2010). Interestingly,
prediction does occur more consistently if learners are trained on the article–
noun combinations first, as if they were learning new vocabulary (Grüter et al.,
2012; Hopp, 2016; Lew-Williams & Fernald, 2010). Together, these results high-
light the role of L2 proficiency level in prediction, which itself may be related to
the amount and the type of input learners have received and the environment in
which learning takes place.
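
The logic of the different- versus same-gender comparison can be illustrated with a short sketch (hypothetical data and column names): if the gender-marked article is used predictively, the latency of the first target fixation should be shorter on different-gender trials.

    import pandas as pd
    from scipy import stats

    # Hypothetical per-trial data: latency_ms is the time from article
    # onset to the first fixation on the target image
    trials = pd.read_csv("gender_trials.csv")

    by_subj = (
        trials.groupby(["participant", "condition"])["latency_ms"]
        .mean()
        .unstack()  # columns: "different", "same"
    )

    # Simple paired comparison by participant; a mixed-effects model over
    # raw trials would be the more common choice in recent work
    t, p = stats.ttest_rel(by_subj["different"], by_subj["same"])
    print(f"t = {t:.2f}, p = {p:.3f}")
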
Because gender-based prediction provides such a neat paradigm, researchers
have adopted it to study the role of moderating variables. I already mentioned the
role of L2 proficiency (see previous paragraph). Other factors include the relation
between production and comprehension (Grüter et al., 2012; Hopp, 2013, 2016)
and L1 background (Dussias et al., 2013; Hopp & Lemmerth, 2018; Morales et
al., 2016). Regarding the comprehension–production relationship, Grüter et al.
(2012) triangulated data from three measures: offline comprehension (sentence
picture matching), online production (elicited imitation), and online compre-
hension (visual world eye tracking). They showed that highly advanced speakers’
occasional errors in production were mirrored in online comprehension in the
form of weaker prediction effects. This suggested it is the real-time retrieval of
gender information—whether productively or receptively—that is difficult. Also
focusing on the production–comprehension relationship, Hopp (2013) asked L1
and L2 German speakers to prename the images (along with their determiner
or gender-marked adjective) that appeared in the visual world displays of the
subsequent experiment. Hopp found that only those L2 speakers who consist-
ently assigned the correct grammatical gender to the images engaged in predic-
tive processing in the visual world experiment (also see Hopp, 2016). Given that
most production errors in Grüter et al.’s study were also gender assignment errors
(e.g., *el pelota instead of la pelota, “theFEM. ball”), these two studies underscore the
importance of robust lexical knowledge for gender-based prediction.
Lastly, researchers have also investigated L1 transfer effects in L2 gender pre-
diction (Dussias et al., 2013; Hopp & Lemmerth, 2018; Morales et al., 2016).
Research by Morales et al. (2016) and Hopp and Lemmerth (2018) suggests a
role for gender congruency in anticipatory processing. For instance, Morales et al.
(2016), in a Spanish language experiment, found that Italian learners of Spanish
looked at the target object more when it had the same gender in participants’ L1
Italian as L2 Spanish, for example ilMASC. formaggio and elMASC. queso, “the cheese”
(also see Section 9.3.1, research idea #4). Hopp and Lemmerth (2018) reported
nativelike prediction for high-intermediate Russian learners of German, but only
when gender was marked syntactically in the same way in the two languages (i.e.,
on adjectives, not articles). Finally, the role of typological distance has yet to be
examined more systematically; however, results from Dussias et al. (2013) for L1
Italian–low proficiency L2 Spanish speakers suggest typological similarity could
aid in prediction.
Taken together, the different substrands of gender prediction research converge
in showing that gender-based predictive processing relies on the robust encoding
of grammatical gender on individual lexical items. Highly proficient learners and
learners who master gender assignment tend to demonstrate “nativelike” prediction.
For other learners, whose lexical representations are perhaps less stable, performance
is more subject to L1 influence. Depending on the L1-L2 relationship, these learn-
ers will either be helped or hindered in L2 predictive processing by their native
language.
In recent years, work on morphosyntactic prediction has expanded to other
target structures and languages. There is now also prediction research on case
marking (Mitsugi, 2017; Mitsugi & MacWhinney, 2016; Suzuki, 2017; Suzuki &
DeKeyser, 2017), classifiers (Suzuki, 2017; Suzuki & DeKeyser, 2017), and definite
and indefinite articles (Trenkic et al., 2014). This list is likely to keep growing in
the following years, as researchers identify new grammatical phenomena that lend
themselves to making predictions.
An interesting case in point is prediction in L2 Japanese, a verb-final lan-
guage (Mitsugi, 2017; Mitsugi & MacWhinney, 2016; Suzuki, 2017; Suzuki &
DeKeyser, 2017). The verb-final status of Japanese allows for studying prediction
based on cues other than the verb, unlike in English and other head-initial lan-
guages, where the verb assumes a central role in enabling predictions (for exam-
ples, see Section 4.2.2). For instance, in a replication of Kamide et al. (2003)
with L2 learners, Mitsugi and MacWhinney (2016) used Japanese sentences that
translate as (1) and (2):

(1) Ditransitive construction—canonical word order
schoolLOC. serious studentNOM. strict teacherDAT. quietly examACC. handed over
“At the school, the serious student quietly handed over the exam to the
strict teacher.”
(2) Accusative (monotransitive) construction
schoolLOC. serious studentNOM. strict teacherACC. quietly teased
“At the school, the serious student quietly teased the strict teacher.”
(Mitsugi & MacWhinney, 2016, p. 23)8

Of interest was whether L1 and L2 Japanese speakers would use the case mark-
ers, which appear as postpositions in the noun phrases, to assign thematic roles
in real time. The two sentences were paired with the same four-image display
(see Figure 4.10, for a reconstruction); however, only the ditransitive sentence
enabled listeners to anticipate the theme (e.g., the exam paper) as the third
verbal argument in the sentence based on the agent-goal combination. Mitsugi
and MacWhinney (2016) and especially Mitsugi (2017) found that L1 Japanese
speakers used case markers incrementally and predictively (cf. Kamide et al.,
2003); that is, they did not wait until the verb to build a sentence structure.
Third- and fourth-year university students of Japanese were delayed in their
processing and did not generate predictions, perhaps because they did not have
the time to do so.

FIGURE 4.10  Display used in a morphosyntactic prediction experiment. Each image contained an agent, a recipient, a theme, and a distractor. Listeners were expected to assign thematic roles incrementally by relying on case marking on the nouns.
(Source: Mitsugi and MacWhinney, 2016. Recreated with permission from the author).

A recurring theme in the L2 prediction literature is that listeners need accurate
linguistic knowledge and fast processing skills to make predictions. Because the
visual world paradigm emphasizes real-time, meaning-focused processing, Suzuki
(2017) and Suzuki and DeKeyser (2017) argued that prediction is a reflection
of learners’ implicit knowledge (also see Andringa & Curcic, 2015; Godfroid &
Winke, 2015). Implicit knowledge can be deployed rapidly and without awareness
(e.g., Williams, 2009) and because of this, it is often regarded as key to communi-
cative competence. Using data from confirmatory factor analysis and individual
differences measures, Suzuki and Suzuki and DeKeyser showed patterns of asso-
ciation between prediction in the visual world paradigm and other measures of
implicit knowledge and implicit learning aptitude. The researchers thus went one
step further than previous authors in arguing, not only that prediction reflects
linguistic knowledge, but specifying that knowledge as unconscious-implicit in
nature (contrast with Huettig, 2015). Future researchers will need to confirm
or disconfirm Suzuki’s (2017) and Suzuki and DeKeyser’s (2017) evidence, for
instance by triangulating visual world data with verbal measures, to probe L2
speakers’ awareness and strategic processing more directly. No doubt the question
of what drives prediction in L2 processing will invite more research in the years
to come (also see Section 4.2.2.1). Table 4.2 summarizes the main questions that
have guided L2 morphosyntactic prediction research with eye tracking.

4.2.2.4 Prediction Using Multiple Cues
In visual world research, discourse effects have been introduced through the
composition of the visual display and the sentence context. Researchers have
shown that native listeners can interpret incoming linguistic information accord-
ing to the broader discourse situation (e.g., Ito & Speer, 2008; Sedivy, 2003;
Sedivy, Tanenhaus, Chambers, & Carlson, 1999). Prediction in context, then, can
serve as a test of successful integration and real-time interpretation of multi-
ple, linguistic and nonlinguistic, cues. Modern generative theory predicts that
linguistic phenomena requiring integration of multiple cues—known as inter-
face ­phenomena—will present persistent difficulties for late L2 learners and
L1 speakers experiencing attrition (Rothman, 2009; Sorace, 2005, 2011). To date,
only one visual world study has examined this issue, with heritage speakers as a
target group (Sekerina & Trueswell, 2011), and findings strongly supported the
notion that linguistic constructions at the syntax-discourse interface are difficult
to process (see online supplementary materials, Table S4.4).
Sekerina and Trueswell (2011) recruited Russian monolinguals in Russia and
heritage Russian bilinguals in the United States. The heritage speakers had immi-
grated to the United States as children or adolescents and, while they mostly used
English in public life now, they reported being advanced speakers of Russian, their
home language. Both groups acted out spoken instructions to move objects on
a vertical board with nine (3 × 3) different slots. The composition of the board
changed between trials. In critical one-contrast trials, participants saw two objects
that contrasted in color (e.g., a red star and a yellow star), a color competitor (e.g.,
a red bird), and two distractor objects. Encoded in this display, therefore, was a
color contrast (i.e., red star–yellow star), which was a discourse manipulation and
a hypothetical cue for real-time sentence interpretation.
When presented with a Russian instruction that translates as Red put the …,
“Put the red …”, listeners faced a temporary ambiguity—was it the red star or
the red bird they had to move? With the integration of other cues from the dis-
course (i.e., the visual display), the syntax (i.e., the fronting of red in the spoken
instruction), and prosody (i.e., the stress on red in one experimental condition),
listeners could infer a contrastive interpretation very quickly (i.e., the RED star,
not the yellow star) and look at the target referent before they heard star. This is
indeed what Russian monolinguals did. Heritage speakers, on the other hand,
were remarkably slow in processing the Russian sentences, although they did
process them correctly in the end. Quite strikingly, in the early phases of the sen-
tence, they adopted a “wait-and-see strategy” (p. 294) whereby they hardly looked
at the target or color competitor at all, but kept their eyes fixated on the central
cross instead. It appears, then, that Sekerina and Trueswell (2011) identified a clear vulnerability in the L1 processing of heritage speakers who otherwise had high self-reported Russian comprehension skills. Table 4.2 summarizes the main question
that has informed prediction research involving multiple cues.

4.2.2.5 Effects of Instruction
A total of four studies have examined whether instruction can influence real-time,
predictive language processing and the retrieval of lexical knowledge (see online
supplementary materials, Table S4.5). Given that prediction reflects linguistic
knowledge (see Section 4.2.2.3), adding an instruction component to a prediction
study can reveal whether prediction can be trained, how prediction develops over
time with exposure to input, and whether explicit instruction can speed up the
process of learning to predict. Understanding the effects of instruction on real-
time language processing is both theoretically and practically important. Research
on instruction can inform the interface hypothesis—that is, how explicit instruc-
tion, explicit knowledge, and implicit knowledge are related (Andringa & Curcic,
2015)—the relationship between production and comprehension (Hopp, 2016),
and L2 vocabulary learning and teaching (Bolger & Zapata, 2011; Kohlstedt &
Mani, 2018).
Visual world studies that focus on instruction generally consist of a learning
phase and a testing phase. During training, the participants first learn the targeted
grammatical (Andringa & Curcic, 2015; Hopp, 2016) or lexical (Bolger & Zapata,
2011) knowledge. Then, they take part in a visual world experiment, which is
the testing phase and measures the outcomes of training on the participants’ real-
time language processing (also see Section 3.2.3, for similar eye-tracking research
with text). Taking a slightly different approach, Kohlstedt and Mani (2018) inte-
grated both the learning task and the outcome measure into the same visual world
experiment and, in doing so, they were able to obtain fine-grained information
about the participants’ learning trajectory.
Andringa and Curcic (2015) were interested in the extent to which metalin-
guistic information (i.e., the provision of a grammar rule) could enhance implicit
processing of a morphosyntactic structure in Esperanto, an artificial language.The
authors found that neither implicit instruction in Esperanto (listening to sen-
tences that contained the target structure) nor a combination of implicit and
explicit instruction (listening + rule provision) resulted in predictive processing,
although the explicit instruction group, who were taught the rule, did perform
better on a separate measure of explicit knowledge. In contrast, Hopp (2016)
reported that intermediate-level English learners of German were able to make
gender-based predictions after explicit vocabulary instruction (also see Grüter et
al., 2012; Lew-Williams & Fernald, 2010). The participants listened to, saw, and
repeated article-noun combinations three times (e.g., der Käse “theMASC. cheese”)
and then produced the article-noun combinations themselves before they took
part in a visual world post-test. Prediction at post-test correlated with accuracy in
the production task, which pointed to a close association between production and
online comprehension (also see Hopp, 2013).
One difference between Andringa and Curcic (2015) and Hopp (2016) is par-
ticipants’ prior familiarity with the target language and vocabulary. Different from
the German learners in Hopp (2016), participants in Andringa and Curcic (2015)
had no prior exposure to Esperanto. Simply comprehending the sentences may
thus have placed a heavy burden on their working memory, leaving no room
for morphosyntactic prediction, even if participants knew the rule. To test this
hypothesis, researchers could adopt Andringa and Curcic’s target structure (dif-
ferential object marking) and test it in Spanish, which has the same grammatical
structure (Andringa & Curcic, 2016).
Two studies investigated lexical processing or vocabulary learning. Bolger
and Zapata (2011) were interested in how presenting new words in semantically
related or semantically unrelated story contexts influences vocabulary learning.
The authors hypothesized, based on previous research, that grouping words in
semantic sets (e.g., all terms for food, types of animals, or colors) might inhibit
vocabulary learning due to overly strong connections between the words. This is
known as semantic-category entanglement. Results showed that the group who had learned the words in semantic sets looked longer at semantic competitors (e.g., longer at pig for the target floop, "dog"), consistent with this view. The result suggests that greater caution should be taken when grouping novel vocabulary items by their semantics, a widespread pedagogical practice in vocabulary instruction, and perhaps even when teaching vocabulary by themes (e.g., a visit to the veterinarian or a day at the beach). Even so, differences in learners' eye-tracking data were subtle and "the two groups were probably more alike than different" (p. 637).
Also using short story contexts like Bolger and Zapata (2011), but this time
in the auditory modality, Kohlstedt and Mani (2018) examined the role of dis-
course context on learners’ inference of new word meanings. Participants listened
to biasing (informative) and neutral story contexts in which a familiar prime
(e.g., Opa, “grandfather”) and a target word (e.g., Ausfrieb, a German pseudoword
for “cane”) appeared twice. The target (but not the prime) was depicted on the
screen (see Figure 4.11). The researchers found that in the latter part of the story,
when listeners had listened to more of the story context, both L1 German and
advanced L2 German speakers were able to infer the intended meaning of the
target word. Specifically, they made anticipatory looks to the target (e.g., the cane)
when hearing the prime (e.g., Opa). Given that participants were learning these
new word meanings on the fly, the current study is a nice example of how the
visual world paradigm can also be utilized for learning experiments.

FIGURE 4.11  Display used in a vocabulary learning experiment. In biasing story contexts, the prime word Opa, "grandfather", invited looks to the image of the cane (Gehstock or the pseudoword Ausfrieb in German), which was the target word in the story.
(Source: Kohlstedt and Mani, 2018).

In sum, instruction research in the visual world paradigm has shown the ben-
efits (Hopp, 2016; Kohlstedt & Mani, 2018) and limits (Andringa & Curcic, 2015;
Bolger & Zapata, 2011) of input and instruction on real-time language processing.
Because online processing in the visual world paradigm “does not readily allow
for the application of explicit knowledge” (Andringa & Curcic, 2015, p. 237), the
paradigm allows for the testing of theoretically interesting questions such as the
interface hypothesis. Findings may also have pedagogical implications, for instance
for L2 vocabulary instruction, by demonstrating the benefits of clustering vocabu-
lary thematically, rather than semantically, and embedding it in rich discourse
contexts. Research also benefits from the triangulation of online eye-tracking data
with offline measures, such as grammaticality judgment tests or vocabulary tests, as
participants’ performance may vary dramatically in these contexts. Table 4.2 sum-
marizes the main questions that have guided research on instruction in the visual
world paradigm and all other areas of prediction research.

4.2.3 Referential Processing
Four studies have investigated real-time sentence interpretation in ambiguous or
semantically complex sentences (see online supplementary materials, Table S4.6).
Compared to the prediction research reviewed previously (see Section 4.2.2), this
work represents somewhat of a conceptual shift, because the focus is no longer on
listeners’ anticipation of upcoming linguistic information. Instead, many research-
ers examine how listeners establish reference (linking language with the outside
world) when more than one potential referent is given. For example, the word frog
may refer to one of two frogs shown on the screen or the pronoun he may refer to
one of two male characters. To establish reference, listeners will normally need to
hear the critical word (e.g., frog, he) in the input first; that is, there is typically no
anticipation of the referent. The time windows used for analysis will be defined
accordingly: they will either align with or follow—but not typically precede—the
onset of the critical word (see Section 6.3.2.2). Secondly, work on referential pro-
cessing relies on additional tasks (see Section 5.4), such as moving an object (Kim
et al., 2015; Pozzan & Trueswell, 2016) or answering a comprehension question
(Cunnings et al., 2017; Sekerina & Sauermann, 2015). These tasks are a key com-
ponent of this research: by comparing participants’ eye movements with their final
decision (revealed in the additional task) researchers can determine the extent
to which the processes and product of sentence interpretation are in agreement.
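
A minimal sketch of this triangulation logic is given below (hypothetical file and column names): looks to the intended referent are aggregated in a window that follows the onset of the critical word, and the result is then compared against the offline response for the same trial.

    import pandas as pd

    fix = pd.read_csv("fixations.csv")    # trial, time_ms, roi
    resp = pd.read_csv("responses.csv")   # trial, offline_correct (0/1)

    # Analysis window: from the critical word's onset (time 0) to 1000 ms
    # after it; the window follows, rather than precedes, the onset
    win = fix[(fix["time_ms"] >= 0) & (fix["time_ms"] < 1000)]

    target_prop = (
        win.assign(on_target=win["roi"].eq("target"))
        .groupby("trial")["on_target"]
        .mean()
        .rename("target_prop")
        .reset_index()
    )

    merged = resp.merge(target_prop, on="trial")
    # Do trials answered correctly offline also show more target looks?
    print(merged.groupby("offline_correct")["target_prop"].mean())
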
As Tanenhaus and Trueswell (2006) noted, “introducing a referential world that
is co-present with the unfolding language, naturally highlights … questions about
reference” (p. 883). One such question is how listeners parse sentences in the
context of referential ambiguity, for instance when there are two potential refer-
ents (e.g., two frogs or two male characters) on the screen. This question inspired
pioneering research on syntactic ambiguity resolution with adult L1 speakers by
Tanenhaus et al. (1995), whose study was reviewed in Section 4.1. This work has
since been extended to child L1 speakers (Trueswell et al., 1999) and, of relevance
here, adult L2 speakers (Pozzan & Trueswell, 2016).
Pozzan and Trueswell (2016) compared L1 Italian–L2 English speakers’ parsing
skills to those of child L1 English speakers who participated in a previous study
by Trueswell et al. (1999). Both participant groups were language learners; how-
ever, only the adults in Pozzan and Trueswell’s study had fully developed execu-
tive skills, which could potentially help them recover from a syntactic ambiguity.
This is the hypothesis the researchers wanted to test. The participants acted out
spoken instructions such as Put the frog on the napkin onto the box. This sentence is a
garden-path sentence: listeners are led to believe on the napkin is the goal of the action
until they hear the second prepositional phrase and revision is necessary (the
frog, which is on the napkin, goes on the box). When only one frog (one poten-
tial referent) was present on the screen, L2 English speakers selected the wrong
goal (i.e., the napkin) in nearly half of the trials, showing they had difficulty in
updating their initial interpretations, just like native children. Revision difficulties,
therefore, are partly a learning phenomenon (also see Cunnings et al., 2017) that
is attested across learners with differing levels of cognitive maturity.
Another question is how listeners link pronouns to their antecedents to estab-
lish co-reference within a sentence or between sentences. In a sentence such as
Before Lizzi drives to East Lansing, shei takes heri dog for a walk, the pronouns she and
her and the proper name Lizz are co-referential: they all refer to the same real-life
person. Kim et al. (2015) and Cunnings et al. (2017) used the visual world para-
digm to study co-reference in L2 speakers, with interesting results. Cunnings et
al. (2017) examined the effects of L1 background on subject pronoun resolution.
Participants in the study were L1 Greek–L2 English speakers, L1 English speak-
ers, and L1 Greek speakers, who listened to sentences in English or Greek such
as the following:

(3) (a) While Peter helped Mr Smith by the sink in the kitchen, he carefully
cleaned the big cup that was dirty.
(b) While Mr Smith helped Peter by the sink in the kitchen, he carefully
cleaned the big cup that was dirty.

In a null subject language such as Greek, the overt pronoun aftós, “he”, in the
main clause indicates a shift in topic (i.e., from subject to direct object); hence
aftós normally refers to Mr. Smith in (3a) and Peter in (3b). In English, the inter-
pretation is reversed, given that pronouns more commonly refer to the cur-
rent discourse topic (typically the subject). The question then becomes whether
L1-Greek L2-English speakers will process English sentences according to Greek
grammar, English grammar, or a hybrid of the two. Eye fixations to images
of Peter and Mr. Smith, the two possible antecedents depicted on the screen
(see Figure 4.12), revealed listeners’ evolving preferences for either a subject or
an object antecedent for the he/aftós pronoun. Like English monolinguals, and
different from Greek monolinguals, Greek-English bilinguals initially linked the
English pronoun to the subject antecedent, displaying “nativelike interpretive
preferences” (p. 630). Interestingly, when the visual context biased interpreta-
tion in a non-preferred manner, L1 and L2 English speakers showed 66–79%
error rates on comprehension questions, even though their eye gaze had shifted
correctly during listening. These results highlight divergences between online
and offline sentence interpretation, and the need to measure both (also see
Pozzan & Trueswell, 2016; Roberts, Gullberg, & Indefrey, 2008), as initial inter-
pretations may persist in spite of attempts at reanalysis.

FIGURE 4.12  Display used in a referential processing study. Participants heard sentences such as While Peter helped Mr Smith by the sink in the kitchen, he …, in which the referent of the personal pronoun he was ambiguous.
(Source: Image supplied by Dr. Ian Cunnings, University of Reading, UK; Cunnings et al., 2017).

Using different sentence structures, however, Kim et al. (2015) did find qual-
itative and quantitative differences between L1 and L2 English speakers. They
attributed their findings to L2 speakers’ difficulties in integrating syntactic and
discourse-level information (also see Section 4.2.2.4). Clearly, then, pronoun res-
olution is an area ripe for further research. Future work will benefit from a com-
bined use of eye tracking and offline measures, as exemplified in all the previously
mentioned studies.
Lastly, Sekerina and Sauermann (2015) examined Russian heritage speak-
ers’ interpretations of the universal quantifier (i.e., every) in relation to the visual
scene. The participants were heritage Russian-English bilinguals and monolingual
Russian controls, similar to the participant sample in Sekerina and Trueswell’s
(2011) study (see Section 4.2.2.4). The heritage bilinguals performed the task in
both languages, Russian and English: in a sentence-picture verification task, they
judged, for each item, whether a spoken sentence
that contained the universal quantifier every (or kazhdyj, in Russian) matched the
image they saw on the screen (see Figure 4.13). Extending previous research with
L1 children and adult L2 speakers, Sekerina and Sauermann found that heritage
speakers incorrectly rejected images of type B in Russian, their heritage language,
but not in English, their dominant language. Furthermore, the authors established
an “online signature pattern” (p. 96) of such errors in participants’ eye move-
ments, which might explain why and when comprehension broke down. Thus,
eye-movement data in this study helped the researchers “uncover a relationship
between visual attention and online language comprehension” (p. 87) and, specifi-
cally, identify cases where comprehension fails.
In sum, a small number of recent studies have grown into their own unique
strand of L2 visual world research, dedicated to questions about referential pro-
cessing. Although L2 speakers converge with L1 speakers in some conditions
(Cunnings et al., 2017; Kim et al., 2015), they experience difficulty in updating
or revising initial syntactic analyses (Cunnings et al., 2017; Pozzan & Trueswell,
2016). Second-language learners and heritage speakers also face challenges when
asked to integrate information from various sources, such as syntactic, discourse-
pragmatic, and visual information (Kim et al., 2015; Sekerina & Sauermann,
2015; but see Cunnings et al., 2017). In all cases, the triangulation of online data
(eye fixations) with offline measures (e.g., comprehension questions or mouse
clicks) has proved key to understanding the process of interpretation in its full form.

FIGURE 4.13 Three experimental conditions in a sentence-matching task. Participants
saw one image per trial and judged whether the image matched the
sentence Every alligator lies in a/the bathtub. Note: the boxes represent
interest areas.
(Source: Reprinted from Sekerina, I. A., & Sauermann, A., 2015. Visual attention and quantifier-
spreading in heritage Russian bilinguals. Second Language Research, 31, 75–104, with permission from
SAGE Publications, Ltd. © 2014 The Authors).

TABLE 4.3 Questions in visual world eye tracking on referential processing

1. To what extent do listeners integrate syntactic information with information from
other linguistic and nonlinguistic domains during comprehension?
2. What is the role of general cognitive ability (e.g., executive function) in overcoming
syntactic ambiguity?
3. How quickly do learners establish co-reference in a spoken sentence? To what extent
can they revise an initial interpretation based on newly available information in the
input?
4. To what extent do L2 speakers process sentences in accordance with the grammar of
the target language, the native language, or a hybrid of the two?
5. To what extent do heritage speakers resemble L1 children, adult L2 speakers, and
adult monolinguals in their processing of quantifiers in the heritage language and the
dominant language?

Table 4.3 summarizes the main questions that have guided referential processing
research in the visual world paradigm.

4.2.4 Production
The synthetic review also yielded a total of six eye-tracking studies involving oral
production (see online supplementary materials, Table S4.7). Although these stud-
ies focus on speech production and interaction, which sets them apart from the
comprehension studies reviewed previously, the visual world paradigm and eye-
tracking research on production have “obvious similarities” (Huettig et al., 2011,
p. 152). Therefore, following Huettig et al. (2011), I too shall conclude the present
review with an overview of production research.
Eye-tracking research on L1 production has revealed a “tight temporal link
between eye movements and speech planning” (Huettig et al., 2011, p. 165). In
most production studies, participants are asked to describe a scene or name pictures
on the screen. Their eye gaze provides a measure of visual attention (see Sections
1.2 and 2.6), which reveals how the speakers extract visual information from the
display to fulfill their goals. Looking at early, preverbal stages of speech produc-
tion (i.e., message generation), Flecken and colleagues examined how L2 speakers
and bilinguals conceptualize events before and as they verbalize them (Flecken,
2011; Flecken et al., 2015). Lee and Winke (2018), working in child language
assessment, examined where English language learners direct their eye gaze dur-
ing speech disfluencies such as pauses and hesitation phenomena.9 Kaushanskaya
and Marian (2007) extended reading and listening research on the bilingual lexi-
con (see Sections 3.2.2 and 4.2.1) to speech production and investigated how
L1 orthography and phonology might interfere with L2 picture naming. Finally,
McDonough and her team (McDonough et al., 2015, 2017) looked at joint atten-
tion in interaction as a potential language learning mechanism. Together, these
studies have extended the use of eye tracking from language comprehension to
production, highlighting close links between a participant’s eye gaze and their
productive language processing.
Flecken and her colleagues conducted two cross-linguistic comparisons of
event conceptualization, which is how speakers segment and select information
to comprehend and interpret an event (Flecken, 2011; Flecken et al., 2015). This
line of work can inform the debate on language and cognition (for reviews, see
Lupyan, 2016; Zlatev & Blomberg, 2015)—namely, whether and to what extent lan-
guage-specific properties shape how humans conceptualize experiences. Different
from previous cross-linguistic eye-tracking research (e.g., Papafragou, Hulbert, &
Trueswell, 2008), Flecken and her colleagues focused specifically on bilinguals and
L2 speakers, whose event conceptualization can potentially be influenced by more
than one language.The participants in both studies viewed short video clips of eve-
ryday events, which they were asked to describe (see Figure 4.14, for an example of
a still image). The researchers analyzed participants’ verbal productions, for instance
for use of the progressive aspect (Flecken, 2011) or different kinds of motion verbs
(Flecken et al., 2015). They then related participants’ linguistic choices to their
eye fixations on different areas on the screen. Flecken et al. (2015) found that L2
German speakers with an L1 French background inspected scenes similarly to L1
French monolinguals. This suggested that although the advanced L2 speakers had
acquired most of the lexical means for describing motion events in L2 German,
their event conceptualization still showed a strong L1 influence.

FIGURE 4.14 Motion event used in an oral production study. Participants viewed
a video clip of a pedestrian walking toward a car and were asked to
describe the event in French or German. Note: The boxes represent
interest areas.
(Source: Reprinted from Flecken, M., Weimar, K., Carroll, M., & Von Stutterheim, C., 2015. Driving
along the road or heading for the village? Differences underlying motion event encoding in French,
German, and French-German L2 users. The Modern Language Journal, 99, 100–122, with permission
from Wiley. © 2015 The Modern Language Journal).
Moving beyond monologic tasks, McDonough et al. (2015, 2017) were among
the first in SLA and bilingualism to use eye tracking in face-to-face interaction
(also see Gullberg & Holmqvist, 1999, 2006). In two studies, McDonough and
colleagues focused on joint attention during interaction, which they defined as
“the human capacity to coordinate attention with a social partner” (McDonough
et al., 2017, p. 853) using visual cues such as gesture and eye gaze. The researchers
measured three kinds of joint attention—the L2 speaker’s self-initiated eye gaze,
their interlocutor’s other-initiated eye gaze, and mutual eye gaze (i.e., shared eye
contact) between both speakers (see Figure 4.15). In both studies, the research-
ers found that the length of L2 participants’ self-initiated eye gaze predicted
the outcome variable: a greater likelihood of responding correctly to feedback
(McDonough et al., 2015) and more pattern learning (McDonough et al., 2017).
McDonough et al. (2015) also found a positive effect of mutual eye gaze.
These studies have thus begun to illuminate the role of a nonverbal cue in
language learning and strongly invite further research along the same lines, for
instance on gestures (see Section 9.3.1, research ideas #7 and #8). In light of the
naturalistic tasks and high ecological validity of this work, I see great potential
in the application of eye-tracking methodology to interaction research (also see
Brône & Oben, 2018). Table 4.4 summarizes the main questions that have guided
eye-tracking research on L2 production.

FIGURE 4.15 Eye-tracker set up in an oral production study. Second-language
learners were paired up with a research assistant to perform one-on-
one interactive tasks while eye-tracking cameras recorded their eye
movements.
(Source: Image supplied by Dr. Dustin Crowther, University of Hawai’i).

TABLE 4.4 Questions in visual world eye tracking on production

1. To what extent do the language(s) that people use shape their perception and
interpretation of events?
2. What are the visual markers of speech disfluencies in language assessment?
3. What are the consequences of having an integrated bilingual lexicon for L2 speech
production? Do the orthography and phonology of the not-in-use language (the L1)
interfere with speech production (picture naming) in the other language?
4. To what extent does interlocutors’ eye gaze, as an index of joint attention, relate to
successful interaction and the initial stages of L2 grammar learning?

4.3 Conclusion
This chapter has offered a bird’s eye view of the 32 visual world studies pub-
lished in SLA and bilingualism between 2003 and 2017. Complementing the
review of the 52 text-based studies in Chapter 3, this chapter showcases how the
visual world paradigm can help advance different research areas. Visual world eye
tracking has grown into a full-fledged paradigm for studying spoken language
processing. This is a non-trivial matter given that many other measures of spo-
ken language processing are metalinguistic in nature, provide only a snapshot of
processing, rather than full time-course data, and may interrupt the speech input
(Tanenhaus & Trueswell, 2006). Through a thoughtful integration of visuals and
spoken language, researchers can study questions from the lowest, sublexical lev-
els, to word recognition, (morpho)syntax, and semantics, all the way up to the
discourse level. The paradigm places questions about the temporal and referential
aspects of spoken language processing front and center (see Section 4.1). It has
also provided key data on the rapidly expanding area of prediction in language
processing (see Sections 4.1 and 4.2.2.1). Bilingualism and SLA researchers have
embraced these methodological affordances in their research, with valuable results.
At this juncture, it is good to look back to the ground covered and look ahead
to what the paradigm may bring to SLA and bilingualism in the following years.
Compared with text-based research, the use of the visual world paradigm in
SLA and bilingualism research is a more recent development, and especially so
in SLA. The present synthetic review revealed a strong thematic overlap with
research in psychology, potentially reflecting the origins of the paradigm in
this neighboring discipline. Research questions addressed so far fall under four
rubrics—word recognition, prediction, referential processing, and production—all
of which are general themes in language and cognition (see Sections 4.2.1–4.2.4).
Bilinguals and L2 speakers are interesting study populations to include in these
investigations, due to the potential cross-linguistic influence they experience in
their lexical and grammatical systems, their generally slower lexical processing,
and their putative advantages in terms of executive control. Exemplary work in
this area has uncovered both continuity (e.g., Cunnings et al., 2017; Dijkgraaf et
al., 2017; Ito et al., 2018) and differences (Mitsugi, 2017; Mitsugi & MacWhinney,
2016; Pozzan & Trueswell, 2016; Sekerina & Trueswell, 2011) in how bilinguals
and monolinguals engage in language processing tasks.
For SLA researchers, it will be important to continue developing new appli-
cations of visual world eye tracking that can help address big questions in SLA.
Eye fixations provide a real-time window into spoken language processing. How
does such processing change over time, as L2 proficiency develops, and can
instructional interventions effectively speed up this developmental trajectory?
What linguistic and cognitive factors determine whether L2 speakers and bilin-
guals display anticipatory behavior, which seems to be the benchmark for skilled
processing and underlies about half of all visual world studies (see Section
4.2.2)? Research in this area will benefit from strong theoretical foundations
regarding what a visual world task can measure. In recent years, researchers have
advanced proposals that prediction in a visual world experiment reflects the
deployment of implicit knowledge, or knowledge outside the learner’s aware-
ness (Andringa & Curcic, 2015; Godfroid & Winke, 2015; Suzuki, 2017; Suzuki
& DeKeyser, 2017). Given the need for more and better measures of implicit
knowledge in the field (e.g., Godfroid, in press), the idea that prediction reflects
implicit knowledge is worthy of further research, especially considering the fact
that there might also be multiple mechanisms (beyond simple associative learn-
ing) that can give rise to predictive behavior (Huettig, 2015). By the same token,
the field needs more studies triangulating production and comprehension data
(Grüter et al., 2012; Hopp, 2013, 2016), online and offline data (e.g., Andringa
& Curcic, 2015; Grüter et al., 2015; Mitsugi & MacWhinney, 2016), and more
studies that investigate authentic human interactions involving L2 speakers
(McDonough et al., 2015, 2017).
Further innovation along these and other lines will help solidify the place of
visual world eye tracking in SLA. By informing key questions in SLA, regarding
the role of instruction, the nature of linguistic knowledge, individual differences,
and many others, visual world researchers will not only advance disciplinary
knowledge but also help build the identity of the L2/bilingual eye-tracking com-
munity as a mature field with a broad, transdisciplinary reach. As in Chapter 3,
the goal of this chapter has been to inspire readers’ own research projects. With
this goal in mind, I focused primarily on the general questions in which research-
ers are interested and how they approached these questions with eye tracking.
Together, Chapters 3 and 4 have given readers a survey of what has been done,
which hopefully has sparked some ideas for their own research projects. In the
remainder of the book, I will focus on how to develop your very own eye-
tracking experiments.

Notes
1 Two common reading behaviors, skips and regressions, both cause readers to depart
from a strictly linear, word-by-word reading pattern.
2 In these experiments, the nouns that are underlined were depicted as images on the
screen. This enabled the researchers to contrast looks to the different objects (e.g., bread
versus man or motorbike versus carousel) as a result of the different verb types.
3 The difference in looks in the future-tense condition was only statistically significant
in a second, revised version of the experiment.
4 Studies that were available online first in 2017 had a 2018 publication date.
5 To assess the degree of overlap between the two research communities, I conducted
an additional search in three psychology-oriented journals: Journal of Memory and
Language, Cognition, and Language, Cognition, and Neuroscience. This yielded 15 visual
world studies with bilinguals. The topics covered ranged from the bilingual lexicon, to
morphosyntactic prediction, and production. These strands appear to be similar to the
present synthetic review of SLA and bilingualism research and will be covered in the
remainder of this chapter.
6 Gender-based prediction refers to participants’ use of gender-marked cues (articles or
adjectives) in the input to anticipate which noun is likely to come next, for instance
looks at a shoe, zapato[MASC] in Spanish, based on the preceding article el[MASC] (cf. Grüter,
Lew-Williams, & Fernald, 2012).
7 More formally, gender agreement occurs between a trigger (generally a noun) and mul-
tiple targets (e.g., articles, adjectives, pronouns, demonstratives, and past participles). In
visual world research, it is the article-noun and article-adjective-noun relations that
have been studied the most.
8 The abbreviations in subscript refer to case, expressed in Japanese through case mark-
ers after the noun. LOC = locative case, NOM = nominative case, DAT = dative case,
ACC = accusative case.
9 Readers interested in Lee and Winke’s (2018) study are referred to the assessment
strand in Section 3.2.5.
5
GENERAL PRINCIPLES OF EXPERIMENTAL DESIGN

There are many roads that can lead a researcher to engage in eye tracking. Some
readers of this book will have conducted multiple studies before turning to eye-
tracking research. They will have extensive experience with experimental design
and research methodology but may be new to the particulars of eye tracking.
Other readers will be relatively new to the process of experimental research. If
this is you, you may find yourself needing to learn basic principles of experimen-
tal design, as well as information and guidelines specific to eye tracking. If that
is the case, this chapter is just right for you. In this chapter, I set the stage for the
eye-tracking-specific guidelines that will come next, in Chapter 6. Starting from
what an item is (see Section 5.1), I will describe the creation of item lists for within-
and between-subjects research designs (see Section 5.2), the different types of trials
within a study (see Section 5.3), the distinction between primary and secondary
tasks (see Section 5.4), and, finally, give guidelines for how many items per condi-
tion to include (see Section 5.5). Having a solid grasp of these different concepts
will make you a better quantitative researcher. It will also make you a better
eye-tracking researcher because good eye-tracking research builds upon general
principles of experimental design (see, e.g., Kerlinger & Lee, 2000, for a stand-
ard methodological reference). Therefore, let us take a methodological excursion
together and explore the basics of experimental design.

5.1 Doublets, Triplets, and Quadruplets


A key component of designing a study is the creation of experimental materials, also
called stimuli or items. Experimental materials are the fuel to a study engine; without
them, the study cannot be run. Experimental materials is a general name we use
for the information with which research participants engage during the experiment,
and for which their responses will be recorded and then analyzed. In eye-tracking
experiments, the materials can be sentences (for reviews, see Keating & Jegerski, 2015;
Roberts & Siyanova-Chanturia, 2013), images, videos, webpages, or other types of
visual displays, depending on the nature of the study. To a large extent, experimental
materials shape the experience participants will have during a study.
An important step, therefore, is designing sound and reliable materials for the
study. At first blush, this may seem to boil down to compiling or creating a long list
of materials. Although finding or creating materials is generally a part of the process,
in practice, a researcher’s job seldom ends here. Researchers usually create multiple
versions of the same materials, manipulating one or more aspects of the item to see
how this change affects the outcome (dependent variable). This process of creat-
ing multiple versions is referred to here as reduplication. Together, the different
versions of an item represent the independent variables that are of interest in a
study. Descriptions of how to create multiple versions apply specifically to categori-
cal variables; that is, variables with a limited set of possible values, such as input
enhancement (enhanced, unenhanced); word status (word, pseudoword); article (def-
inite, indefinite); feedback type (implicit, explicit, no feedback); or task complexity
(simple, complex). Therefore, reduplicating items, as described in this section, is most
common in ANOVA-type designs, in which researchers manipulate a limited num-
ber of categorical variables. The guidelines also apply to regression analysis and other
correlation-based designs, as long as the set of independent variables includes at least
one categorical variable. In the following, we will see many different examples of
how researchers map categorical variables onto different item versions. For a review
of the different variable types referred to in this chapter, see Textbox 5.1.

TEXTBOX 5.1. TYPES OF VARIABLES


Variables can be independent or dependent:

Independent variable: the variable that is changed (manipulated) or observed
in the experiment, for instance text spacing (spaced text vs. unspaced text)
Dependent variable: the outcome variable for which researchers measure
how its values change as a result of the independent variable, for instance
reading time (how does reading time change as a result of text spacing?).

Researchers want to avoid or minimize the influence of any additional, unac-
counted-for variables that might confound the results.

Confounding variable: any variable that is associated with both the inde-
pendent and the dependent variable and is not accounted for in the
research design or statistical analysis. Confounding variables bias experi-
mental results and can undermine the validity of a study, for instance
text difficulty (do the two texts differ in other ways than the presence or
absence of word spacing?).

Variables can also be continuous or categorical:

Categorical variable: any variable that can take a limited number of pos-
sible options or values, such as feedback type. Each value represents a dis-
tinct category in the world that can be described qualitatively, for instance
recasts, prompts, and no feedback. Examples: text type (spaced vs. uns-
paced), feedback (recasts vs. prompts vs. no feedback)
Continuous variable: any variable that can be measured quantitatively on
a continuum. There could, in theory, be any number of values. Examples:
reading time, reaction time.

Say you are interested in whether L2 Chinese learners can read spaced text faster
than unspaced text (Chinese text is normally unspaced). To investigate this, your
study should naturally contain sentences with spacing and sentences without.
Ideally, however, the spacing manipulation should be applied to the same set of
sentences, so other factors (confounding variables) such as the words used in
the sentence and the overall sentence difficulty do not play a role. Simply put,
each sentence should have both a spaced and an unspaced version. The number of
versions of a stimulus (in this case, a sentence) depends on the variables that are of
interest in the study. As a rule, there should be as many item versions as there are
levels in your categorical variable. For instance, in the spacing study, the categori-
cal variable (spacing) has two levels: spaced text and unspaced text. Hence, there
should be two versions of each sentence.
Although there is no theoretical limit on how many levels a variable can
have, for practical reasons most variables tend to have two, three, or four levels.
Accordingly, the items used to measure them will be organized into doublets
(two levels), triplets (three levels), or quadruplets (four levels). Figure 5.1 provides
a graphical representation of the most commonly used item types. Doublets
are pairs of stimuli that represent a two-level categorical variable. Triplets are
groups of three of the same stimulus that represent a three-level categorical vari-
able. Lastly, quadruplets are four different versions of the same stimulus which,
together, represent a four-level categorical variable or the combination of two
independent variables (both categorical) with two levels each. To illustrate how
this works in an actual study, I will walk you through some examples modified
from existing text-based and visual world studies.
Godfroid, Ahn, Rebuschat, and Dienes (in prep.) wanted to investigate
the acquisition of L2 syntax by L1 English speakers. They conducted an eye-
tracking experiment based on Rebuschat’s (2008) semi-artificial language (also
see Rebuschat & Williams, 2012), in which English words have been rearranged
according to German syntax (e.g., Yesterday scribbled David a long letter to his family,
mirroring the German sentence Gestern kritzelte David einen langen Brief an seine
Familie).

FIGURE 5.1 Common item types: doublets (two levels), triplets (three levels), and
quadruplets (four levels). An item should have as many versions as there
are levels in your independent variable.

In order to obtain baseline reading data, they also ran a control condition
in which participants read the same sentences with normal English word order
(e.g., Yesterday David scribbled a long letter to his family). The independent variable
in this study was word order. It had two levels: German word order and English
word order. Accordingly, the experimental items were doublets—sentences were
presented in either German syntax (verb-second word order, experimental group)
or English syntax (subject-verb word order, control group). A third condition
(not included in Godfroid et al.’s study) could examine the acquisition of verb-
final word order found in Subject-Object-Verb languages such as Korean and
Japanese (e.g., Yesterday David a long letter to his family scribbled). Doing so would
simply mean adding a branch to the item tree, so we now have triplets instead of
doublets.
A plausible follow-up to this study could look at the role of instruction. Might
bolding and underlining the verbs help learners acquire the structure? To address
this question, the researchers would introduce a new variable into the design,
namely input enhancement. We can represent this move by adding a new level to
the item tree (see Figure 5.2). Because input enhancement is normally operation-
alized as a two-level independent variable (enhancement, no enhancement), each
item version should branch two-ways. Thus, doublets become quadruplets and
triplets would become sextuplets (see Figure 5.2).
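
For readers who script their experimental materials, the logic of reduplication can
be made concrete in a few lines of code. The sketch below is my own illustration
in Python, not code from Godfroid et al. (in prep.), and the factor labels are
hypothetical placeholders; it simply crosses the levels of the categorical variables
to enumerate the versions each item needs:

from itertools import product

# Hypothetical factors from the word-order example; crossing their levels
# yields one item version per combination of levels (2 x 2 = 4, a quadruplet).
word_order = ["German (verb-second)", "English (subject-verb)"]
enhancement = ["enhanced", "unenhanced"]

versions = list(product(word_order, enhancement))
for version in versions:
    print(version)

# A three-level word-order factor (e.g., adding verb-final order) crossed with
# enhancement would yield 3 x 2 = 6 versions per item (sextuplets).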
The principles of stimulus reduplication apply to text-based and visual-world
eye-tracking research alike (see Chapters 3 and 4, for reviews). They also apply to
the different strands of research within these broad paradigms, with the excep-
tion of observational research (e.g., Godfroid et al., 2018; Lee & Winke, 2018;
McCray & Brunfaut, 2018; McDonough, Crowther, Kielstra, & Trofimovich, 2015;
McDonough, Trofimovich, Dao, & Dion, 2017).

FIGURE 5.2 Schematic representation of different item types.
(Source: Example modified from Godfroid et al., in prep.).

Therefore, as long as researchers are manipulating task or text properties, using
ANOVA or regression with dummy
variables, they will benefit from applying the same manipulation to the same stimuli.
This appears straightforward for sentence-processing research, such as the grammar
study described previously. However, it is worth emphasizing that “cloning” stimuli
is not the prerogative of sentence-processing researchers. Researchers who work
with picture prompts, videos, or larger instructional materials can also benefit from
creating two (or more) versions of the same stimulus. Doing so will earn them
better experimental control and a study that has higher internal validity (i.e., the
study findings reflect what the researchers intended to study).
As seen in Chapter 4, researchers in the visual world paradigm utilize auditory
input in conjunction with images. This opens up additional possibilities for mate-
rials design besides the manipulation of linguistic input on which we have focused
thus far. Specifically, visual world researchers can manipulate linguistic-auditory
and visual input sources independently, which gives them more options when
creating materials. Even though there are more options theoretically, oftentimes
the research questions will guide visual world researchers in what to do. Broadly
speaking, there are three ways to go about creating items for a visual world exper-
iment: (1) manipulate linguistic-auditory stimuli while keeping the visuals con-
stant, (2) manipulate visual stimuli while keeping the linguistic-auditory stimuli
constant, and lastly (3) manipulate both the visual and the linguistic-auditory
stimuli. To illustrate these approaches, I will introduce three representative studies
from across the spectrum of visual world research (also see Section 6.3.1.1).
Kohlstedt and Mani (2018) examined L1 and L2 speakers’ ability to infer the
meaning of unknown words from a spoken story context (see Section 4.2.2.5).
The authors compared the processing of words and pseudo words in biasing
(semantically informative) and neutral contexts, for instance the target word
cane paired with a story about a grandfather (semantically biasing) or a closet
(semantically neutral). There were two independent variables in this study, context
and word, and each variable had two levels, biasing vs. neutral (context) and real
vs. pseudo (word). Similarly to the Godfroid et al. (in prep.) example with added
input enhancement, this design called for quadruplets: participants listened to four
different versions of a short story (words in a neutral context, words in a biasing
context, pseudo words in a neutral context, pseudo words in a biasing context).
Importantly, these experimental manipulations affected the spoken story content
(i.e., the linguistic-auditory input) only; that is, participants listened to slightly
different versions of the same story, but the images on the screen remained the
same (see Figure 5.3).

FIGURE 5.3 A quadruplet. Four different versions of the audio were presented
together with the same display.
(Source: Kohlstedt and Mani, 2018).
A different approach was taken in an influential study by Marian and Spivey
(2003a, 2003b), who investigated parallel (dual) language activation in bilinguals
(see Section 4.2.1). Participants in this study were Russian-English bilinguals,
who were instructed to pick up real objects (e.g., Pick up the shovel) that were
laid out in front of them on a white board. The display changed in between trials.
Unbeknownst to the participants, it concealed a number of hidden properties of
the displayed objects (see Figure 5.4 and Textbox 6.1). In all the critical trials, the
visual display contained a target object (i.e., the object participants had to pick up).1
In the control condition, one of the four objects was the target object (e.g., shovel)
and the other three objects were distractors. In within-language competitor
trials, one of these three objects was an item of which the English name overlapped
with the target word (e.g., shovel - shark). Between-language competitor trials
included an object of which the Russian name overlapped with the target word
(e.g., shovel - sharik [ˈʃarik], “balloon”). Lastly, the simultaneous competition
condition included both Russian and English competitor objects. Thus, in all there
were four conditions. Hence, each target word was tested by means of a quadruplet.
The quadruplet was realized through the visual display, rather than the contents of
the spoken sentences as previously shown in the Kohlstedt and Mani (2018) exam-
ple. With such a design, Marian and Spivey (2003a, 2003b) were able to test, and
adduce evidence for, the co-activation of words in the bilingual lexicon, regardless
of whether the words belonged to the same language or a different language than
the spoken input.
Lastly, visual world researchers can combine the previous possibilities and
manipulate both visual and linguistic-auditory input when creating materials. This
is what Trenkic, Mirković, and Altmann (2014) did, in a study on L1 Chinese–L2
English speakers’ processing of definite (i.e., the) and indefinite (i.e., a) determin-
ers (also see Sections 4.2.2.3, 5.2, and 6.1.3.2). For the linguistic-auditory stimuli,
the researchers created two versions of each item, which makes for a doublet:

(1) Definite and indefinite articles
a. Definite condition:
The pirate will put the cube inside the can.
b. Indefinite condition:
The pirate will put the cube inside a can.

As for the visual-imagery stimuli, the authors used “semi-realistic scenes” (Huettig,
Rommers, & Meyer, 2011, p. 151) that contained a number of objects, including
the target object (see Figures 5.5 and 6.10). Each scene contained two identical
containers; however, the properties of the containers varied, so that in one ver-
sion, both objects were potential goal recipients (e.g., two open cans), while in the
other version, only one object was (e.g., one open and one closed can). Again, this
is a doublet because there were two versions of each image. Combined, these two
variables (definiteness and number of referents) resulted in a quadruplet, which
is shown in Figure 5.5. Trenkic et al.’s study will be returned to in Section 5.2, to
illustrate the construct of counterbalancing, and again in Section 6.3.1.1, when
we look at interest areas.
In sum, the process of creating materials is closely tied to how many categorical
variables there are in a study and how many levels each variable has. Doublets, triplets,
and quadruplets are a reflection of this. They are the product of how many values
each variable can take. Researchers can increase the level of experimental control in
their study through item reduplication, because all that differs between the different
versions of an item are the experimental conditions, and not some other properties.
Creating experimental materials, then, is part creativity and part labor.
FIGURE 5.4 
A quadruplet. Four different versions of the display were presented
alongside the same auditory input, “Pick up the shovel”. Note: displays
recreated with images from the International Picture Naming Project.
(Source: Bates et al., 2003; Szekely et al., 2003; Marian and Spivey, 2003a, 2003b).

FIGURE 5.5 
A quadruplet drawn from doublets of visual and linguistic-auditory
stimuli.
(Source: Trenkic et al., 2014).

5.2 Between- and Within-Subjects Designs


From a participant’s perspective, a lot of the work described in the previous sec-
tion takes place behind the scenes. Participants will not know there are multiple
versions of a single item because they will generally see only one version.2 In this
section, I introduce a basic distinction in research design that will determine how
to assign doublets, triplets, and quadruplets over different lists for participants—
the distinction between within- and between-subjects designs.
The distinction between within- and between-subjects designs boils down to
the following question: will participants experience only one level of the inde-
pendent variable (henceforth, one condition) or will they be exposed to all of
the levels? A study has a between-subjects design when individual participants
are assigned to a single condition: for instance, German syntax or English syntax
(Godfroid et al., in prep.), error correction, or metalinguistic feedback (Shintani &
Ellis, 2013), or different types of subtitles (Bisson, Van Heuven, Conklin, & Tunney,
2014). A study has a within-subjects design when individual participants expe-
rience all of the conditions that are of interest in the study: for instance, words
and pseudo words (Kohlstedt & Mani, 2018), all four kinds of phonological com-
petition (Marian & Spivey, 2003a, 2003b), definite articles and indefinite articles
(Trenkic et al., 2014), or simple and complex tasks (Révész, Sachs, & Hama, 2014).

TEXTBOX 5.2. MAJOR TYPES OF RESEARCH DESIGN


Between-subjects design: a study in which every participant is assigned
to a single experimental condition, e.g., participants in Group A watch
videos with full captions, participants in Group B watch videos with key-
word captions.
Within-subjects design: a study in which all participants are exposed
to all experimental conditions, e.g., all participants listen to stories that
contain pseudo words (condition 1) and stories that contain real words
(condition 2).
Mixed design: a study that contains both between- and within-subjects
variables, e.g., four groups of foreign language learners (L2, a between-
subjects variable) watch a captioned video on a familiar topic and a cap-
tioned video on an unfamiliar topic (topic familiarity, a within-subjects
variable).

Importantly, although some research questions clearly call for a between- or a
within-subjects design (and mixed designs also exist), in many cases researchers have
some flexibility in how they will implement their ideas. This is not a trivial matter.
Specifically, how researchers conceptualize their study will have a direct influence
on the stimulus lists that participants receive, the statistical analysis, and ultimately, the
chances of detecting statistical effects that are present in the data. A distinct advantage
of within-subjects designs is that every participant serves as his or her own control.
For instance, a motivated participant is likely to apply herself to all items equally,
meaning you could get a homogeneous set of data for your different experimental
conditions. Likewise, a less proficient L2 speaker will have the same language profi-
ciency throughout the study and can therefore best be compared to herself. When
researchers use a within-subjects design, they can control for individual differences
in participant performance. I would argue that no matter the type of research you
do, it is good to think about whether your study lends itself to a within-subjects
design. Many L2 researchers naturally lean toward between-subjects designs, but
with a few small tweaks, it may be possible to convert a between-subjects study into
a more controlled and statistically more powerful within-subjects experiment.
To compare these two options, we will take Montero Perez, Peters, and DeSmet’s
(2015) captions study as an example. Recall from Section 3.2.4 that Montero
Perez and her colleagues studied L2 French learners’ incidental vocabulary acqui-
sition from watching two captioned video clips, one on a LEGO© factory and
the other about a brewery in northern France. The authors wanted to know
whether keyword captioning (showing only keywords on the screen) would yield
higher vocabulary gains than the traditional, full captioning, given that keywords
are more salient. The study participants were randomly assigned to either the full
captioning or keyword captioning condition and viewed the two video clips with
the corresponding caption type. This is a between-subjects design. Although the
authors were able to demonstrate some benefits of keyword captioning in this way,
we do not know whether additional results would have emerged had the same
participants watched one video with keyword captions and the other video with
full captions. Using such a within-subjects design (see Figure 5.6), the researchers
could have controlled for their participants’ L2 proficiency level, listening com-
prehension, vocabulary size, language aptitude, motivation, stress levels and fatigue,
and any other individual differences that might influence the outcomes of the
study. Many studies use a between-subjects design like Montero Perez et al. (2015)
did (see online supplementary materials, Tables S5.1–S5.12). Therefore, the goal
in discussing this study is not to single this project out, but rather demonstrate
how, with a few small tweaks, we can take a good study and make it even better.
Lists and counterbalancing. To ensure that participants see only one of the
multiple item versions, researchers arrange their experimental materials in lists.
Each list contains one version of each item and each participant sees only one
of the various lists. This way, researchers can avoid repetition effects that would
come from watching the same video clip or reading the same sentence twice.3
The number of lists equals the number of item levels; in other words, doublets are
distributed across two lists and quadruplets require four lists. Going back to the
Montero Perez et al. (2015) example, the authors had two item lists as they had
two captioning conditions (Keyword and Full captions). For both between- and
within-subjects designs, creating lists is essential; however, the composition of the
lists will be different (see Figure 5.6). Between-subjects designs will have one con-
dition per list, which will be the same condition for the whole list. In comparison,
in within-subjects studies, all the conditions will be represented in every list. For
instance, when items are doublets, half the items in the list will be in condition A
and the other half will be in condition B.
To include all conditions in a list without repeating items, researchers rotate
the item versions (i.e., experimental conditions) across lists. This process is known
as counterbalancing. It is a key feature of most eye-tracking studies that have
a within-subjects design, even though it is not required (see Endnote 3). Simply
put, counterbalancing means that if a participant sees an item in version A, he or
she will not see the same item in version B. He or she will, however, see a differ-
ent item in version B. By using a complementary scheme for the next participant,
researchers can obtain a full data set for all items in all conditions. There will be
as many rotations as there are levels in your categorical variable. Thus, a doublet
requires two lists (see Figure 5.6, right panel) and two participants to obtain a
complete set of data. A quadruplet requires four lists and four participants to
obtain a full set of observations. Figure 5.7 presents the counterbalancing proce-
dure for quadruplets, using stimuli from Trenkic et al.’s (2014) study. As described
in Section 5.1,Trenkic and colleagues created quadruplets and therefore, they had
to counterbalance the different versions across four experimental lists. Arranging
items across different lists so every group of participants sees every condition and
every item exactly once and there is exactly one observation for every item ver-
sion is known as a Latin square design (see Textbox 5.3).

FIGURE 5.6 Between-subjects design and within-subjects design for studying the
role of captions in vocabulary acquisition. Each participant will see
only one list.
(Source: Based on Montero Perez et al., 2015).

FIGURE 5.7 Counterbalancing items across four lists. Note: version a = one-referent
and indefinite condition; version b = two-referent and definite condition;
version c = one-referent and definite condition; version d = two-referent
and indefinite condition.
(Source: Trenkic et al., 2014).

TEXTBOX 5.3. COUNTERBALANCING AND LATIN SQUARE DESIGN

A Latin square design is a special type of ANOVA that enables researchers to
derive the maximum possible amount of information from a limited set of
observations. One of the distinguishing features of a Latin square is the equal
number of conditions (i.e., item versions) and groups, which give the design
its square form. For instance, a study with a 4 × 4 Latin square design is a
study that has four conditions and four groups.

A Latin square is considered an incomplete design because each item is
presented to each group in one condition only (see Kutner, Nachtsheim,
Neter, & Li, 2005); that is, each item version will be seen by a different group
of participants. The net result is that after one full round of data collection,
every group has seen every condition and every item exactly once and there
is exactly one observation for each condition (for examples, see Figure 5.6,
right panel, and Figure 5.7).
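
To see the rotation logic at work, here is a minimal Python sketch of a Latin
square assignment. The item labels are hypothetical, and the modulo scheme is
just one standard way of implementing the rotation described previously: list k
receives item i in version (i + k) mod n, so every list contains every condition,
no list repeats an item, and across lists every item appears in every version
exactly once.

def latin_square_lists(items, versions):
    # List k gets item i in version (i + k) mod n: all conditions appear in
    # every list, and no item is repeated within a list.
    n = len(versions)
    return [
        [(item, versions[(i + k) % n]) for i, item in enumerate(items)]
        for k in range(n)
    ]

items = ["item1", "item2", "item3", "item4"]  # hypothetical quadruplet items
versions = ["a", "b", "c", "d"]               # cf. Figure 5.7

for k, stimulus_list in enumerate(latin_square_lists(items, versions), start=1):
    print("List", k, stimulus_list)

With four items and four versions, the sketch produces four complementary lists,
mirroring Figure 5.7; with a more realistic set of, say, 24 quadruplet items, each
version would simply appear six times per list.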

5.3 Trials: Practice Trials, Critical Trials, and Filler Trials


Thus far, we have focused on experimental items, including their base form and item
variations, which are at the heart of most eye-tracking experiments. Experimental
items tend to occur alongside other repeating elements in the study, such as fixation
crosses (see Section 6.3.1.5), image preview (see Section 6.3.1.4), comprehension
questions or other types of questions, and participant responses in the form of mouse
clicks, looks, and typewritten or spoken answers (see Section 5.4, for more informa-
tion on secondary tasks). In other words, experimental items are a part of a larger
unit in the study, which is referred to as a trial. A trial is defined here as a sequence
of events that is repeated several times in a study, following a cyclical pattern, and for
which participant response data are collected for later data analysis. For instance, a
trial can be one sentence followed by a comprehension question, a sentence paired
with four pictures, a prime word followed by a target word, one screen in a video,
or a question on a listening test. The trial is the backbone of a research study. From
a reader’s perspective, figuring out the contents of a trial amounts to answering the
question of what, exactly, participants had to do in a particular study. This question
is deceptively simple, yet it holds the key to fully comprehending a study and inter-
preting research findings accurately. Researchers, therefore, want to consider their
audience and describe this part of their study very carefully. Information about trial
contents can typically be found in the Methods section of an article. Let’s take a look
at two examples from Tremblay’s (2011) and Alsadoon and Heift’s (2015) studies.
In a visual world experiment, Tremblay (2011) examined how L2 French
learners parse words with liaison in spoken French (also see Section 4.2.1). Liaison
is a misalignment between word and syllable boundaries (Tremblay, 2011) that
occurs when a latent consonant at the end of a word is pronounced at the onset
of the next word (e.g., fameux élan, [fa.mø.z#e.lɑ̃], “infamous swing”) because the
second word begins with a glide or a vowel. Here is how Tremblay described the
contents of a trial:

The experiment began with a calibration of the eye-tracker using the par-
ticipants’ right eye.This initial calibration was followed by a practice session
of ten trials and by the main experiment. In each trial, the participants saw
four orthographic words in a (non-displayed) 2 × 2 grid for 4,000 millisec-
onds. This long reading time ensured that the L2 learners would be able to
read each word before the onset of the auditory stimulus. The words then
disappeared and a fixation cross appeared in the middle of the screen for
500 milliseconds. As the fixation cross disappeared, the four words reap-
peared on the screen and the auditory stimulus was heard (synchronously)
under JVC HA-G101 headphones. The participants were instructed to
click on the target with the mouse as soon as they heard the target word in
the stimulus. The participants’ accuracy rates, their reaction times and their
eye movements were recorded, with the latter two being measured from the
onset of /z/ in the critical trials. (…) The trial ended with the participants’
response, with an inter-trial interval of 1,000 milliseconds.
(Tremblay, 2011, pp. 266–267)

When reading this description for the first time, you may find it helpful to create
a visual of the different events in the trial sequence (see Figure 5.8 and Section
6.3.1.5).This will help you keep track of the different parts in the study and figure
out their function. A good place to start is to locate the item, because, as stated
previously, this is the central piece of any study. In Tremblay’s experiment, the
items were four orthographic words displayed in the four quadrants of the screen
(“the participants saw four orthographic words in a [non-displayed] 2 × 2 grid”).

FIGURE 5.8 Trial sequence. Participants listened to sentences such as Kim regarde le
[adjective + noun], “Kim is looking at the [adjective + noun]”, and were
instructed to click on the corresponding (written) noun on the screen.
(Source: Tremblay, 2011).

Participants saw the four words for 4s before the critical phase of the trial began
(see Section 6.3.1.4 for details on item preview). In the critical phase, partici-
pants saw the four words again and simultaneously listened to a spoken sentence.
This is when the participants’ eye movements were recorded. The participants
were instructed to click on the target word they heard in the sentence. Thus, in
addition to the information obtained from the eye-movement data, there were
two more dependent variables: accuracy rates and reaction times. Figure 5.8 sum-
marizes the trial sequence in Tremblay’s (2011) study.
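
Trial sequences like this one are ultimately implemented in experiment-builder
software, but their structure can be sketched in ordinary Python. The representation
below is a schematic illustration of Figure 5.8 under my own assumptions (the
TrialEvent class and the event labels are invented for exposition); it is not the
code behind Tremblay’s (2011) experiment, nor any eye-tracker’s actual API:

from dataclasses import dataclass
from typing import Optional

@dataclass
class TrialEvent:
    description: str
    duration_ms: Optional[int]  # None = the event ends on a participant response

# Schematic rendering of the trial sequence in Figure 5.8 (Tremblay, 2011)
trial_sequence = [
    TrialEvent("preview: four written words in a 2 x 2 grid", 4000),
    TrialEvent("fixation cross in the center of the screen", 500),
    TrialEvent("words reappear + spoken sentence; eye movements recorded", None),
    TrialEvent("mouse click on the target word (ends the trial)", None),
    TrialEvent("inter-trial interval", 1000),
]

for event in trial_sequence:
    timing = "until response" if event.duration_ms is None else f"{event.duration_ms} ms"
    print(f"{event.description}: {timing}")

Laying a trial out as an explicit sequence of timed events in this way can be a
useful planning step before building the experiment in your eye-tracking software.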
Looking ahead to the data analysis, there are three types of trials represented
in Tremblay’s study: practice trials, critical trials, and filler trials. Practice trials
are trials at the beginning of a study, usually between four and ten, which serve as
a warm up for the participant. Practice trials give participants an opportunity to
become familiar with the apparatus, the recording environment, and the proce-
dure. Participants have a chance to practice their task and can ask questions if they
have any, so that they are fully prepared for when the main experiment begins.
Godfroid and Spino (2015) emphasized the importance of having practice trials
in an eye-tracking experiment to improve data quality. In this regard, it is good to
remember that eye-tracking data are a type of reaction time measure and hence
the data will be sensitive to any hesitation phenomena that come from a lack of
familiarity with the task.
Critical or experimental trials are the trials in which a researcher is most
interested. These are the trials that contain the researcher’s experimental manipu-
lation and for which eye-movement data and any other dependent variables are
collected. Consequently, only the critical trials will be included in the statistical
analyses. In Tremblay’s (2011) experiment, the critical trials were the trials with
liaison (realized as /z/ in the study) and the non-liaison trials that shared the same
/z/ consonant between the adjective and the noun.
In many studies, critical trials will be interspersed with a third type of trial,
known as filler trials. Filler trials do not contain the experimental manipulation
of interest. Like practice trials, the data for these trials are usually not analyzed.
If you are planning to run two similar experiments, say two sentence-process-
ing studies, you could consider lumping the critical items from the two studies
together so you need to collect data only once. The critical items from one exper-
iment (e.g., noun-adjective gender agreement) can then serve as fillers for the
other experiment (e.g., subject-verb number agreement), and vice versa. This will
save you time and resources as participants will only need to come to the lab once.
The purpose of filler trials is to conceal the experimental manipulation from
participants; that is, from the participant’s perspective filler trials look exactly like
experimental or critical trials, but in reality, they do not contain the critical phe-
nomenon of interest (e.g., no liaison- or /z/-initial nouns in Tremblay’s study).
There is some debate as to what should go in filler trials; for instance, whether,
in a study that includes ungrammatical sentences (violation paradigm), some of
the fillers should contain ungrammatical forms as well. Including ungrammatical
fillers will make these trials look more similar to critical trials; however, ungram-
matical fillers could themselves heighten participants’ awareness of grammatical
form if the ungrammaticalities are salient.What factors are appropriate for a given
study deserves careful consideration in the context of researchers’ own studies,
including their methodology and research questions, and the characteristics of
their study participants.
Now let’s take a look at trials and trial sequences in a purely text-based experi-
ment, such as Alsadoon and Heift’s (2015) study. Using a pretest, treatment, post-
test, and delayed posttest design, Alsadoon and Heift investigated the effects of
visual input enhancement on reducing vowel blindness (i.e., difficulty processing
vowels) in L1 Arabic–L2 English speakers. Here is how they described the con-
tents of a trial:

The treatment phase consisted of 36 trials for both the experimental and
the control group. Each trial included three separate display screens. For the
experimental group, the first screen displayed a sentence in which the target
word and its vowel(s) were textually enhanced with three typographical
cues … . In contrast, for the control group the textual input enhancement
was absent, that is, the test item was not underlined and [it was] displayed in
the same font and color as the remaining words in the sentence. …
The second screen asked study participants to provide the meaning
of the target word in a multiple-choice exercise identical to that of the
pretest and the immediate and delayed posttests. This task was intended
not only to motivate our study participants to actually read the sentence,
but also to ascertain the learner’s knowledge of the meaning of the test
items. …
The third screen provided feedback to the study participants on their
choice of the correct meaning of the target word. Screens 2 and 3 were
identical for both the experimental and the control group.
(Alsadoon & Heift, 2015, p. 64)

Alsadoon and Heift’s trials consisted of a three-step sequence, referred to as Screen
1, Screen 2, and Screen 3 by the authors. In particular, each trial consisted of (i)
sentence reading, followed by (ii) a short meaning recognition test, and (iii) feed-
back. As a reader, I am interested to know in what step or steps the researchers
recorded their participants’ eye movements and what the purpose of each step
was. In the Analysis section of their paper, the authors explained they analyzed
four eye-movement measures for the target words in the sentence. This suggests
eye movements were recorded and analyzed for Step 1 (the first screen) only.
What, then, was the purpose of Steps 2 and 3 in the trials? As stated in the previ-
ous article excerpt, the meaning recognition test ensured participants knew the
meaning of the target words. The authors elaborated:

This was an important step in the study because, by checking the learner’s
knowledge of the meaning of each test item, we ensured that the problems
in the learners’ intake of English vowels can be attributed to a lack of ortho-
graphic vowel knowledge as opposed to not knowing the word meaning.
(Alsadoon & Heift, 2015, p. 64)

Step 3, then, was a logical sequel to Step 2, as it let participants know whether
their response was correct. Taken together, the different steps in Alsadoon and
Heift’s trials all played an integral part in demonstrating the beneficial effects
of input enhancement and, specifically, helped the researchers isolate the role of
input enhancement in learning word form and meaning.
The two studies discussed in this section show how, in different research strands,
items are embedded in larger sequences of events. These events make up the tri-
als in a study. Another way to consider trial contents (discussed next) is in terms
of primary and secondary tasks that make up the trial. In what types of tasks do
participants engage in an eye-tracking study? It turns out there are many options.

5.4 Primary Tasks and Secondary Tasks


What counts as a primary task and a secondary task is somewhat subjective, even
though it is a useful distinction to make. In this book, I adopt the participant’s per-
spective in distinguishing between the two task types. Primary tasks are defined
as “tasks” the participants are instructed to do and for which the eye-movement
data serve as the primary source for data analysis. By this definition, primary tasks
of text-based research include, but are not limited to, reading texts, watching clips,
or reading for test completion (see online supplementary materials, Tables S5.1 to
S5.6 for representative articles). In visual world studies, researchers do not com-
monly distinguish between primary and secondary tasks, although in many cases
study participants will engage in more than one task in a study. In fact, there are
two main versions of the visual world paradigm, which differ with regard to the
primary tasks: looking-while-listening and action-based experiments (see online
supplementary materials, Tables S5.7 to S5.12 for representative articles).
Along with primary tasks, secondary tasks add structure to each trial. Secondary
tasks are additional tasks participants perform, such as answering comprehension
questions, making grammaticality or plausibility judgments, and translation. Eye-
movement data may or may not be recorded during this part. Secondary tasks alter-
nate with the primary task. The idea is that participants read or otherwise engage
with an item, solve a secondary question, read or engage with a new item, solve a
new secondary question, and so on. This definition of secondary tasks focuses on
trial structure and therefore, it does not include post-tests. Post-tests can be a useful
design feature to add to many L2 projects; however, participants will complete them
after all the trials are done. Therefore, post-tests are not a part of the trial, but they
fit into the experiment at a different (a higher) level.
Secondary tasks provide a measure of participants' attentiveness during the
experiment and, in particular, during the primary task. Accuracy data collected
for a secondary task reflect participants’ levels of engagement with the materi-
als. They may also serve as a gauge of overall L2 proficiency level (that is, it is
expected that participants will do well on the secondary task). For this reason,
secondary-task scores are sometimes used as an inclusion criterion, to determine
which participants’ data will be included in the main analysis. A 70% or 80%
inclusion threshold is often found. Secondary tasks can also provide a cover story,
because the task can conceal a study’s true purpose. For instance, participants
may be asked to answer comprehension questions after each sentence to induce a
focus on meaning (i.e., encourage participants to read for comprehension). Giving
participants a purpose can help divert their attention away from what the experi-
ment is actually about and let the researcher observe the behavior of interest
in its pure form. This is important because once participants become aware of
the experimental manipulation, they may change their behavior in strategic ways,
for instance by scanning text for relevant information or anticipating what will
come next in a visual world experiment. In this book, I classified secondary tasks
as tasks that show the previously described characteristics (check on performance,
cover story) as well as those tasks of which the outcome is not used to address
research questions (see Tables S5.7 to S5.12 in online supplementary materials for
representative articles).
Let us now take a closer look at some tasks that are more specific to eye-
tracking methodology. Because primary tasks for text-based research follow
directly from the type of study being conducted (see Chapter 3), I will focus on
the visual world paradigm here (see Chapter 4). Visual world researchers have a
bit more freedom in what task they choose for their studies. The two most com-
mon options, which will be discussed in what follows, are looking-while-listening
and action-based experiments. Secondary tasks will be discussed after that. Of the
great variety of secondary tasks, we will examine four: comprehension questions,
grammaticality judgments, translation, and plausibility judgment tasks. Readers
are referred to Tables S5.1–S5.12 in the online supplementary materials for addi-
tional examples of task types.
Primary tasks during listening: Looking-while-listening and clicking.
Looking-while-listening is a version of the visual world paradigm in which
learners are given no instructions other than to listen carefully to the spoken
input. The looking-while-listening task goes back to Cooper’s (1974) original
visual world experiment, in which participants listened to stories while they saw
nine objects arranged in a grid form on the screen (see Section 4.1). Compared
to action-based visual-world tasks, which will be discussed next, looking-while-
listening tasks are ecologically valid in that a participant’s eye gaze can wander
freely across the screen (Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, 2013).
There is no externally imposed need to make a conscious decision. Consequently,
the looking-while-listening task offers a more general test of what kinds of effects
(e.g., anticipatory effects, effects of incremental processing) occur when language
and vision interact naturally, outside the context of a specific task (Huettig et
al., 2011). This is not to say that action-based experiments completely lack
ecological validity, because vision and action do tend to occur in tandem in the
real world (e.g., Altmann, 2011b; Henderson & Ferreira, 2004; Salverda, Brown,
& Tanenhaus, 2011). It appears visual world researchers assume that the two task
types (i.e., action-based and looking-while-listening) will yield similar findings,
even though, to my knowledge, only one study by Sussman (2006) has tested this
empirically. The flip side of ecological validity is that looking-while-listening will
produce greater variability in the data, given that individuals differ in how they
inspect scenes (Altmann, 2011b; Dussias et al., 2013). Therefore, looking-while-
listening experiments will generally require more trials and/or more participants
to generate sufficient data for a well-powered statistical analysis (Altmann, 2011b).
In a classic looking-while-listening experiment, Altmann and Kamide (2007)
gave listeners the following general description of their task: “We are interested
in what happens when people look at pictures while listening to sentences that
describe something” (p. 506). The participants then saw a visual display of an agent
(e.g., a man or a woman) and various objects (i.e., an empty wine glass, a full beer
glass, cheese, and two pieces of candy): see Figure 4.5. After 1000 ms, the partici-
pants heard a spoken sentence, such as “The man will drink the beer” or “The man has
drunk the wine”. The participants were free to look anywhere on the screen; they
pressed a button only to proceed to the next trial. Of interest was whether they
would look at the object that stood in the right temporal relationship to the event,
for instance the full beer glass for future-tense will drink and the empty wine glass
for past-tense has drunk (see Section 4.1).
Another primary task often used in the visual world paradigm is clicking,
or a variant that includes clicking and dragging. The use of these computer-
ized action-based tasks likely goes back to the early days of visual world research,
when participants were often asked to manipulate real objects on a table (e.g.,
Chambers, Tanenhaus, Eberhard, Filip, & Carlson, 2002; Marian & Spivey, 2003a,
2003b; Spivey & Marian, 1999; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy,
1995; Trueswell, Sekerina, Hill, & Logrip, 1999). The same requirement to manip-
ulate objects physically also characterizes more recent work by Sekerina and
Trueswell (2011) with Russian-English heritage speakers (see Section 4.2.2.4).
A nice feature of eye-tracking studies with real objects is that the experimental
set-up highlights the link between vision and action. However, because it takes
time to rearrange displays manually between trials, many researchers now
use computer experiments as a faster alternative. As a result, real objects have
been replaced with images (see Section 6.3.1) and picking things up has turned
into clicking and/or dragging the depicted objects with a mouse. There are many
examples of image displays in Chapters 4, 5, and 6 of this book. Tremblay’s (2011)
study, which was described in Section 5.3, further showed that even printed word
forms can fulfill the role of images if orthography is an important part of the study.
Compared to looking-while-listening experiments, action-based tasks gener-
ate two more dependent variables derived from the mouse clicks, namely accu-
racy rates and latencies (reaction times). In many cases researchers will analyze and
report the results for these measures separately. Even so, it is worth emphasizing
that the eye-tracker output (e.g., eye fixations and latencies) recorded
before participants made a click provides the primary dependent variables for analy-
sis. From this perspective, action-based tasks and looking-while-listening share a
great deal of similarities.
Secondary tasks: Comprehension questions, grammaticality judg-
ment tests (GJTs), translations, and plausibility judgments. The most
common secondary task in eye-tracking research is comprehension questions.
Other secondary tasks include GJTs, plausibility judgments, and translation. What
the participant needs to do in each of these tasks is rather straightforward. When
asked to respond to a comprehension question, learners are tested on the con-
tent of what they just read or listened to. For instance, after exposure (i.e., reading
or listening) to the sentence The father brought some wood (*woods) home to stoke up
the fire, the participant may be asked whether or not the father has a fireplace at
home. In this case, an affirmative answer (i.e., a YES response) is expected. The
participant will often signal her response by pressing a button on a game control-
ler, a response box, or a keyboard. With the same item, the participant can be asked
to make a grammaticality judgment, that is to judge whether the sentence
is grammatical or not. In this instance, the use of some woods, as opposed to some
wood, should elicit a NO response, because wood is a mass noun. In a translation
task, the participant is expected to translate the sentence into another language,
oftentimes their L1. Finally, when making a plausibility judgment, participants
indicate whether or not the sentence makes sense. Based on world knowledge,
using wood to stoke up a fire is plausible (i.e., sensible), hence a YES response is
expected, regardless of whether the sentence is grammatically well formed or not.
In contrast, a sentence like The father brought some water (*waters) to stoke up the fire
is considered implausible, and therefore should elicit a NO response.
By far the most commonly used secondary task is a reading or listening com-
prehension measure. Comprehension questions have been used across most
strands of eye-tracking research (see online supplementary materials, Tables S5.1
to S5.12).They come in different formats, depending on the purpose of the com-
prehension measure. Perhaps the most exhaustive comprehension measure is the
retelling task. In this task, participants are asked to retell what they just read or
listened to. In order to compare participants’ retellings, researchers often develop a
general assessment rubric of idea units. Idea units are the main points and support-
ing details of a text; participants’ scores are determined according to how many of
these units they covered in their retellings. Idea unit scoring, for both main points
and details, will reveal the extent to which a participant has understood the text.
Story retelling has been used as a comprehension measure in input enhancement
research (Winke, 2013) and is also found outside eye-tracking research, as a data
elicitation method to measure learners' L2 knowledge (Larsen-Freeman & Long,
1991). However, developing a coding scheme for idea-unit scoring takes time, and
two raters may be needed for interrater reliability.
Multiple choice comprehension questions and true/false statements are a less
labor-intensive alternative (easier to score) for testing participants’ comprehen-
sion of discrete items in the input. Even so, care is needed when constructing
these question types. To ensure validity of measurement, questions should not
be answerable based on real-world knowledge. Researchers could check this by
running their questions by a new group of participants, who will not participate
in the main study (for an example, see Godfroid & Spino, 2015). If participants’
world knowledge is of no use, the new participants should be guessing the answer
and their performance should be at (or below) chance level. Another concern is
that comprehension questions and answer choices should not repeat the target
structure if there is any. Consider the example *The father brought some woods
home to stoke up the fire, seen previously. Asking if the father brought wood would
not be good because it would increase participants’ exposure to the target noun
wood (*woods). As the experiment goes on, researchers may find their participants
become more and more sensitive to incorrectly used mass nouns (for empiri-
cal evidence of task-induced effects, see Havik, Roberts, van Hout, Schreuder,
& Haverkort, 2009). A better alternative, then, is to do a simple comprehension
check by asking whether the father has a fireplace or whether the father is making
a fire. Provided that these constraints on question design are met, discrete-item
questions can offer a valid check on comprehension that is relatively quick and
easy to perform. Comprehension questions, then, owe their popularity in part to
their ease of implementation. Depending on the research questions, researchers
may decide to analyze eye-tracking data only for those trials for which partici-
pants answered the comprehension question correctly. Textbox 5.4 summarizes
the key points regarding the construction of comprehension questions.

TEXTBOX 5.4. HOW TO CONSTRUCT COMPREHENSION QUESTIONS

1. Make sure comprehension questions cannot be answered based on real-
world knowledge. Norm your questions with a separate group of partici-
pants to confirm the questions’ validity as a text comprehension measure.
2. Do not repeat target words or structures from the experiment in the
comprehension questions or answer options.
3. Consider the purpose of your questions. If the goal is just to ensure
participants are reading or listening, questions can be quite simple, but
if deeper text comprehension is needed, a retelling task may be more
appropriate.
When the goal is to measure text comprehension, other meaning-based tasks
such as plausibility judgments (Dussias et al., 2013) and translation (Lim &
Christianson, 2015) are also available. Plausibility judgment works best when
the stimuli are sentences. Participants indicate after each sentence whether or
not an event is likely to occur in the real world. Therefore, compared to com-
prehension questions, which are formulated separately, plausibility judgments
do not require presenting participants with additional input that could sensitize
them to the experimental manipulation. Different from comprehension ques-
tions, plausibility judgment relies on a participant’s general knowledge of the
world. It is worth bearing in mind that something you consider “general knowl-
edge” might actually be culturally specific information. To avoid any ambigu-
ity, it is best to create sentences that are very clearly implausible. For instance,
sentence (1) is not a good example of an implausible sentence because in some
parts of the world, taking time off on the weekend is not as common. Some ways
to remove ambiguity are to situate events on other planets or to include talking
objects, as in example (2). To verify your plausibility intuitions with other people, you
could run a small pilot, using a similar group of participants as you plan to
recruit for the main experiment.

(1) Most people work seven days a week.


(2) Joanna emailed the document to a tomato.

In many cases, the implausible sentences will serve as filler trials (see Section 5.3)
because processing may change when participants encounter something unex-
pected. Thus, researchers may report overall accuracy rates for the plausibility
judgments, but focus only on the plausible sentences in their analysis of eye-
tracking data.
The final measure, translation, is slightly different from comprehension ques-
tions and plausibility judgments because participants need to produce language
on top of comprehending it. What translation measures is a bit controversial. Lim
and Christianson (2015) argued translation is a comprehension measure, which,
compared to comprehension questions, invites a deeper level of processing (also
see Lim & Christianson, 2013). In their study, Korean learners of English were
asked to read and then translate English sentences into Korean in each trial. The
participants were more sensitive to grammatical violations in this translation
experiment than in an experiment that included comprehension questions after
each sentence (also see Jackson & Bobb, 2009; Jackson & Dussias, 2009; Leeser,
Brandl, & Weissglass, 2011 for similar comparisons of grammaticality judgments
and comprehension questions). Although Lim and Christianson (2015) promoted
the use of translation as a comprehension measure, the authors acknowledged that
translation “draws the attention of even lower proficiency L2 learners to mor-
phosyntax” (p. 1288) and “the translation itself contains explicit evidence of how
morphosyntax has been understood” (ibid.). Therefore, even if translation measures
comprehension, the task invites a focus on meaning and form (Ellis, 2005).
As a result, translation could potentially make participants aware
of the targeted linguistic structure. Another potential limitation of translation is its
explicit involvement of the L1. Whether or not this involvement could have an
influence on the processing of the target language needs to be investigated. In sum,
all three tasks have in common that they divert the participant’s attention to the
meaning of stimuli. Furthermore, comprehension measures and plausibility judg-
ments make it less likely participants will focus on language form and whether or
not it is grammatical.
Beyond meaning-based secondary tasks, there is a long tradition of GJTs
in linguistics and SLA research, including research with eye tracking (for gen-
eral reviews, see Plonsky, Marsden, Crowther, Gass, & Spinner, 2019; Spinner &
Gass, 2019). GJTs, as implemented in eye tracking, are binary-choice tests that
require participants to indicate whether a given sentence is grammatically correct
(“good”) or not (“bad”). Because the task explicitly focuses participants’ attention
on grammatical form, GJTs do not provide a cover story. The task is useful to tap
into participants’ explicit knowledge of grammar, especially when participants are
under no time pressure to respond.
In L2 eye-tracking research, Godfroid et al. (2015) investigated how pro-
cessing changes when GJTs are performed under time pressure. They found L2
speakers, but not L1 speakers, made fewer regressions (right-to-left eye move-
ments, usually followed by rereading) when doing the timed test. This pattern
of findings suggested adding time pressure can change processing behavior and
potentially the type of linguistic knowledge being measured (more automa-
tized explicit knowledge or more implicit knowledge in the timed GJT). In
Godfroid et al.’s study, the GJTs were the main focus of investigation (i.e., not
a secondary task). Using self-paced reading, Leeser et al. (2011) showed that
as a secondary task, grammaticality judgments may change the way L2 speak-
ers process certain structures. Intermediate-level Spanish learners were sensi-
tive to agreement violations when reading and judging the grammaticality
of sentences but not when reading and answering comprehension questions.
Task type did not affect processing of another structure, subject-verb inver-
sion. Effects of task type also emerged from the comparison of findings from
Jackson and Dussias (2009) and Jackson and Bobb (2009), both with self-
paced reading methodology. Advanced L2 German speakers who answered
verification statements in Jackson and Bobb (2009) built shallower syntactic
representations than a similar group of L2 German speakers who made gram-
maticality judgments in Jackson and Dussias (2009). In this regard, it is worth
repeating one of the strengths of eye-tracking methodology: eye tracking can
measure linguistic knowledge under relatively implicit conditions, in real time,
with a minimum amount of conscious reflection, while participants are
attending to the meaning of a sentence. GJTs may undo some of these benefits
normally found in eye-tracking experiments. Therefore, unless the goal is to
study GJT-induced task effects (see Godfroid et al., 2015; Leeser et al., 2011),
other, more meaning-focused tasks may be more appropriate as secondary
tasks. Table 5.1 summarizes the main characteristics of the different secondary
tasks reviewed in this section.
As seen in Table 5.1, secondary tasks can fulfill different purposes in a study. The
primary purpose of the task will inform various methodological decisions, such as
how many trials need to have a secondary task after the primary task. Coverage
refers to the percentage of trials that include a secondary task. You may have assumed
primary and secondary tasks had to go together all the time; however, a 100% cover-
age is only necessary if you are planning to do a detailed analysis of the secondary-
task data. Other purposes can be accomplished with lower coverage rates. If the goal
is to use the secondary-task scores as an inclusion criterion and exclude participants
who are performing poorly, it may be enough to insert secondary task items after
50% of all trials. To keep participants alert and engaged, a 25% to 30% coverage rate
may be enough. These figures are intended as guidelines. Contextual factors such
as lab availability and your participants’ profile will further shape what is possible
in your study. As a rule, however, you will want to add a secondary task after more
trials if the secondary-task data will help you answer your research questions or are
important to what you are studying.
A further question is whether researchers want to keep trials for which par-
ticipants answered the secondary task incorrectly. This question may arise when
researchers want to study meaning-focused processing. An incorrect response
to a comprehension question or plausibility judgment may signal a temporary
lapse in the participant’s focus on meaning. On this account, it may be safer
to remove these trials from the analysis. This is reasonable but strict. Another,
more liberal approach is to set an overall inclusion threshold (e.g., 70% or 80%
response accuracy on the whole experiment) and include all the data from par-
ticipants who meet that criterion. In this manner, both correct and incorrect
responses will be included for eligible participants. Researchers are likely to lose
less data this way, which can be a concern, especially when working with L2
speakers. The key here is to have a clear understanding of the purpose of your
secondary task.
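To illustrate the more liberal approach, here is a minimal sketch in base R. It assumes a long-format data set with one row per trial; the data frame and column names (d, subject, comp_correct) are hypothetical placeholders for whatever your eye-tracking software exports.

```r
# Minimal sketch: exclude participants below an 80% accuracy threshold,
# then keep all trials (correct and incorrect) from eligible participants.
# 'd' is a hypothetical data frame with one row per trial and the columns
# subject and comp_correct (1 = correct, 0 = incorrect).
acc <- tapply(d$comp_correct, d$subject, mean)  # per-participant accuracy
keep <- names(acc)[acc >= .80]                  # participants meeting the threshold
d_included <- d[d$subject %in% keep, ]          # all their trials are retained
```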
In sum, secondary tasks present researchers with a range of options to enrich
their studies and collect additional information about participants’ performance.
Although these tasks are secondary by nature, their implementation in a study
requires careful thought. Piloting will be helpful to confirm the robustness of your
task stimuli. Piloting can also help you make some of the finer methodological
decisions for which standard guidelines are still being developed. Tables S5.1–S5.12
in the online supplementary materials present primary and secondary-task
information for the different types of eye-tracking studies.
TABLE 5.1 Comparison of four secondary tasks

| | Comprehension questions | Plausibility judgments | Translation | Grammaticality judgments |
|---|---|---|---|---|
| What does it measure? | Reading or listening comprehension | Sentence comprehension in relation to the real world | Comprehension of the source language and encoding into the target language | Grammatical knowledge |
| Primary focus | Meaning | Meaning | Meaning and form | Form |
| Can it serve as a cover story? | Yes | Yes | No | No |
| Extra input | Yes | No | No | No |
| Extra output | No | No | Yes | No |
| Real-world knowledge | Discouraged | Encouraged | Yes, potentially a small role | No |
| Can participants guess the answer? | Yes | Yes | Yes, to a certain degree | Yes |
| Major strength | Easy to administer | No extra input | Measures morphosyntactic comprehension as well as meaning comprehension | Measures grammatical knowledge |

5.5 How Many Items Do I Need?


Now that we have all the basic building blocks for an experiment in place, we
turn to the question of numbers. How many items do I need? This seemingly
simple question will carry us into the practicalities and statistical practices of SLA
and bilingualism. As it turns out, there are many factors to consider.
The question of how many items to include brings up an inherent tension
in experimental research. On the one hand, researchers want to maximize the
information they can get from their participants, yet on the other hand they are
limited by practical constraints. Many of these practical constraints are related to
time—lab time, participant time, and your own time. As Schmitt (2010) wrote
in a vocabulary research manual, “Time is precious. You never have enough” (p.
164). As an eye-tracking researcher and a vocabulary researcher, I concur. Even
motivated participants will grow tired after about 1.5 hours of data collection and
so, as a researcher, you want to think about how to make the most of participants’
time in your lab. Of course, it is possible to run multiple sessions with the same
participants, but this will require careful planning and some incentives for the
participants to return beyond day 1 of the experiment.
These practical considerations notwithstanding, having a large enough number
of items in a study matters for reliability. The idea is that as a researcher, you want
to measure a participant’s behavior in a certain condition and you want to do so
consistently. This means you need multiple items that all elicit the same behavior.
Consistency of measurement is reflected in instrument reliability, a measure
(e.g., Cronbach’s alpha, split-half reliability, Kuder Richardson [KR] 20, KR-21)
you can calculate once you have collected your data. Instrument reliability is “the
consistency of scores on items seeking to measure the same construct or ability”
(Plonsky & Derrick, 2016, p. 538). For instance, in a study in which partici-
pants are asked to describe video clips depicting motion events (Flecken, Carroll,
Weimar, & Von Stutterheim, 2015), results will be more reliable if participants
consistently conceptualize videos of the same kind (e.g., a man walking toward
a car, a woman walking to the bus stop) in the same manner. Other things being
equal, the reliability of your instrument will increase as you add more items (more
video clips). Therefore, larger instruments, which have more items, also tend to
have greater reliability. In a larger instrument, the influence of any individual item
on the overall outcome will also be smaller, which is good if there are any poorly
performing items (i.e., items that show unexpected, undesirable, or confounding
effects on measurement) in the list.
Reporting instrument reliability is a good practice that helps improve research
transparency. Currently, there is room for methodological improvement in SLA,
as only 28% of the L2 studies examined in Plonsky and Derrick (2016) actu-
ally reported instrument reliability information. Second-language eye-tracking
researchers suffer from the same problem, because they almost never present reli-
ability estimates for their primary tasks. One possibility, borrowed from general
reaction time research, would be to calculate split-half reliability for the items
in each of the different experimental conditions, treating eye fixation times as a
special type of reaction times. The statistical software R has a package, splithalf,
that will let you do just that (https://cran.r-project.org/web/packages/splithalf/
index.html). If reliability becomes more mainstream in eye-tracking research, not
just as a theoretical construct but as something researchers calculate and report, it
will further help promote the notion that item numbers are important.
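For readers who prefer to see the logic spelled out, the base R sketch below computes a single odd–even split with the Spearman-Brown correction; the splithalf package automates a more robust version of this computation over many random splits. The data frame and column names (d, subject, item, condition, total_time) are hypothetical, and numeric item IDs are assumed.

```r
# A single odd-even split-half estimate of reliability for fixation times,
# with the Spearman-Brown correction for test length. The splithalf
# package instead averages over many random splits of the items.
split_half <- function(d) {
  d$half <- ifelse(d$item %% 2 == 0, "even", "odd")     # assign items to halves
  m <- aggregate(total_time ~ subject + half, d, mean)  # participant means per half
  w <- reshape(m, idvar = "subject", timevar = "half", direction = "wide")
  r <- cor(w$total_time.odd, w$total_time.even, use = "complete.obs")
  (2 * r) / (1 + r)                                     # Spearman-Brown correction
}

sapply(split(d, d$condition), split_half)  # one estimate per condition
```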
Item numbers also matter because larger item sets will increase the researcher’s
chances of detecting effects (i.e., significant group differences or significant rela-
tionships between variables) in the data analysis. The concept here is that of sta-
tistical power. Statistical power refers to the likelihood a researcher will uncover
true effects based on the data that were collected. The concept of statistical power
can be likened to the workings of a microscope. Microscopes come in different
strengths. Which microscope is appropriate for your project will depend on what
you are trying to study. Flower pollen can be studied under a low-power micro-
scope, but seeing bacteria requires a better, more powerful device. In research, the
size of the phenomenon you seek to study is reflected in the effect size. Effect
sizes are traditionally categorized as small, medium, or large. To detect smaller
effects in a statistical analysis you need more statistical power, much like seeing
bacteria requires a more powerful microscope than seeing pollen.
Researchers can increase the statistical power of their analyses by collecting
a larger data set. The size of a data set is the total number of observations. It is a
product of both the number of participants and the number of items. Researchers
should pick their item and participant numbers accordingly, and aim for high lev-
els of statistical power in their studies. In SLA and bilingualism, statistical power
is typically set at .80. This means there is an 80% chance researchers will detect
existing effects accurately (e.g., find significant differences if groups or treatments
truly differ). With the desired statistical power level, the anticipated effect size,
and the significance level, researchers can perform an a priori power analysis (for
details, see Larson-Hall, 2016). The outcome of this analysis will be an estimate of
how large a data set is necessary to run a well-powered study.
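As a quick illustration of what such an analysis can look like, the R sketch below uses the pwr package for a simple two-group comparison; this is one of several possible tools (for details and alternatives, see Larson-Hall, 2016), and the effect sizes plugged in come from the benchmarks discussed in what follows.

```r
# A priori power analysis for a two-group comparison, using the
# field-specific benchmark of d = 1.00 (a "medium" L2 effect;
# Plonsky & Oswald, 2014)
library(pwr)
pwr.t.test(d = 1.00, sig.level = .05, power = .80, type = "two.sample")
# n is approximately 17 participants per group; with a typical psychology
# effect of d = 0.40, the same call returns approximately 100 per group
```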
So what are typical effect sizes in different strands of SLA and bilingualism
research? It turns out they are larger than in neighboring disciplines. Plonsky and
Oswald (2014) compiled over 400 effect sizes from L2 research (both primary
studies and meta-analyses) and plotted the effect sizes’ distribution. Using the
distribution as a guide, the authors proposed field-specific cutoff values for what
counts as a small effect (25th percentile of the distribution), a medium effect (50th
percentile), and a large effect (75th percentile). In so doing, they revised all cutoff
values for effect sizes upward compared to Cohen’s (1988) norms. For instance,
for mean differences between groups, Plonsky and Oswald proposed d = 0.60
(versus Cohen’s d = 0.20) for a small effect, d = 1.00 (versus Cohen’s d = 0.50)
for a medium effect, and d = 1.40 (versus Cohen’s d = 0.80) for a large effect.
As we will see in what follows, this has implications for sample-size calculations.
The authors attributed these differences in cutoff values, relative not just to Cohen's
benchmarks but also to similar meta-syntheses in other fields (Hattie, 1992;
Lipsey & Wilson, 1993; Richard, Bond, & Stokes-Zoota, 2003; Tamim, Bernard,
Borokhovski, Abrami, & Schmid, 2011), to the relative youth of the field of L2
research, as well as to a potential publication bias.
In the neighboring discipline of psychology, typical effect sizes are about d =
0.40 (Kühberger, Fritz, & Scherndl, 2014; Open Science Collaboration, 2015),
which are considered small effects. In such cases, Brysbaert and Stevens (2018) rec-
ommended that researchers conducting reaction time experiments aim for 1,600
observations per condition (e.g., 40 participants × 40 items; 20 participants × 80
items; or 80 participants × 20 items), in order to achieve a statistical power of .80.
As mentioned previously, effect sizes in L2 research tend to be larger. Therefore,
to bring back the microscope analogy, L2 researchers will not need quite as many
observations to achieve the same statistical power. To determine how many obser-
vations are needed, researchers could run simulations based on pilot data for their
study. Brysbaert and Stevens (2018) demonstrated this procedure, using the simr
package developed for R (Green & MacLeod, 2016; Green, MacLeod, & Alday,
2016). The idea is to draw a number of random samples from an existing data set
and run the desired statistical analysis a number of times on each sample. For accu-
rate power estimates, it is better to run the simulations on one’s own data (these
could be real data or simulated data) or a dataset from a similar study (Brysbaert
& Stevens, 2018). Using your own data is better because variance in eye fixation
times is likely to be task- and population-specific. To calculate the statistical power
of a research design, simply count how often the simulations return as statistically
significant. The statistical power of a study is the proportion of all tests that are
significant. For instance, if you run a total of 2,000 statistical tests on your data set
and 1,200 tests are statistically significant, the estimated power of the design is .60.
Once researchers have an initial estimate of the power of their test, they can
modify different parameters, such as the participant number, item number, and the
observed difference between conditions. To increase sample size, which is neces-
sary if the initial simulations indicated a lack of power, researchers could simply
copy their data set. This may not yield exact results, because no two sets of data
will ever be identical, but it will fit most researchers’ needs. The target number
of items and participants, then, is the size of data set (participant number × item
number) that will return positive test results on 80% of all test simulations.
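Sketched in R, and assuming a pilot data set pilot with the hypothetical columns subject, item, condition, and total_time, the simr workflow might look as follows; this is an illustration of the general procedure, not a reproduction of Brysbaert and Stevens' (2018) exact code.

```r
# Simulation-based power analysis with simr, run on pilot data
library(lme4)
library(simr)

# Fit the intended mixed-effects model to the pilot data
fit <- lmer(total_time ~ condition + (1 | subject) + (1 | item), data = pilot)

# Estimate power from, e.g., 200 simulated experiments; by default the
# first fixed effect is tested, and the reported power is simply the
# proportion of simulations that return a significant result
powerSim(fit, nsim = 200)

# If power falls short of .80, enlarge the design (here: 60 participants)
# and re-estimate
powerSim(extend(fit, along = "subject", n = 60), nsim = 200)
```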
The previous discussion emphasized the importance of both items and par-
ticipants for statistical power.4 Table 5.2 lists the number of items and participants
per condition I distilled from the synthetic review of text-based eye-tracking
studies (see Tables S5.1 to S5.6 online for detailed information). Table 5.3 does
the same for visual world studies (based on Tables S5.7 to S5.12 online). These
numbers provide an indication of the number of observations in contemporary
L2 eye-tracking research. In many strands, median cell size (n participants × k
items per condition) is close to 300, for instance 260 observations in grammar

TABLE 5.2 Number of observations in L2 eye-tracking research with text (2007–2017)

| Research strand | Items per condition (Mean / Median / Min–Max) | Participants per condition^b (Mean / Median / Min–Max) |
|---|---|---|
| Grammar | 11.99 / 10 / 4–42 | 28.42 / 26 / 14–60 |
| Vocabulary | 33.70 / 14.25 / 4–225 | 26.24 / 26 / 15–42 |
| ISLA | 19.52 / 19 / 3–36 | 21.63 / 15.5 / 3–66 |
| Subtitles^a | 184 / 254 / 18–280 | 25 / 25.5 / 9–40 |
| Assessment | 11.25 / 9 / 3–24 | 28.25 / 30.5 / 14–38 |

^a For subtitles research, the relevant unit may be time, rather than the number of subtitles shown. Average clip length: 10 min; Median length: 5 min; Range: 4–25 min.
^b In a within-subjects design, the number of participants per condition will be the total sample size. In a between-subjects design, it will be the total sample size, divided by the number of conditions.

TABLE 5.3 Number of observations in L2 visual-world eye-tracking research (2003–2017)

| Research strand | Items per condition (Mean / Median / Min–Max) | Participants per condition^b (Mean / Median / Min–Max) |
|---|---|---|
| Word recognition | 21 / 20 / 5–48 | 41 / 39 / 14–70 |
| Prediction | 11 / 9 / 3–28 | 40 / 35 / 16–100 |
| Referential processing | 11 / 8 / 6–18 | 45 / 36 / 34–66 |
| Production^a | 14 / 10 / 7–22 | 24 / 20 / 15–48 |

^a The number of items in some studies was based on the amount of feedback given in response to the participant's production.
^b In a within-subjects design, the number of participants per condition will be the total sample size. In a between-subjects design, it will be the total sample size, divided by the number of conditions.

research, 295 observations in ISLA, and 315 observations in visual-world, predic-


tion research. This corresponds to 20 participants seeing 15 items per condition
or 15 participants seeing 20 items per condition. This cell size, however, does not
by itself guarantee adequate statistical power. If anything, 300 observations is well
below the 1,600 observations Brysbaert and Stevens (2018) recommended for
reaction-time research conducted with native speakers. To estimate the statistical
power of their research designs, researchers will need to consider additional fac-
tors, most notably the effect size or mean difference between conditions, whether
their study has a within- or a between-subjects design (see Section 5.2), and what
type of analysis will be performed.
Once researchers have settled on a number of items per condition, giving due
consideration to all other design factors, they are ready to start item production.
Most experiments have more than one condition. Thus, the researcher’s task will
be to create a collection of items with which to populate the whole experiment.
It is a good idea to create items and item versions first (see Section 5.1) before
you start thinking about item lists. As we saw in Section 5.2, list assignment will
look somewhat different for studies with a between- or a within-subjects design
and for studies with or without counterbalancing. Thus, item production is where
all the different topics we covered in this chapter come together. This is where
the rubber hits the road and you finally have a chance to apply everything you
learned in this chapter.
To illustrate, let’s take another look at Montero Perez et al. (2015), a captions
study with L2 French university students. Recall from Section 5.2 that Montero
Perez and her colleagues wanted to compare the effects of full captioning and
keyword captions on vocabulary acquisition. Thirty-four participants (from an
initial sample of 51) were randomly assigned to watch two video clips in one of
the two captioning conditions, using a between-subjects design. Participants were
evenly split between the two conditions. Across the two video clips, they encoun-
tered a total of 18 target words (i.e., items). Thus, the number of observations per
cell (i.e., per captioning condition) was 17 participants × 18 items = 306.
Now imagine that the study had a within-subjects design. In this case, all 34
participants would experience both keyword captioning and full captions, but due
to the counterbalancing of the video clips, each participant would see only half
the target words in either captioning type, specifically 11 targets in the LEGO©
video (e.g., with full captions) and 7 targets in the brewery video (e.g., with key-
word captions). Figure 5.6, which is reproduced as Figure 5.9 here, visualizes the

FIGURE 5.9 
Between-subjects design and within-subjects design for studying
the role of captions in vocabulary acquisition. In a counterbalanced,
within-subjects design with two conditions, there will be twice as many
participants but half the number of items per condition.
(Source: Based on Montero Perez et al., 2015).

difference between the two designs. This example shows how counterbalanced,
within-subjects designs make larger demands on materials creation (because par-
ticipants see fewer items per condition), whereas between-subjects designs require
larger participant numbers (because there are fewer participants per condition).
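As a back-of-the-envelope check, the cell sizes for the two designs can be computed directly; the R lines below follow the Montero Perez et al. (2015) numbers from this example.

```r
# Observations per captioning condition in the Montero Perez et al. (2015)
# example, computed for both design types
n_total <- 34; items <- 18

# Between-subjects: half the participants, all 18 items per condition
(n_total / 2) * items     # 17 x 18 = 306 observations per condition

# Counterbalanced within-subjects: all 34 participants contribute, but each
# sees only about half the items (11 or 7) in a given condition
n_total * (11 + 7) / 2    # also 306 observations per condition, on average
```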
Although it is probably not a good idea to have fewer than ten participants
or ten items in a given condition, the practicalities of a given research context
may push researchers toward a between- or a within-subjects design. Specifically,
some of the limitations of a small participant sample can be offset by including
more items and running the study as a within-subjects experiment. Conversely, if
length of experiment is a concern (e.g., when working with children), research-
ers could include fewer items but recruit more participants. At the end of the day,
when cell sizes are equal, a within-subjects design will be more powerful than a
between-subjects design, because every participant will serve as his or her own
control (see Section 5.2).
To sum up, the median cell size in contemporary L2 eye-tracking research
is close to 300, which corresponds to 15 participants × 20 items or 20 partici-
pants × 15 items per condition. These numbers do not by themselves, however,
guarantee that a study will have adequate statistical power, because power depends
on many other elements as well. When deciding what sample size you need for
your study, the key factors to consider are practicality, instrument reliability, typi-
cal effect sizes in a given subdiscipline, ease of recruiting participants, and type of
research design.

5.6 Conclusion
In this chapter I covered basic principles of experimental design with a focus on
the overall structure of a study. Eye-tracking researchers follow the same principles
of experimental design as other quantitative researchers, but add to that some new
concepts and rules or constraints that are specific to eye-tracking methodology
(see Chapter 6). From this perspective, fundamental knowledge of general design
principles is essential for creating a good eye-tracking study. Of many factors, I
selected five key elements deemed critical for a sound research study. The first
concepts were those of items and item versions (see Section 5.1). The different
versions of an item mirror the different levels of your independent variable(s) and
are thus a direct expression of the design of your study. When designing my own
research projects or advising students, I like to draw the different experimental
conditions on a piece of paper, similarly to the diagrams shown in Section 5.1.
I find this helpful to link the statistical and experimental aspects of the research.
With the concept of items in place, researchers need to consider whether to
implement their study as a between- or a within-subjects design (see Section 5.2).
Many studies will lend themselves to either design type. In that case, a within-
subjects design may be preferred, because it yields less noisy data (every partici-
pant serves as his or her own control) and hence, a more powerful research design.
The issue of statistical power came up again in Section 5.5, as a guiding principle
for determining an adequate sample size. The smaller the effect you want to study,
the more participants and items you will need to run well-powered statistical
tests. While the question of sample size defies a simple answer, Tables 5.2 and 5.3
summarize the item and participant numbers that are commonly found in con-
temporary L2 eye-tracking research. Lastly, items are the nucleus of a larger unit,
known as a trial (see Section 5.3). Trials are composed of primary and secondary
tasks (see Section 5.4) and sometimes eye-tracking data will only be recorded
during the primary task. Even so, secondary tasks can provide a wealth of infor-
mation about participants’ attentiveness, their general L2 proficiency, and ability
to complete the task, while also giving the participants a purpose (real or pretend)
for doing the experiment.
With these concepts fully established, the time has come for some eye-tracking-
specific guidelines. In Chapter 6, we consider what makes eye-tracking method-
ology different from other behavioral research methods.

Notes
1 Critical or experimental trials are trials that will be included in the data analysis, see
Section 5.3.
2 This is true even for within-subjects designs, because items tend to be counterbalanced
(more on counterbalancing later in this section).
3 Creating lists is not necessary when researchers think participants can safely see the
same item more than once. In that case, the researchers can present all versions of all
items together in a within-subjects design with no counterbalancing (e.g., Godfroid et
al., 2015; Kaushanskaya & Marian, 2007). This is less common in L2 and bilingualism
eye-tracking research (because in many cases repetition is better avoided), but from a
design standpoint, it is the simpler thing to do.
4 This is true for linear mixed effects models, which were the focus of Brysbaert and
Stevens’ (2018) article, but it is also true for analyses that involve some type of data
averaging, such as t tests, ANOVA, and linear regression. When data are averaged, more
observations will give rise to more precise mean values (i.e., means with a smaller
standard deviation) and hence larger effect sizes (Brysbaert & Stevens, 2018).
6
DESIGNING AN EYE-TRACKING STUDY

The aim of this chapter is to prepare readers for their own eye-tracking stud-
ies by equipping them with the methodological know-how and skills to design
their own research. Building on general experimental guidelines (see Chapter 5),
I provide an overview of methodological considerations specific to eye-tracking
research. At the top of the list is the need to define interest areas (see Section
6.1), a key element that pertains to text-based eye tracking and the visual world
paradigm alike. The remaining parts are devoted to providing paradigm-specific
guidelines; that is, guidelines for text-based eye tracking and for visual world
eye tracking, which extend to other multimodal settings. Section 6.2 deals with
spatial, artistic, and linguistic factors in text-based design. Section 6.3 offers an
in-depth discussion of how to create auditory and visual materials for a visual
world experiment. Researchers working on multimedia learning may find the
information in Section 6.3 helpful too. Readers can draw from Section 6.2 and
Section 6.3 as the primary sources of information for their studies in these respec-
tive paradigms. However, some principles (e.g., screen layout) apply more broadly
and so a general understanding of how eye-tracking researchers do things in other
areas of the field can benefit your own research as well.

6.1 Defining Areas of Interest


Potentially the most important concept in eye-tracking research is that of an
interest area, also termed area of interest or region of interest. Interest areas
are central to almost any eye-movement study. They form the link between the
design of a study, which is grounded in the research questions, and the data analy-
sis, which will yield the answers to the research questions. Thus, interest areas are
central to the entire cycle of eye-tracking research, from study conceptualization
to data analysis and presentation of findings. In eye tracking, the concept of an
interest area refers to a spatial region, defined by the researcher, for which the eye-
movement data will be extracted from the eye-movement recording for further
analysis.To put it differently, the eye tracker records a participant’s eye movements
for the entire area for which the camera has been calibrated (e.g., a computer
screen or a two-dimensional plane in space), but only a subset of all the recorded
data will be analyzed. Interest areas are what researchers use to delimit this subset.
As a general rule, interest areas should be set (i.e., drawn manually or applied
automatically by the software) with a view to getting the data that will allow one
to answer the research questions (see Section 6.1.4 for further details).This is why
it is important that researchers start thinking about their interest areas early on in
the study design, when they are developing their materials and research questions.
With few exceptions, it is possible to set interest areas before collecting the data.1
If you can tell beforehand what your participants will see in your study, I highly
recommend that you apply interest areas before you start running your experi-
ment. A common mistake is to postpone the definition of interest areas until after
data collection is complete. Doing so may save a bit of work on the front end, but
it will almost always result in substantial amounts of extra work later and could
lead to the unenviable task of coding the data manually on a trial-by-trial basis.
By defining interest areas at the outset of the study, before the data are collected,
researchers are invited to think deeply about how they will go about answering
their research questions. In practice, this will often lead to a better, more compel-
ling research study.
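Computationally, an interest area is nothing more than a labeled region against which fixation coordinates are tested. The toy R sketch below makes this idea concrete for a rectangular area; in practice, your eye tracker's software will do this bookkeeping for you, and all names and pixel values here are invented for illustration.

```r
# Toy illustration: does a fixation fall inside a rectangular interest area?
# Interest areas are defined here by pixel bounds; names and values are made up.
in_ia <- function(x, y, ia) {
  x >= ia$x1 & x <= ia$x2 & y >= ia$y1 & y <= ia$y2
}
target_ia <- list(x1 = 512, y1 = 300, x2 = 640, y2 = 340)  # target word region
fixations <- data.frame(x = c(530, 900), y = c(312, 500))  # two sample fixations
in_ia(fixations$x, fixations$y, target_ia)  # TRUE FALSE: only the first is on target
```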
So how do we determine what interest areas to use for a particular study? The
answer to this will depend on what strand your study falls under (see Chapters
3 and 4). Broadly speaking, interest areas in SLA and bilingualism research fall
into one of three categories: (1) word-based interest areas, (2) larger areas of text,
and (3) images. A fourth category, (4) moving interest areas, have been used in
production research (Flecken, Carroll,Weimar, & Von Stutterheim, 2015) and syn-
chronous chat studies (Michel & Smith, 2017, 2019). Moving interest areas will
be introduced in Section 6.1.3.2 and again in research idea #10 in Section 9.3.1,
and in Section 9.3.2.3.

6.1.1 Word-Based Interest Areas


Word-based interest areas are found across a range of research strands, including
the bilingual lexicon (De León Rodríguez et al., 2016; Miwa, Dijkstra, Bolger,
& Baayen, 2014) and vocabulary acquisition (Godfroid, Boers, & Housen, 2013;
Godfroid et al., 2018; Montero Perez, Peters, & Desmet, 2015), grammar acquisi-
tion (Godfroid & Uggen, 2013; Sagarra & Ellis, 2013), and assessment (McCray &
Brunfaut, 2018). Intuitively, if a study were to examine the relationship between
eye movements and vocabulary acquisition, the primary interest areas would
be the target word(s) for learning in the study (see Figure 6.1). This is because

FIGURE 6.1 
Interest areas in a vocabulary learning study. The authors analyzed
fixation times on the novel (pseudo) word perchants as well as its
apposition offspring. To study further context effects, a researcher could
designate additional interest areas encompassing the whole pseudo
word–English word pair.
(Source: Godfroid et al., 2013).

researchers are primarily interested in how the eye-movement measures for the
particular word relate to acquisition; they may choose to disregard the process-
ing of other parts of the text in their analysis. Now imagine the same research-
ers wanted to investigate the role of context in inferring word meanings. This
scenario would require more interest areas besides the target words to capture
the processing of the surrounding context. Additional interest areas could again
be word-based, like the area for the target word, or they could encompass larger
areas (e.g., everything up to the target word as a single interest area). Which route
researchers took would depend on what analysis they had in mind for their con-
text effects. They could also define interest areas both ways and then decide later,
in the analysis stage, which approach proved more informative.
Similar to vocabulary research, many sentence processing studies have word-
based interest areas. This time, the areas center around the critical grammatical
features, for instance the structures or forms whose acquisition researchers want
to study. Let’s consider Hopp and León Arriaga (2016) as an example. Recall from
Section 3.2.1 that Hopp and León Arriaga were interested in the processing of
case in Spanish. As a result, for sentences with ditransitive verbs, the primary inter-
est area was the indirect object and its article/case marker, which was either gram-
matical (al) or ungrammatical (el) in the study. In addition, the authors designated
the preceding verb, the ensuing direct object, and the sentence-final prepositional
phrase as separate interest areas (see Figure 6.2). By defining additional interest
areas in this manner, Hopp and León Arriaga were able to compare the process-
ing of grammatical and ungrammatical sentences more comprehensively and to
make a stronger case for their participants’ sensitivity (or lack of sensitivity) to case
marking violations.
In the preceding examples, word-based interest areas were defined for words
that occurred in sentences or longer stretches of text. While this is arguably the
most common scenario, other kinds of research designs may call for word-based
interest areas as well. These are studies that present words in isolation, for instance
in keyword captioning (Montero Perez et al., 2015) or in some psycholinguistic
research (De León Rodríguez et al., 2016; Miwa et al., 2014). Montero Perez et
al.’s study is a good example because it shows how flexibly word-based interest
areas can be used. Recall from Section 3.2.4 that these authors examined the
effects of different types of captioning—keyword captioning and full caption-
ing—on learning vocabulary.To do so, the authors overlaid the same interest areas
on the target words in both captioning conditions; that is, they used the same
interest areas regardless of whether the target word appeared in a full captioning
line or as a keyword caption (see Figure 6.3). By manipulating the layout in this
way, the authors were able to focus on the effects of visual salience in multimodal
vocabulary learning (with keyword captioning hypothesized to be the more sali-
ent technique). They did not have to analyze viewing behavior across the whole
screen, or even across the whole captioning line, but could focus on the targets for
learning in the video captions instead.
Finally, some psycholinguistic experiments focus on single-word processing,
using reading aloud (De León Rodríguez et al., 2016) or lexical decision tasks
(Miwa et al., 2014). In such cases, the word or nonword may be all there is for
a participant to look at on the screen. This may make the use of interest areas
redundant. Participants have no motivation to move their eyes away from the
words (Miwa, personal communication, October 9, 2017). Conceptually, though,
researchers still want to know for how long and where in the word participants
were looking (see Figure 6.4). These examples highlight that interest areas are

FIGURE 6.2 Interest areas in a grammatical-ungrammatical sentence pair.The primary


interest area is indicated with a double bold line; all secondary interest
areas are marked with a single bold line.
(Source: Hopp and León Arriaga, 2016).

FIGURE 6.3 Full captioning (left) and keyword captioning (right). The interest area
was drawn around the French target word figurines (“figurines”).
(Source: Montero Perez et al., 2015).

FIGURE 6.4 English lexical decision task with eye tracking. Circles represent fixations
inside four English words.
(Source: Miwa et al., 2014).

particularly useful when there are several pieces of information on the screen and
researchers want to analyze only a subset of all the available information.
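For readers who process their data programmatically, the logic of word-based interest areas can be illustrated in a few lines of code. The following Python sketch assigns fixations to rectangular interest areas and totals the fixation time per area; all coordinates and fixation records are invented for illustration, and commercial eye-tracking software will typically perform this step for you.

```python
# A minimal sketch of assigning fixations to rectangular interest areas.
# Coordinates and fixation records are invented for illustration.

# Interest areas as (left, top, right, bottom) in screen pixels
interest_areas = {
    "verb":            (100, 180, 220, 210),
    "indirect_object": (230, 180, 370, 210),
    "direct_object":   (380, 180, 500, 210),
}

# Fixations as (x, y, duration in ms)
fixations = [(260, 195, 230), (410, 190, 180), (600, 400, 150)]

def area_of(x, y):
    """Return the name of the interest area containing (x, y), if any."""
    for name, (left, top, right, bottom) in interest_areas.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None  # fixation landed outside all interest areas

total_time = {name: 0 for name in interest_areas}
for x, y, duration in fixations:
    name = area_of(x, y)
    if name is not None:
        total_time[name] += duration

print(total_time)
# {'verb': 0, 'indirect_object': 230, 'direct_object': 180}
```

Note that the third fixation contributes nothing to the analysis because it falls outside every interest area, which is exactly the filtering behavior described above.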

6.1.2 Larger Areas of Text


Moving beyond the word level, some research designs call for interest areas around larger regions of text. These may include an entire captioning line or lines, a word bank,
or a text paragraph. Larger interest areas are well suited to examine readers’ more
global processing behaviors in relation to a given task and task instructions (e.g.,
FIGURE 6.5 Different types of interest area in a banked gap-fill test. (Source: Modified and reprinted from McCray, G., & Brunfaut, T., 2018. Investigating the construct measured by banked gap-fill items: Evidence from eye-tracking. Language Testing, 35(1), 51–73, with permission of SAGE Publications, Ltd., copyright © 2016 The Authors).

watching a video or filling out a gapped text). A nice example of a study that com-
bined larger interest areas and word-based interest areas is McCray and Brunfaut’s
(2018) assessment research. As mentioned in Section 3.2.5, the authors examined
the processing profiles of higher- and lower-scoring test takers on gap-fill items on
a standardized reading test. The authors hypothesized that lower-performing test
takers would display local reading strategies more often than higher-performing
test takers. One of their measures of local reading was the time spent fixating on
words surrounding the gaps in the gap-fill task. To measure this, the authors drew
areas of interest around the three words—a number the authors picked them-
selves—prior to and following the gaps (see Figure 6.5). Regarding global pro-
cessing differences, the authors hypothesized that higher performers would spend
comparatively less time on task processing (the word bank), leaving more time for
the higher-level processing of the text (the whole text area). To test these hypoth-
eses, McCray and Brunfaut defined two large interest areas around the word bank
and the text as a whole, respectively. In all, the authors had three very distinct types
of interest area—text gaps, three-word phrases, and word bank vs. text—and each
interest area was argued to capture a different aspect of test takers’ behavior.

6.1.3 Image-Based Interest Areas


6.1.3.1 Images in Text-Based Research
A third possibility for defining interest areas is to focus on the processing
of images. Images are of course the primary areas of interest in visual world
research (see Chapter 4 and Section 6.1.3.2), yet images also play a role in mul-
timodal studies with written text. For instance, Révész, Sachs, and Hama (2014),
whose study was reviewed in Section 3.2.3, sought to validate the cognitive
complexity of different tasks in ISLA. To measure fixation behavior, the authors
combined the two picture prompts in each task version into a single interest
area (see Figure 3.4, reproduced as Figure 6.6 below). In Lee and Winke’s (2018)
assessment research (see Section 3.2.5), the interest areas were different compo-
nents of the TOEFL Primary Speaking Test, shown in the right panel of Figure
6.7. And lastly, in Suvorov (2015), also a language assessment study, the author
selected the videos embedded in the online test interface as the basis for further
analysis (see Figure 6.7, left panel).
As in text-based research that focuses on more global text processing (see
Section 6.1.2), studies with pictorial interest areas probe into learners’ general
processing of language tasks, including language assessment tasks. These stud-
ies were not designed to address questions related to grammatical sensitivity or
the acquisition of certain linguistic features; instead they deal with issues of task
design. In Suvorov’s (2015) study, for instance, the question was whether the type
of video used in a listening assessment (content video or context video) would
influence ESL test takers’ viewing behavior and test performance. To this end, the
author extracted eye-movement measures for the videos embedded in the screen
(see interest area in Figure 6.7, left panel) and correlated these eye-movement
measures with participants’ test scores for the corresponding subtests.

FIGURE 6.6 Pictorial interest areas. (Source: Révész et al., 2014).

FIGURE 6.7 Interest areas in assessment research. (Source: Left: Printed with permission from Dr. Charles Bailyn, Yale University; Suvorov, 2015. Right: Lee & Winke, 2018. Copyright © 2013 Educational Testing Service. Used with permission).

6.1.3.2 Images in the Visual World Paradigm


Using images as areas of interest is the default (if not the only option) in a visual world experiment given the joint presentation of visuals and audio. The concept of interest areas in the visual world paradigm is similar to that in the text-based research we have discussed so far. What distinguishes the visual world paradigm is that any trial
will typically consist of multiple images or objects and each image or object will
typically require its own interest area. How researchers do that is subject to some
variation.
In a typical experiment, participants are expected to direct their attention
to an object on the screen when the object label is mentioned in the auditory
input. For example, upon hearing “The man is washing the dog”, participants are
expected to look at the image of the dog as the final word in the sentence unfolds
(see Figure 6.8, adapted from Andringa & Curcic, 2015). Dog, then, is the target,
on which data analysis will center primarily, and car is a distractor. Both the tar-
get and the distractor will need to have their own interest areas.
Figure 6.9 illustrates the three primary approaches to drawing interest areas
around images, using this trial as an example. The easiest solution is to split the
entire screen into halves (two images) or quadrants (four images). The downside
to this approach is that you cannot tell whether the participant was looking at the
image or the white space between images. For instance, in the top panel of Figure
6.9, all fixations will be assigned to either the car or the dog, even though the
intended location of the second fixation is doubtful. A second possibility, shown
in the middle panel, is to draw boxes of equal sizes around the images, even when
the images themselves may vary in size (pixel area). This is how Andringa and
Curcic (2015) drew their interest areas. The white space included in the interest
area serves as a buffer for recording errors or error in the human visual system.

FIGURE 6.8 Sample display with a target image (right) and a nontarget, distractor image (left). (Source: Andringa and Curcic, 2015).

However, when fixations land far from the images, they will be omitted from data analysis. This
would be the case for the second fixation in Figure 6.9, which falls outside of
either box in the center panel. Finally, the most conservative option is to draw
free-form interest areas that follow the external boundaries of the object. In our
example, only the third fixation on the dog will be counted for analysis in the
lower panel of Figure 6.9. Because objects vary in pixel sizes, researchers need to
ensure their comparisons are valid when they use free-form interest areas. Larger
objects will naturally attract the eye gaze more. To account for these size differ-
ences, researchers can compare looks to the same objects in the same scene while
manipulating the audio (see Section 6.3.2, for more information). In sum, one and
the same set of eye-movement data may be processed differently, depending on
how researchers conceive of the interest areas in their study.
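The three approaches can also be expressed computationally. The following Python sketch checks one and the same (invented) fixation against a screen-half split, an equal-sized box, and a free-form outline; the display size, box coordinates, and polygon vertices are all hypothetical, and a real free-form interest area would trace the actual object contour.

```python
# A sketch contrasting the three interest-area schemes for discrete
# images, assuming a 1024 x 768 display with the dog on the right.
from matplotlib.path import Path

fixation = (780, 420)  # (x, y) in pixels; an invented data point

# 1. Screen halves: everything right of center is assigned to the dog
in_right_half = fixation[0] > 1024 / 2

# 2. Equal-sized box around the dog image, with a white-space buffer
box = (640, 280, 960, 560)  # left, top, right, bottom
in_box = (box[0] <= fixation[0] <= box[2]
          and box[1] <= fixation[1] <= box[3])

# 3. Free-form area following the object contour (vertices invented)
outline = Path([(700, 350), (900, 330), (920, 500), (720, 520)])
in_outline = outline.contains_point(fixation)

print(in_right_half, in_box, in_outline)
# One and the same fixation can count as a look to the dog under one
# scheme but not under another.
```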
The distinction between target and nontarget images is foundational to vis-
ual world research. What the nontarget images are, however, and what their role
is in a study will vary. A study that included multiple types of nontarget images
was Trenkic, Mirković, and Altmann (2014). Recall from Section 4.2.2.3 that the
authors were interested in measuring participants’ knowledge of the English defi-
nite and indefinite articles. Participants were asked to put objects in containers
(e.g., Put the cube in the can).The authors varied the article preceding the container
(i.e., inside the can or inside a can), as well as the number of potential goal referents
in the display (i.e., one open can or two open cans). On top of a target, which was
the can in which participants put the cube, each scene included a competitor, a
distractor, and two fillers (see Figure 6.10). Here, the competitor was the other
can, which was closed or in which participants did not put the cube. More gener-
ally, a competitor is any object onscreen that shares some form-based, phonologi-
cal, semantic, or grammatical properties with the target object. The competitor
is hypothesized to compete with the target for visual attention because of their
shared properties. A different kind of container in Trenkic et al., namely a basket,
was the distractor. The role of the distractor is to give participants a genuine
FIGURE 6.9 Three types of interest areas around discrete images in a visual world experiment. (Source: Picture stimuli from Andringa & Curcic, 2015).

purpose for listening. In Trenkic et al.’s study, there was more than one kind of
container the cube could potentially go into and so participants had to listen care-
fully to know what to do. Finally, the filler objects (pencil, rope) were unrelated
to the event description. They were there to make the display look more varied
and realistic. Textbox 6.1 summarizes the four possible types of images in a visual
experiment and their functions.

FIGURE 6.10 Four image roles in the same visual display. (Source: Trenkic et al., 2014).

TEXTBOX 6.1. KEY ROLES OF IMAGES IN THE VISUAL WORLD PARADIGM
Target: the object that the participant is expected to look at or manipulate
Competitor: the object that is expected to compete with the target for
attention due to its similarity to the target (e.g., overlap in phonology,
shape, semantics, or grammatical features)
Distractor: generic term for a non-target object; another term for competi-
tor when there is more than one kind of competitor in the display
Filler: objects used to fill up the screen to cover up the object of investiga-
tion and make the task look more realistic for participants

Trenkic et al. (2014) embedded objects in a visual scene, rather than presenting
them as distinct images. Using a scene will let researchers create a more coher-
ent context against which to interpret the auditory input. This may be important
for studies on discourse-level phenomena such as pronoun resolution (Cunnings,
Fotiadou, & Tsimpli, 2017; Sekerina & Sauermann, 2015) or when multiple agents
participate in the same event (Hopp, 2015). Other times, for instance in word- or
sentence-processing research, it may not matter much whether you use separate
objects or a scene with objects embedded in it (Altmann, personal communica-
tion, April 3, 2018). However, opting for a scene or discrete images can carry
implications for how you define your interest areas. When objects are presented in
a larger visual context, you may be more inclined to use free-form interest areas
that follow the object contours closely (see Figure 6.11, top panel). This is how
Trenkic and her colleagues (2014) designed their interest areas (Trenkic, personal
communication, May 31, 2018). Even with hand-drawn interest areas, you could
consider adding a buffer (extra white space around the object) to capture eye fixa-
tions that land slightly off target (see Figure 6.11, mid panel). Finally, researchers
can still draw boxes around the various objects, as they often do when working
with discrete images (see Figure 6.11, bottom panel). Interestingly, researchers
often do not report detailed information on the shape of their interest areas in

FIGURE 6.11 Three types of interest areas around objects in a visual scene. (Source: Picture stimulus from Trenkic et al., 2014).
their articles, so readers are left to infer this part. In the spirit of research transpar-
ency, authors may wish to add this information in their research papers. Including
a visual, such as the figures presented in this section, that shows the interest areas
can be very helpful.
The same principles that apply to interest areas around objects in scenes also
apply to movies—researchers can track participants’ attention to different objects
in the movie by drawing interest areas around them. For instance, Flecken et al.
(2015) used rectangular interest areas to capture eye gaze data as participants were
watching motion events (see Figure 6.12). Two things made Flecken et al.’s study
special. First, it was a production experiment (see Section 4.2.4) and second, it
required the use of dynamic interest areas given that motion is an inherently
dynamic event. Specifically, one entity (e.g., the pedestrian) followed a trajectory
toward a potential endpoint (e.g., the car). To capture participant attention to the
moving entity, the interest area had to be moved along the same path as the entity
did. The authors defined the interest area on a frame-by-frame basis. Because
most movies are filmed at a rate of 24 frames per second, this meant they had
to update the interest area at least 24 times a second! That amounted to drawing
close to 3000 interest areas to code the 20 six-second video clips in the study.
Working with dynamic interest areas, then, is still a time-consuming enterprise
(also see Section 9.3.1, research idea #10, and Section 9.3.2.3), even though soft-
ware developers are working to automate the process at least partly.
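For readers who want to approximate part of this process computationally, the sketch below generates one interest area per frame by linearly interpolating between a hand-coded start and end position. All coordinates are invented, and because real on-screen motion is rarely perfectly linear, each generated frame would still need to be checked and corrected by hand.

```python
# A sketch of generating frame-by-frame interest areas for a moving
# object by linear interpolation between a start and an end box.
# Positions are invented; real motion paths require manual checking.

def interpolate_boxes(start_box, end_box, n_frames):
    """Return one (left, top, right, bottom) box per frame."""
    boxes = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        boxes.append(tuple(round(s + t * (e - s))
                           for s, e in zip(start_box, end_box)))
    return boxes

# A six-second clip at 24 frames per second needs 144 interest areas;
# twenty such clips come to 2,880, close to the 3,000 cited above.
frames = interpolate_boxes(start_box=(50, 300, 150, 420),
                           end_box=(700, 300, 800, 420),
                           n_frames=6 * 24)
print(len(frames), frames[0], frames[-1])
```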

FIGURE 6.12 Dynamic interest areas in a movie depicting a motion event. As the pedestrian walked toward the car, the corresponding interest area had to be redrawn on a frame-by-frame basis.
(Source: Reprinted from Flecken, M., Weimar, K., Carroll, M., & Von Stutterheim, C., 2015. Driving
along the road or heading for the village? Differences underlying motion event encoding in French,
German, and French-German L2 users. The Modern Language Journal, 99, 100–122, with permission
from Wiley. © 2015 The Modern Language Journal).

6.1.4 Setting Interest Areas in Your Own Research


This overview has shown that interest areas come in different shapes and sizes to fit
eye-tracking users’ research interests and analysis needs. How researchers define their
interest areas will determine what measures they can compute and for what regions
(see Chapter 7). Interest areas literally define how researchers look at their data, so
it is important to read this part of a research paper (usually the Methods section)
carefully. Researchers, on the other hand, need to make sure to include information
about interest areas in their papers for their readership. When conducting your own
eye-tracking research, previous studies can serve as a starting point for determining
what regions to focus on in your analyses. For that, it is helpful to know what eye-
tracking research is out there; the literature reviewed in Chapters 3 and 4 can help
you break into the different subdisciplines. At the same time, one of the appeals of
eye tracking is there is always room for creativity, as long as it can be supported by
reasonable justifications. For instance, McCray and Brunfaut (2018), whose study
was reviewed in Section 6.1.2, decided to include three words before and after the
gap to compare the amount of local processing between two groups of test takers.
This decision was made based on the authors’ understanding of their stimuli and
the task at hand. Arguably, fewer than three words would not provide test takers with enough grammatical and lexical information to fill in the missing word. Larger
interest areas (e.g., clauses) would take up a large portion of the text and render the
areas a doubtful source of local-processing data. In visual world studies, researchers
first need to have a clear idea of what role each image plays in the display. Areas of
interest can be defined by splitting the screen, drawing equal-sized boxes, or retrac-
ing the boundaries around the object, either with or without an extra buffer for
human or technical (eye tracker) error. In sum, interest areas display considerable
variation across subdisciplines; even within a subdiscipline, they can differ depending on what the researcher's goals are. As a researcher, then, it is good to have a broad
overview of the field. Equipped with this knowledge and a good understanding of
your own materials and research goals, defining interest areas should be easy.

6.2 Guidelines for Text-Based Eye-Tracking Research


Now that we’ve addressed the concept and practical aspects of drawing interest
areas, we shall turn to the nuts and bolts of item design. Here, guidelines differ by
research paradigm. The remainder of this chapter will be organized accordingly. In
this section, I will focus first on factors that are specific to text-based eye tracking.
Section 6.3 addresses topics related to the design of visual world research.

6.2.1 Spatial Constraints
Eye-movement recordings tend to be less accurate when participants are looking
to the outer edges of the screen. The most extreme case is track loss, a temporary
interruption in recording due to the eye tracker’s inability to locate the eye gaze
(for more information on track loss, see Section 8.1.2). There are several things
researchers can do to prevent track loss and other system glitches from happening.
Here we concentrate on factors that contribute to a robust study design. Practical
tips for enhancing recording quality during data collection will be presented in
Section 9.3.2.2.
First, researchers can insert margins around the edge of the screen. These
margins (large blank borders) do not contain any information and therefore, few,
if any, eye fixations are expected to land in these regions. In research strands
where having lots of blank space may not be desirable (e.g., subtitles and captions
research), researchers could simply make sure they move their interest areas (e.g.,
subtitle regions) away from the edge. In my own work, I leave at least a 1.5 to 2
cm buffer. Second, text-based eye-tracking researchers should avoid placing
interest areas at the end or beginning of a line, regardless of whether their
studies include one or multiple lines of text. Researchers in ISLA, vocabulary
(studies with glossing), and assessment will also need to place their interest areas
further away from the left and right edge. This is because viewers have a tendency
to overlook or skip information in these areas. Rayner (2009) noted that the first
and last fixations of a text line are typically five to seven letter spaces from the
beginning and the end of the line, respectively. The majority of fixations thus fall between these two extremes. When moving between lines of text (a long-distance
eye movement that is known as a return sweep), readers accumulate extra error.
During these transitions, additional, corrective saccades may be necessary for the
eyes to reach their intended location. All these factors point in the same direction:
it is better to keep interest areas out of the peripheral regions of the screen. Third,
it is important to note that interest areas should be large enough to reduce
the probability of skipping, which results in zero fixations on the region. (A dataset with many skips
can be more difficult to analyze, see Chapter 7.) Because the probability of fixa-
tion (i.e., non-skipping) increases with word length, longer words tend to offer
certain advantages for data analysis compared to shorter words. This is not to say
short words cannot be analyzed, but researchers may need to get more inventive,
for instance by focusing on decreases in skipping rates as well as increases in fixation
duration (for an example, see Drieghe, 2008). When interest areas are the size of
a seven-letter word, skipping rates will be as low as 10%, compared to 80% for
one-letter words (Vitu, O’Regan, Inhoff, & Topolski, 1995). In a similar vein,
ISLA or assessment researchers who work with image-based interest areas should
ensure their interest areas are large enough for the region to be fixated. In practice,
this tends to be less of a concern because image-based interest areas are gener-
ally quite large. Finally, double spacing the text is recommended to account
for the technical limitations of an eye tracker (i.e., bounds on its accuracy and
precision, see Section 9.1.3). Double spacing acts as a protective layer against ver-
tical drift—the systematic recording of eye fixations above or below their actual
location. Drift looks as if the eye fixation bubbles are floating above or below the
text (for an example, see Figure 8.5). However, to spot drift in a recording and
potentially correct it (see Section 8.1.3), text needs to be double- or triple-spaced.
Otherwise, readers may mistakenly appear to be reading the line above or below the one they were actually looking at.
Let’s consider an example of how my colleagues and I applied the four principles just discussed in an actual study. Figure 6.13 shows a screen display from
Godfroid et al. (2018) before (top-right) and after (bottom-right) the researchers’
intervention. This text was extracted from an authentic English-language novel
set in Afghanistan. One computer screen could fit about one quarter of a text page
(top-left). The researchers were interested in studying L1 and L2 readers’ inciden-
tal vocabulary acquisition of the Farsi-Dari words that occurred naturally in the

FIGURE 6.13 A screen display before (top-right) and after (bottom-right) the researchers’ intervention. (Source: Godfroid et al., 2018).
text (e.g., jo and dishlemeh in the example). The first thing to notice is that both
texts are double spaced and have large, 2.5-cm margins on all four sides. The texts
contain two target words, jo, meaning “dear” or “auntie”, and dishlemeh, which is
“a sweet candy made mostly of sugar”. You will notice that we removed the italics
for dishlemeh because we did not want target words to be enhanced visually in the
text. The boxes around the target words represent interest areas, which are used
for analysis (see Section 6.1), but are not visible to participants during the experi-
ment. This brings us to some of the finer changes my colleagues and I introduced
in this text. In the original text (the top-right panel), the target word, dishlemeh,
occurred as the first word in a text line. As discussed previously, this is less than
ideal and so we inserted a hard return after something in the preceding line to
move dishlemeh closer to the center of the line. Second, we merged the interest
areas for Bibi and jo into a single, larger area of interest. To reduce the probability
of skipping the target, it made sense to consider Bibi jo as a single unit for analysis,
both from an eye-tracking perspective and a semantic point of view (recall that
Bibi jo means “dear Bibi” or “auntie Bibi”). However, merging the two interest
areas introduced a new problem because now Bibi jo was also the first word of the
sentence and the text line. A minor modification of the original text (Bibi jo too
always brought → Also Bibi jo always brought) took care of this issue.

6.2.2 Artistic Factors
Print is the medium of visual reading. Scholars who specialize in typographical
and vision research care about what features of print contribute to the ease of
reading (Legge & Bigelow, 2011). Eye-tracking researchers share this concern for
text legibility. If nothing else, eye-tracking researchers want to present text in
a way that does not impede fluent reading. Going one
step beyond this, many eye-tracking researchers see the value of an ecologically
valid study design. Ecological validity dictates that the text features in an eye-
tracking study should resemble those in natural reading as closely as possible,
within the constraints of contemporary eye-tracking technology (see Godfroid &
Spino, 2015).The question, then, becomes what font size to use in an eye-tracking
experiment so measurement is accurate and the reading is natural (for a critical
discussion, see Spinner, Gass, & Behney, 2013). This issue is particularly important
for researchers interested in word-level phenomena (e.g., grammar and vocabu-
lary researchers) because font type and size will determine what portion of the
visual field a word or words occupy (see Table 2.1).
Findings in vision research are encouraging in that the range of font sizes for fluent reading is quite wide: from 4 to 40 points at a viewing distance of 40 cm
(Legge & Bigelow, 2011). At a 40 cm viewing distance, which is somewhat less
than in most eye-tracking experiments, 4- to 40-point size letters subtend a visual
angle of 0.2° to 2° (for more information on visual angle, see Section 2.1). In
daily life, these more extreme font sizes are used for patient information leaflets
and newspaper headlines. Second-language eye-tracking research tends to employ font sizes that are closer to the middle of the spectrum. As seen in Tables S5.1–
S5.5 in the online supplementary materials, the majority of L2 studies utilize an
18- to 24-point font. Eighteen-point font in particular seems to be a popular
choice.
In Table 2.1, I listed conversions of font sizes, from 16 to 24 points, into degrees
of visual angle for typical viewing distances in an eye-tracking experiment. Larger
fonts are generally reserved for single-word reading tasks, for instance a 44-point
font in a lexical decision task by Miwa et al. (2014) and a 72-point font for the
oral reading of single words in De León Rodríguez et al. (2016). Spinner et al.
(2013) showed that readers are less likely to skip function words in sentences
printed in Calibri 44 pt. than in Calibri 24 pt. Toward the lower end of the spec-
trum, researchers may work with smaller font sizes when they place authentic
materials directly on a computer screen, for instance IELTS (International English
Language Testing System) reading passages in an 11-point font in Bax (2013). It
is easy to envision how future ISLA researchers would follow a similar approach.
Smaller font sizes have the advantage of enabling more text to be fitted onto a
single line, but as a researcher, you will need to test whether your eye tracker can
still record word-level eye gaze data accurately if that is your goal. Ultimately, you
should pick a font size that the reader can read comfortably, for which the eye
tracker can record word-level data accurately and, if possible, that is representative
of font sizes used in similar tasks outside the lab.
Although font size is clearly an important factor, more than half (54%) of the
authors of text-based studies did not report font size information in their papers
(see online supplementary materials, Tables S5.1–S5.5). For the field of L2 eye-
tracking research to move forward, reporting font size and type, together with
other aspects of screen layout information, will be an important step to facilitate
replication, promote methodological standardization, and enhance the quality of
future research (Spinner et al., 2013).
Like font size, font type is another feature to consider when designing
an eye-tracking study. Font types differ according to whether they are mono-
spaced (i.e., fixed-width fonts) or proportional. In a monospaced font, every
letter or character occupies the same amount of horizontal space. An example is
the bottom row of Figure 6.14, in which every character has the same width. The
advantage of a monospaced font is that it enables researchers to convey key infor-
mation about their experiment in degrees of visual angle.This is shown in the fol-
lowing example from Lim and Christianson (2015): “Stimuli were presented in a
14-pt monotype font. Participants viewed the stimuli binocularly on a monitor 66
cm from their eyes; at this distance, approximately 2.5 characters equaled 1 degree
of visual angle” (p. 1290). When other eye-tracking researchers read this informa-
tion, they can translate the numbers back to their own experimental set-ups and
replicate the exact study design. This is not possible when researchers use a proportional, rather than a monospaced, font or fail to report font type information. As
FIGURE 6.14 An example of monospaced and proportional font. (Source: Created by Garethlwalt, licensed under the Creative Commons Attribution 3.0 Unported license).

seen in the top row of Figure 6.14, the different characters in a proportional font
have a variable width. For example, uppercase “P” is much wider than lowercase
“i”. Because of this, word length cannot be mapped onto degrees of visual angle
and researchers cannot draw comparisons between studies.
Another reason monospaced fonts are preferred in eye-tracking research relates
to the accuracy of the eye tracker. Accuracy refers to how closely the eye gaze
position recorded by the eye tracker matches the true position of the eye (for
more information, see Section 9.1.3). As a rule of thumb, you should avoid defin-
ing interest areas that fall within an eye tracker’s error margin. For example, the
average accuracy rate of the EyeLink 1000, as reported in the manufacturer’s
manual, is 0.25–0.50° of visual angle. To give you an idea of how much that is,
hold your hand at arm’s length. One degree of visual angle corresponds approxi-
mately to the width of your pinky finger held out at this distance (for further
details, see Section 2.1). In a typical recording with an EyeLink 1000, the differ-
ence between the actual gaze and the recorded gaze tends to be half the size of
your pinky finger or less. Now let’s apply this to a text-based processing study,
using Lim and Christianson’s (2015) study as an example. Lim and Christianson
reported that a single character in their study spanned approximately 0.4° of visual
angle.Therefore, the researchers were fully equipped to examine word-level inter-
est areas, as they did, but they would have lacked the spatial resolution to run more
fine-grained analyses at the letter level. To perform a letter-based analysis, they
would have had to enlarge the font size until the corresponding degrees of visual
angle exceeded the eye tracker’s error margin (> 0.5°).
More generally, the use of a monospaced font, as opposed to a proportional
font, helps researchers select a font size that is appropriate for their research ques-
tions and the eye-tracking equipment they have. Monospaced fonts afford better control over the visual input. Figure 6.15 lists different types of monospaced fonts
with the actual fonts shown in the second column.
Finally, because most eye-tracking experiments in our field take place on a
computer screen, it is good to pause and think about what background color
to use. In our lab, we prefer using a light gray, rather than a white, background
for the experiment, because this is less tiring for participants’ eyes. Background
colors can be chosen from a color wheel or entered as red, green, blue (rgb)
FIGURE 6.15 Examples of 12-point, monospaced font types.

values in the programming software. For example, the rgb values of the light gray
background my colleagues and I use are 204, 204, 204. Once you have selected a
background color, you want to use the same color consistently across all screens.
Like many other types of research, eye-tracking experiments consist of differ-
ent stages, including camera set-up and calibration, instructions, practice trials,
the main experiment, and any potential secondary tasks. The background colors
should remain the same across all these different stages, because changes in hue
could cause changes in pupil size. This could be detrimental to data quality, given
that contemporary video-based eye tracking relies on the pupil for accurate meas-
urement. From a practical standpoint, maintaining the same background through-
out the experiment means you may need to change the default background color,
which is usually white, in multiple places in the programming software.
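As an illustration, here is one way a consistent background color could be set, sketched with PsychoPy; other stimulus-delivery packages have analogous settings, and the window size and text stimulus below are placeholders rather than a recommended configuration.

```python
# A sketch of setting a consistent light-gray background in PsychoPy.
# The window size and the text stimulus are placeholders; the key
# point is to reuse the same rgb values on every screen.
from psychopy import visual

BACKGROUND = (204, 204, 204)  # light gray, specified in 0-255 rgb values

win = visual.Window(size=(1920, 1080), units="pix",
                    color=BACKGROUND, colorSpace="rgb255")

# Any instruction or practice screens should inherit the same background
instructions = visual.TextStim(win, text="Press any key to begin.",
                               color=(0, 0, 0), colorSpace="rgb255")
```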

6.2.3 Linguistic Constraints
The final set of guidelines for text-based experiments pertains to researchers who
are comparing word- or phrase-level interest areas that differ in their content or
lexical makeup (see Sections 6.1.1 and 6.1.2). This includes most studies within
the grammar and vocabulary strands (see Sections 3.2.1 and 3.2.2), but not yet at
this time ISLA, assessment, and subtitles or captions research (see Sections 3.2.3,
3.2.4, and 3.2.5). If you are not sure whether you should control the linguistic
properties of the written materials in your study, you can ask yourself the follow-
ing question: Am I comparing words or phrases that differ in their lexical com-
position? If the answer to this question is affirmative, then yes, you will want to
control the linguistic properties of your materials.
In Section 2.5, I introduced a host of linguistic variables that influence when
the eyes move. These variables should be considered when you are designing a
study. Imagine a hypothetical study on the role of cognates in L2 reading, which
could include an item like this: It was very kind/considerate of you to send me flowers.
(Considerate and considerado are English-Spanish cognates, kind and amable are not.
Therefore, they represent a cognate/noncognate pair.) The words kind and consid-
erate have several commonalities, including their part of speech and meaning, yet
they differ in much more than their cognate status alone. Kind is a shorter and
178  Designing an Eye-Tracking Study

more frequent word than considerate, L1 English speakers typically start using kind
at a younger age, and people generally indicate they are more familiar with the
word kind than the word considerate. For all these reasons, it is not a good idea to
compare the eye-tracking data for these two target adjectives directly if the goal is
to study the role of cognate status in L2 reading. Many other linguistic variables
could be accounting for the differences in the eye-movement data.
In a carefully designed study, then, target words or phrases differ only with
regard to the variable the researcher wants to study (e.g., cognate status). All other
linguistic properties of the target words have been carefully controlled for. As we
saw in Section 2.5, the “big three” variables that influence eye fixation durations
are frequency, contextual constraint or predictability, and word length
(Kliegl, Nuthmann, & Engbert, 2006, p. 13). Therefore, these are the first variables
that should come to mind when designing a text-based study: are my word pairs
or phrases approximately equally frequent? Do they appear in the same context?
Do they have the same length? Additional variables to control for are age of acqui-
sition, part of speech, concreteness, and the location in a clause (see Section 2.5).
Questions here are: at what age do L1 speakers first start using the target words or
phrases? Do my word pairs or phrases have the same part of speech (e.g., all nouns
or all verb–noun collocations)? Are all items concrete or all items abstract? Do
my word pairs or phrases occur in a similar place in the sentence (e.g., not at the
end of a clause or sentence)? Table 2.3 in Chapter 2 provides a good overview of
these different variables. Importantly, it details what sources (e.g., corpus, normed
database) you can consult or what data you can collect to determine whether your
experimental conditions have been matched appropriately.

6.2.3.1 Experimental and Statistical Control


When controlling your materials for a given linguistic variable (e.g., word
property), the first step is to obtain information on that variable for all the items
in your data set. To control for frequency, you need to know your words’ fre-
quency of occurrence, to control for word length, you need to know the length
of your target words, and so on (see Table 2.3, for details). Figure 6.16 illustrates
the outcome of this process for the item list in Godfroid and Uggen (2013).
Godfroid and Uggen wanted to control the length and frequency of occurrence
of the target verbs in their study. They investigated L2 processing of 24 German verbs: 12 irregular and 12 regular. The researchers recorded the
length and frequency of occurrence of each verb. For frequency of occurrence,
the researchers consulted the free, online corpus, digitales Wörterbuch der deutschen
Sprache, “Digital dictionary of the German language” (www.dwds.de). Frequency
per million was the raw number of hits divided by 15.462297, the size of the corpus in millions of words. Once the researchers had recorded the frequency
and length information of each verb, they could compare the data for the different
verb types statistically.To do so, they ran t tests on the regular versus irregular verbs
FIGURE 6.16 Modified verb list from Godfroid and Uggen (2013), with word length and
frequency information recorded for all the verbs in the study. Calculation of
frequency per million is illustrated in the formula bar.

and one-way ANOVAs on the regular versus irregular e → i versus irregular a → ä
verbs. When conditions do not differ significantly, as was the case for the example
in Figure 6.16, researchers conclude that their stimuli have been matched on the
dependent variable.2
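The following Python sketch illustrates these two steps with invented hit counts: raw corpus hits are first converted to frequencies per million, and the two verb types are then compared with a t test. The corpus size is the figure reported above; everything else is hypothetical.

```python
# A minimal sketch of matching stimuli on frequency: convert raw hits
# to frequency per million, then test whether the verb types differ.
# Hit counts are invented; 15.462297 is the corpus size in millions
# of words mentioned in the text.
from scipy.stats import ttest_ind

corpus_size_millions = 15.462297
raw_hits_regular   = [5400, 3100, 8700, 1200, 4600, 2900]
raw_hits_irregular = [5100, 3500, 7900, 1500, 4200, 3300]

def per_million(hits):
    return [h / corpus_size_millions for h in hits]

t, p = ttest_ind(per_million(raw_hits_regular),
                 per_million(raw_hits_irregular))
print(round(t, 2), round(p, 3))
# A nonsignificant p value supports the claim that the two conditions
# are matched on frequency.
```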
The procedure described here, which involves matching items between
conditions manually, is termed experimental control (also see Section 2.5).
Experimental control is the most common way of controlling written materi-
als, especially when the materials consist of sentences, rather than longer texts.
Experimental control works best when researchers have identified specific interest
areas within sentences and focus their analysis on these interest areas.When all the
words in a sentence are of interest or researchers work with long texts, another
type of control may be necessary.

Statistical control is the practice of entering linguistic variables as control variables in a statistical analysis (see Section 2.5). The variables that are entered in the
analysis usually cannot be controlled for experimentally, hence researchers resort
to statistical control. For instance, in Cop, Drieghe, and Duyck (2015), bilinguals
read half an Agatha Christie novel in English and the other half in Dutch transla-
tion. This ingenious design enabled the researchers to control for the content and
general linguistic properties of the two texts, but at the level of individual sentences
some differences remained. Similarly, Sonbul (2015), in a study on collocation pro-
cessing, measured L1 and L2 English speakers’ sensitivity to the collocation strength
of different adjective-noun sequences. Because the adjective-noun combinations
were intended to be synonyms (e.g., fatal mistake, awful mistake, extreme mistake), the
researcher could not control the length and frequency of the adjectives in each tri-
plet. In such designs, then, researchers choose to account for the differences between
their conditions statistically. In so doing, they gain a more accurate estimate of what
they wanted to study, for instance effects of bilingualism in Cop, Drieghe, & Duyck
(2015) and sensitivity to collocation frequency in Sonbul (2015).
Table 6.1 is a partial reproduction of one of Sonbul’s statistical analyses, which
exemplifies the use of statistical control.The primary variable of interest in Sonbul’s
study was collocational frequency. Consistent with Sonbul’s hypotheses, colloca-
tion frequency was a significant predictor of first-pass reading time of both L1 and
L2 English readers. Here we focus on the many other variables included in the
table. Age and Vocabulary Size are individual differences variables; Trial Number
was a control variable related to the experiment. Of most interest here are Pair
Length and Word 2 Frequency, which are control variables related to the adjec-
tive-noun combinations. Recall that Sonbul was unable to control for collocation
length (e.g., fatal mistake is shorter than extreme mistake) or for the frequency of
the component items (e.g., fatal occurs less frequently than extreme in the British
National Corpus). To account for the potential influence of these variables on

TABLE 6.1 Best-fitting linear mixed effects model for log first pass reading time

Predictor                               Estimate   SE     t        p(>|t|)
(Intercept)                              6.22      0.03   248.04   <.001
Trial Number                            −0.00      0.00   −1.28    0.20
Vocabulary Size (resid)                 −0.04      0.00   −7.76    <.001
Pair length (resid)                      0.03      0.00   10.17    <.001
Word 2 frequency (log)                  −0.11      0.05   −2.29    0.02
Collocational frequency (log)           −0.02      0.01   −2.47    0.01
Age                                      0.02      0.00   3.91     <.001
Vocabulary Size (resid) × Trial Number  −0.00      0.00   −2.22    0.03

Note: resid = residualized
(Source: Sonbul, 2015).

first-pass reading times, Sonbul calculated the collocation length and frequency
information as described in the previous example. She then entered these vari-
ables into a multiple regression analysis. Both variables were statistically significant,
suggesting it is a good thing Sonbul controlled for them statistically. Adding these
control variables to the model further strengthened the researcher’s claim that
L1 and L2 English readers were sensitive to collocation frequency and not some
other variable that correlated with collocation frequency (e.g., frequency of the
constituent words). Statistical control, then, can loosen the shackles of experimen-
tal control in research designs and give researchers more flexibility. Textbox 6.2
summarizes the main points about how to design an eye-tracking study with text.
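For illustration, a model along the lines of Table 6.1 could be specified as follows in Python. The data file and column names are hypothetical, and Sonbul’s actual model specification may have differed in its details; the point is simply that item-level control variables enter the model as ordinary predictors.

```python
# A sketch of statistical control: item-level control variables are
# entered as predictors in a mixed-effects model. The file and column
# names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("first_pass_times.csv")  # hypothetical data file

model = smf.mixedlm(
    "log_first_pass ~ trial_number + vocab_size + pair_length"
    " + word2_freq_log + colloc_freq_log + age",
    data,
    groups=data["participant"],  # random intercepts by participant
)
result = model.fit()
print(result.summary())
```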

TEXTBOX 6.2. HOW TO DESIGN A TEXT-BASED STUDY

1. Make sure you insert margins around the outer edges of the screen;
interest areas should be wide enough for fixations to be recorded (in
other words, wide enough to reduce the possibility of skipping); less
accurate eye trackers require larger interest areas; avoid placing the
interest areas at the beginning or end of a sentence or at the beginning
or end of a line of text; lastly, double space the text.
2. A monospaced font type is preferred to a proportional font type; keep
the font sizes within the range for fluent reading (Legge & Bigelow,
2011) and report what font type and size you used; the background
screen color should be consistent throughout the entire experiment.
3. Linguistic variables such as word length, frequency, and predictability in
context should be controlled for experimentally or statistically.

6.3 Visual World Research3


While text-based and visual world researchers often share theoretical interests,
the creation of materials for these two paradigms differs greatly. Unlike their
text-based counterparts, visual world studies tend to have little to no text. Instead,
these studies combine visuals and audio, and both types of stimuli require care-
ful consideration when designing a study. A foundational idea in visual world
research is that eye movements reflect the interplay of visual and auditory (linguis-
tic) processing, and eye-movement data need to be interpreted as such (Huettig,
Rommers, & Meyer, 2011). Researchers can manipulate either the visuals (i.e.,
images), the language, or a combination of both to be able to analyze the joint
effects of both input types on participants’ looking behavior. To reflect the para-
digm’s multimodal nature, I will structure this section in two parts. First, I will
present issues related to visuals (see Section 6.3.1) and then I will consider factors
in auditory materials design (see Section 6.3.2).

6.3.1 Selecting Images
6.3.1.1 Experimental Design
The general principle in creating visuals is that overall effects due to visual pro-
cessing should be minimized and controlled in the study. In an ideal visual world
experiment, there are a number of objects on the screen (typically between two
and four, although it could be more) and the observer’s eye gaze wanders freely
and equally across the different objects. Only when the auditory input is pre-
sented does the participant orient more strongly toward one or more objects.
This shift in eye gaze, then, is attributed to the unfolding speech signal rather than
some inherent properties of the images.
For eye gaze behavior to be tied to the auditory input, images on the screen
should be comparable in terms of their visual salience. If certain visual proper-
ties of the display render an image more eye catching, this can influence looking
behavior and confound the results. As a researcher you should do your best to
select images that do not contain any visual, phonological, semantic, or linguistic
confounds (see Sections 6.3.1.2 and 6.3.1.3). An additional and necessary measure
is to design your study so any remaining extraneous influences are divided equally
across conditions and thus accounted for by the experimental design. Therefore, as
a researcher you want to exercise control at two levels—control over your materi-
als and experimental control.
Let’s take a look at Dijkgraaf, Hartsuiker, and Duyck’s (2017) study to under-
stand how experimental control works. Dijkgraaf and her colleagues extended a
seminal study by Altmann and Kamide (1999) with monolingual English speak-
ers to the field of bilingualism (for a review, see Section 4.2.2.2). The researchers
were interested in whether or not Dutch-English bilinguals were able to predict
aspects of upcoming speech in their L1 and their L2 based on the seman-
tic information in the verb. For example, upon hearing “Mary drives …”, can
the listener predict the semantic category of vehicles (e.g., a car, a van) in the
object position based on the constraining nature of the verb drive? Dijkgraaf
and colleagues used a display with four images, as shown in Figure 6.17. They
compared the probability of listeners fixating on the target car in a semantically
restrictive condition (“Mary drives …”) and a semantically neutral condition
(“Mary takes …”), respectively.
What is important for the present discussion is that the displays for the restric-
tive and neutral conditions stayed the same; only the auditory input was varied
(also see Kohlstedt and Mani, 2018, in Section 4.2.2.5). This is one approach to
experimental control. It is arguably the simplest one. An advantage of keeping
the visual context constant is that researchers can safely attribute any differences
in looks to differences in the auditory input. After all, the images do not change.
Therefore, any remaining imperfections in the images will cancel each other out,
leaving the audio as the only possible source of statistical effects. It’s that simple.

FIGURE 6.17 Display from Dijkgraaf et al.’s (2017) study. Note: Display recreated with
images from the picture-naming database by Severens et al., 2005.

Now, let us take this example one step further. As previously described, some
research questions do not allow researchers to keep the visual input constant (see
Section 5.1). We discussed Marian and Spivey (2003a, 2003b) and Trenkic et al.
(2014) as two such example studies. By the same token, it is easy to think of an
extension of Dijkgraaf et al. (2017) in which the displays are no longer identical
between conditions. One possibility would be to introduce a semantic competitor
into the display, for example a bike in a display with a car (see Section 6.1.3.2). To
test for potential effects of the semantic competitor, researchers would necessarily
need to compare displays with a semantic competitor and displays without (see
Figure 6.18). This means the two displays will no longer be identical.
When displays are different, researchers can no longer ignore what happens to
the other images on the screen. Perhaps participants find bikes inherently more
interesting and appealing than potatoes (I certainly do) and this changes their over-
all viewing behavior. To account for such differences, researchers need to express
looks to the target (i.e., the car) as a percentage of the overall looks to the screen.
This will give them an estimate of the baseline distribution of looks across the
screen in the control condition. Once they account for these baseline effects
(see Textbox 6.3), they may test how participants’ viewing preferences change (i)
in a semantically constraining context, and (ii) when there is a semantic competi-
tor on the screen.
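As a minimal illustration, the following sketch computes that baseline proportion from invented look counts; in a real study, the counts would come from the eye-movement record for each trial.

```python
# A sketch of expressing looks to the target as a proportion of all
# looks on the screen, so that displays with different images can be
# compared. All counts are invented.
looks = {"car": 42, "bike": 31, "potato": 12, "letter": 15}

total = sum(looks.values())
proportions = {image: n / total for image, n in looks.items()}
print(proportions["car"])  # 0.42: proportion of looks to the target

# Comparing this proportion between the control condition and the
# competitor condition tests whether the competitor draws looks away
# from the target over and above any baseline preference.
```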

FIGURE 6.18 Original display (left) and hypothetical extension (right). The display on the right includes a semantic competitor, bike, for the target object, car. Note: Display recreated with images from the picture-naming database by Severens et al. (2005). (Source: Dijkgraaf et al., 2017).

TEXTBOX 6.3. BASELINE EFFECTS


Baseline effects are preexisting differences in how soon, how often, or how
long participants look at a given image, independent of the spoken linguis-
tic input. Baseline effects represent a source of noise in the data and are
therefore best avoided through careful materials design. If baseline effects
do exist, researchers can make comparisons between conditions that present
the same baseline effects (i.e., compare trials with the same display but dif-
ferent audio) or modify their data analysis to take these effects into account.

A third example of experimental control involves alternating the roles of images between trials. This is fairly easy to do. You can rotate the roles of images regardless
of whether the composition of the display stays the same or varies between trials.
Rotating simply means that the target image in one trial becomes a distractor in a
different trial, and vice versa. For instance, in Dijkgraaf et al.’s study, the target car
served as a distractor in the trial Mary reads/steals a letter. Conversely, potato, which
was a distractor in our previous example, became a target in the trial Mary boils/
buys a potato (see Figure 6.19). Rotating image roles this way will even out any
preexisting differences (imperfections) in the visual or linguistic properties of the
images. Therefore, rotation helps you ensure you are measuring the effects of the
audio, and not some unintended image properties.

FIGURE 6.19 Changing roles of targets and distractors in Dijkgraaf et al. (2017).

Related to this point, it is best to rotate the position of images within a screen
as well. This is to balance out any spatial biases participants may bring to the task.
Participants who read from top to bottom, left to right tend to show a bias for
the top-left quadrant of a screen, even in non-reading tasks. To account for this,
images (targets, competitors, distractors, fillers) should occur with equal frequency
in each position on the screen. This means that in a four-image display, the target
image should occur in each position in 25% of the trials. Textbox 6.4 summarizes
the main points related to designing a visual world experiment.
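One convenient way to implement this kind of position counterbalancing is a Latin-square rotation, sketched below with illustrative image labels: across four counterbalanced lists, every image appears in every quadrant exactly once, so each image type occupies each position in 25% of the presentations.

```python
# A sketch of counterbalancing image positions across four display
# versions with a Latin-square rotation. Image labels are illustrative.
from collections import deque

images = deque(["target", "competitor", "distractor", "filler"])
quadrants = ["top_left", "top_right", "bottom_left", "bottom_right"]

for version in range(4):
    print(f"List {version + 1}:", dict(zip(quadrants, images)))
    images.rotate(1)  # shift every image to the next quadrant
```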

TEXTBOX 6.4. HOW TO DESIGN A VISUAL WORLD EXPERIMENT: VISUALS

1. Be careful not to introduce any visual, phonological, semantic, or linguistic confounds into your displays. Images presented together
onscreen should not overlap in their visual (form-based), phonological,
or semantic properties, unless this is the focus of your study. For studies
with bilinguals, properties of the words in both languages should be
considered.
2. Whenever possible, keep the images constant between conditions and
only change the audio.
3. If images need to change between conditions, make sure you account
for baseline effects in your images. To do so, consider looks to the target
in relation to the other images on the screen. Only then it is safe to test
for changes between conditions.
4. Try to alternate the role of images between trials (e.g., target–distractor /
distractor–target or target–competitor / competitor–target).
5. Make sure you counterbalance the position of images on the screen.

6.3.1.2 Visual Properties of Images


In the previous section, I described the many virtues of experimental control
in visual world research. Parallel to these design efforts, researchers will want
to approach image selection carefully, because a careful selection of images can
preempt many of the data issues alluded to in the previous section, including
baseline effects (see Textbox 6.3). As a reminder, visual world researchers aim to
design trials in which any overt attention paid to the target image can be traced,
not to the visual properties of the image, but to the auditory input. The implica-
tion of this is that any influence of the visual properties of the images presented on
the screen should be minimized. Properties to consider are color, size, brightness,
and contrast, as well as spatial frequency (e.g., dense vs. sparse) and image style
(e.g., line drawing vs. picture). In general terms, properties that make an image
more salient can attract listeners’ eye gaze and introduce baseline effects in the
eye-movement data. These properties are not unlike the animations and input
enhancement techniques people use in PowerPoint presentations to direct their
listeners’ attention to important information in the slides. However, while high-
lighting information in a PowerPoint presentation (e.g., by changing the font size
or color) is a good thing, in a visual world experiment researchers strive to make
all the information on the screen look equally important.
Let us consider an example of how different kinds of displays might exert an
influence on participants’ looking behavior. Figure 6.20 presents three displays
FIGURE 6.20 Three possible displays of a balloon, a shark, a shovel, and a hat. Note: Shark in Display One reproduced under a Creative Commons Attribution-ShareAlike 2.5 license, https://creativecommons.org/licenses/by-sa/2.5/deed.en. Displays Two and Three recreated with images from the International Picture Naming Project (Bates et al., 2003; Szekely et al., 2003). (Source: Modeled after Marian and Spivey [2003a, 2003b]).

that I modeled after Marian and Spivey’s (2003a, 2003b) influential experiments
(for a review, see Section 4.2.1). The participants in Marian and Spivey’s studies
manipulated real objects (artifacts or toy replicas) placed on a white board; for
illustration purposes, I have replaced them with images here. All displays are meant
to depict a balloon, a shark, a shovel, and a hat. The first display was originally
created for a class.The instructor looked for images on the internet, using only his
intuition as a guide. The result is a mix of images with and without background
color, which look good enough for teaching but would probably not pass muster
with article reviewers because they are so diverse.
For one, the image background color does not add anything to the images. It is
generally better to select line drawings without a color background or, at a mini-
mum, be consistent across the different images in the display. Another observation
is that the images in the first display differ in how clear they are (cf. the hat and the
shark, which could be mistaken for another big fish). Because word recognition is
a key component of visual world research, the selection of clear and prototypical
images that can be readily recognized matters greatly. Normed databases, which
will be introduced in the next section, can help with this goal.
Building on these ideas, I adopted images from the International Picture
Naming Project (Bates et al., 2003; Szekely et al., 2003) for Displays Two and
Three instead. For the purposes of this book, I shaded the images in the second
display as a proxy for using color. Color can render images more lively, but at
the same time, color can introduce a new confound into the stimuli. Specifically,
imagine that in the second display, the balloon is colored red. The balloon may
then stand out against the other objects on the screen because it is more visu-
ally salient. Fire engines and other emergency vehicles use red for obvious rea-
sons: to attract attention. Using black-and-white images, or normed color images
(Rossion & Pourtois, 2004), will let you avoid these issues (also see Section
6.3.1.3). Consequently, the display on the bottom, which features the original
images from the International Picture Naming Project, will be the most suitable
for research purposes. There is no confound from color; the images are largely
clear and identifiable and have been normed extensively in a previous study (Bates
et al., 2003; Szekely et al., 2003, 2004, 2005). Finding suitable images, then, is an
integral part of designing materials. Always make sure you pilot your materials
with a similar group of participants before the main study. If eye fixations are
divided roughly equally across the images at the onset of each trial (before the
audio begins to play), you know you have done a good job.
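
If you want to go beyond eyeballing the pilot data, a simple goodness-of-fit test can flag baseline biases. Below is a minimal sketch in Python; the fixation counts and image labels are hypothetical, and the test simply asks whether preview-window fixations depart from an equal split across the four images.

```python
from scipy.stats import chisquare

# Hypothetical counts of preview-window fixations on each image,
# pooled over participants and trials in the pilot.
preview_fixations = {"balloon": 98, "shark": 112, "shovel": 103, "hat": 95}

# Under the null hypothesis, fixations are spread equally across the four images.
statistic, p_value = chisquare(list(preview_fixations.values()))

print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
if p_value < .05:
    print("Preview looks are uneven; inspect the images for baseline biases.")
else:
    print("No evidence of a baseline bias across images.")
```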

6.3.1.3 Naming Consistency and Normed Databases


As discussed in Section 6.3.1.2, images in a visual world study should invite fast
and consistent recognition and yet, seemingly minor details such as image size or
background color could have an unintended influence. For better control over
image properties, researchers could rely on an established database for picture
selection, especially when the images they need are relatively common objects or
actions. Images in these databases have been normed extensively, meaning a large
number of participants have responded to these images (e.g., have rated or named
them) and the image properties are now well understood. Because these databases
provide information on a range of variables (see Table 6.2), it is relatively straight-
forward to match images on any of these. For instance, researchers could filter
the database content for images that all have roughly the same visual complexity.
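
To give a concrete example of such filtering, suppose the norms have been downloaded as a spreadsheet. The sketch below is one way to do this in Python with pandas; the file name and column names are invented for illustration, as real databases label these variables differently.

```python
import pandas as pd

# Hypothetical file and column names; check the codebook of the database you use.
norms = pd.read_csv("picture_norms.csv")

# Keep images that are named consistently and quickly and are of moderate complexity.
candidates = norms[
    (norms["name_agreement"] >= 0.90)              # at least 90% of namers agreed
    & (norms["naming_latency_ms"] <= 1000)         # named within 1 second on average
    & (norms["visual_complexity"].between(2, 4))   # e.g., on a 1-5 rating scale
]

print(candidates[["item", "name_agreement", "visual_complexity"]].head())
```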
There is, first of all, the International Picture Naming Project (IPNP) (https://
crl.ucsd.edu/experiments/ipnp/). The IPNP consists of multiple databases that
provide naming norms collected from children and college-age adults in seven
different languages: American English, Bulgarian, German, Hungarian, Italian,
Mexican Spanish, and the variant of Mandarin Chinese spoken in Taiwan
(Bates et al., 2003; Szekely et al., 2003, 2004, 2005). It includes pictures from
another well-known database by Snodgrass and Vanderwart (1980). Snodgrass and
Vanderwart initially normed their 260 pictures for American English, but over
the years, this database has been standardized in multiple languages, including
British English, French, Icelandic, and Spanish. In 2004, Rossion and Pourtois
published a version with colored images that is freely available. Severens, Van
Lommel, Ratinckx, and Hartsuiker (2005) created a database with picture norms
for Belgian Dutch using pictures from four previous sources (see Figures 6.18,
6.19, and 6.20, for sample images). Finally, there is a database by Lotto, Dell’Acqua,
and Job (2001). Pictures were normed for Italian and used, for example, in the
study by Morales et al. (2016) with Italian-Spanish bilinguals.
The creators of these databases typically include a range of measures that
enable other users to control the properties of their images in a more objective
manner. These measures include name agreement (i.e., the percentage of people
who called an object by its intended name), naming latency (i.e., the time it took
for a participant to name the object), familiarity ratings, visual complexity ratings,
information about the color scaling, and so on. Table 6.2 summarizes the six data-
bases researchers in L2 and bilingual visual world studies have used to date. Most
contemporary databases can be accessed free of charge with due acknowledgment
of the source. These databases, then, were created in a spirit of open science and
for the greater benefit of the research community.
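
As an illustration of how two of these measures are computed, consider the following sketch. The naming responses are hypothetical; the logic, dividing the count of the dominant name by the total number of responses, is the standard calculation of name agreement.

```python
from collections import Counter
from statistics import mean

# Hypothetical naming responses for one picture: (label produced, latency in ms).
responses = [("balloon", 812), ("balloon", 745), ("ball", 1290), ("balloon", 903)]

labels = [label for label, _ in responses]
dominant_label, dominant_count = Counter(labels).most_common(1)[0]

name_agreement = dominant_count / len(responses)  # proportion producing the dominant name
mean_latency = mean(lat for label, lat in responses if label == dominant_label)

print(f"Dominant name: {dominant_label}; agreement = {name_agreement:.0%}; "
      f"mean latency = {mean_latency:.0f} ms")
```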
Naming consistency and name familiarity take on special importance
when working with L2 learners. Vocabulary size is always a factor in L2 research
(L2 speakers have smaller vocabularies) and this could have unexpected conse-
quences in the visual world paradigm. Researchers should not assume that L2
speakers know the labels of all the objects on the screen, particularly at lower
proficiency levels. Even when L2 participants do know an object’s name, they may
call it something different due to L1 influence or the variety of the target lan-
guage they learned at school (e.g., British vs. American English). As a researcher,
then, you would want to verify your L2 participants’ familiarity with the intended
object labels. Without strong naming agreement and familiarity with the object
labels, it will usually not be possible to observe any effects in the eye-move-
ment data. To check for vocabulary knowledge, researchers could run the images
by a group of participants with a similar L2 profile. Although this can provide
some indication of whether participants in the main study are likely to know
the words, the safest way is to ask participants in the main experiment directly.
Ask participants to name the objects one by one after the experiment. Any trials
with unknown or inconsistently named objects would need to be removed from
further analysis. Some researchers have also experimented with introducing the
objects and their labels to the participants ahead of time, but this is not uncontro-
versial. Specifically, naming objects will pre-activate their lexical representations
and this could conceivably boost effects that arise later in the eye-tracking data.
Unless pre-teaching is a part of the design and research questions, I recommend
gathering naming data after the main experiment is complete.
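
A sketch of this exclusion step might look as follows. The data frame and its column names are hypothetical; the point is simply to flag, per participant, the objects that were not named as intended.

```python
import pandas as pd

# Hypothetical post-experiment naming data: one row per participant-object pair.
naming = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "object":      ["shovel", "shark", "shovel", "shark"],
    "response":    ["shovel", "big fish", "spade", "shark"],
    "intended":    ["shovel", "shark", "shovel", "shark"],
})

# An object counts as known only if it was named as intended.
naming["known"] = naming["response"] == naming["intended"]

# Trials with these participant-object combinations should be excluded.
to_exclude = naming.loc[~naming["known"], ["participant", "object"]]
print(to_exclude)
```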
TABLE 6.2 Normed databases for picture selection

Bates et al., 2003
  Pictures: 520 black-and-white line drawings. Access: free.
  Key variables normed for: (1) name agreement (e.g., H statistics); (2) naming time; (3) cross-language universality and disparity of name agreement; (4) cross-language universality and disparity of reaction time; (5) picture characteristics (e.g., conceptual complexity); (6) features of the dominant response and picture characteristics (e.g., length in syllables); (7) cross-language frequency and length measures.

Szekely et al., 2003, 2004, 2005
  Pictures: 421 black-and-white line drawings. Access: free.
  Key variables normed for: (1) name agreement; (2) naming time; (3) features of the dominant response (e.g., length in syllables); (4) picture characteristics (e.g., objective visual complexity).

Lotto et al., 2001
  Pictures: 266 black-and-white line drawings. Access: free for authorized users.
  Key variables normed for: (1) degree of categorical typicality of the concept; (2) familiarity; (3) naming latencies; (4) name agreement; (5) concept agreement; (6) length in letters of the name; (7) length in syllables of the name; (8) frequency of the written name; (9) age of acquisition of the concept.

Rossion and Pourtois, 2004 (an update of Snodgrass & Vanderwart, 1980)
  Pictures: 260 with gray-level texture, surface details, and color. Access: free.
  Key variables normed for: (1) naming agreement; (2) familiarity; (3) complexity; (4) imagery judgments, naming latencies.

Severens, Van Lommel, Ratinckx, and Hartsuiker, 2005
  Pictures: 590 black-and-white line drawings. Access: free, but access must be requested first.
  Key variables normed for: (1) number of names; (2) name agreement; (3) naming latency.

Snodgrass and Vanderwart, 1980
  Pictures: 260 black-and-white line drawings. Access: requires a license.
  Key variables normed for: (1) norms for name agreement; (2) image agreement; (3) familiarity; (4) visual complexity.
6.3.1.4 Should I Have a Preview?


In Chapter 4, I discussed the linking hypothesis that underlies the joint use of
audio and visuals in visual world experiments. In a nutshell, the linking hypothesis
states it is the overlap between activated mental representations and the unfold-
ing speech stream that triggers an eye movement (Altmann & Kamide, 2007). To
establish a mental representation of the images, participants need to see them first.
The idea is that when participants look at an image, they form a mental repre-
sentation (an episodic memory), which will interact with the ensuing audio input
(Altmann & Kamide, 2007). The initial viewing of the display before the audio
begins to play is known as a preview.
In Tables S5.6–S5.12 (see online supplementary materials), I coded whether
or not the studies included in this book’s synthetic review had a preview at the
beginning of each trial. In general, most studies did. At the same time, a pre-
view was not relevant in the production substrand, either because researchers
wanted to capture eye movements early on in the trial (Flecken, 2011; Flecken
et al., 2015; Kaushanskaya & Marian, 2007) or because they were conducting a
face-to-face interaction study (McDonough, Crowther, Kielstra, & Trofimovich,
2015; McDonough, Trofimovich, Dao, & Dion, 2017). Other researchers used real
objects for participants to manipulate (Marian & Spivey, 2003a, 2003b; Sekerina
& Trueswell, 2011) and did not really specify in their papers whether and how
preview was done. One possibility is that preview occurred naturally when the
researchers were changing the display in between trials. Preview, then, is found
most commonly with comprehension-based visual world studies that are con-
ducted on a computer screen.
Dahan, Tanenhaus, and Salverda (2007) found that L1 Dutch speakers showed effects
of phonological competition, but only when they were given a preview. When
participants did not know what the objects on the screen were ahead of time, looks
to the phonological competitor (e.g., koffer, “suitcase”, for the target koffie, “coffee”)
never exceeded looks to the distractor objects, suggesting koffer never competed for
word recognition in the trial. Similarly, Ferreira, Foucart, and Engelhardt (2013)
found a garden-path effect in L1 English speakers processing put the book on the
chair in the bucket only when there was a preview and there were a limited number
of objects in the display. The findings in these two studies may inform other visual
world studies about the importance of having an image preview. Researchers who
are interested in prediction—either phonological, morphosyntactic, semantic, or
multi-cue prediction—typically ensure participants know what all the images are,
before or at the very beginning of the audio, or precisely before the critical word.
Anticipation arises, or fails to arise, from the interplay of participants’ pre-activated
mental representations of the visual world and what they hear. Without knowing
what the images on the screen are, the participants’ task would be akin to visual
search, whereby they would be scanning a scene in search of a predefined target
object (see Sections 2.2 and 2.3). This is not how the visual world paradigm works.
Preview is important but so is preview time. Consistent with general trends
in visual world research (for a review, see Altmann, 2011b), the majority of L2
and bilingualism researchers opt for a 1- to 4-second preview. Studies in the
present synthetic review had a mean preview time of 2.4 seconds (Median: 2
seconds), with preview length ranging from 0.21 seconds (Mercier, Pivneva, &
Titone, 2014, 2016) to 5.5 seconds (Suzuki, 2017; Suzuki & DeKeyser, 2017).
Finding an appropriate preview time is a bit of an art. Just as with other aspects of
the experiment, piloting will inform the chosen preview length in the context of
your own experimental design and study population. When preview time is too
short, participants cannot adequately sample the visual scene, but when preview
is too long, participants may disengage from the scene (loss of interest) or start
forming hypotheses about how the images differ or are related and what this may
tell them about the overall experiment goal. To find an adequate preview time,
the best thing to do is to sit down in front of the computer screen and ask your-
self, “Does this feel right? Does this preview give me enough time to take in the
whole scene, yet not get bored or start thinking too much?”
Similarly to the preview/no preview question, there are no systematic com-
parisons of preview length in the literature summarized in Tables S5.6–S5.12.
Several researchers in psychology, however, have found that length of preview
could influence fixations on different objects in the display (e.g., Altmann &
Kamide, 2009; Dahan & Tanenhaus, 2005; Hintz, Meyer, & Huettig, 2017; Huettig
& McQueen, 2007; Kamide, Altmann, & Haywood, 2003). In a study with L1
Dutch speakers, Huettig and McQueen (2007) investigated in which timeframe(s)
participants retrieved different kinds of information (i.e., phonological, semantic,
and shape information) from a display. Given the target beker “beaker”, for exam-
ple, the three other images on the display were related to the target phonologi-
cally (i.e., bever “beaver”), semantically (i.e., vork “fork”), and in shape (i.e., klos
“bobbin”). Looks to the competitor(s) served as evidence that participants had
retrieved the corresponding information from the display. In Experiment 1, there
was no preview; the display appeared at the same time as the audio onset. Without
image preview, participants first looked at the phonological competitor before
the semantic and shape competitors. This progression mirrored the sequence of
events in speech recognition—from phonological representations to visual and
semantic information. In Experiment 2, participants were given a 200-ms preview
of the display. Given this brief preview, the participants directed their fixations to
the semantic and shape competitors, but no longer the phonological competitor.
In other words, the preview had enabled participants to activate their visual and
semantic representations of the objects and these now drove competition with the
target. Overall, the authors concluded that attention shifts were co-determined
by the time course of retrieval of information of these three kinds. Relating to
our present discussion, first and foremost, a preview is needed in most cases, and
second, the duration of preview deserves careful consideration. To this date, 1- to
4-second previews seem to be the default in our field. Researchers may vary
preview length as a function of when they assume a given type of information
becomes activated during speech processing (Huettig & McQueen, 2007). Studies
looking at phonological effects may require the shortest preview, followed by
semantic, morphosyntactic, and finally discourse-level investigations.

6.3.1.5 Should My Experiment Have a Fixation Cross?


Similarly to scene previews, researchers need to contemplate whether or not to
include a fixation cross in their experimental trials. A fixation cross is a marker (e.g.,
+) that precedes the main part of the trial (see Section 5.3, for a refresher of what
a trial is). This means a fixation cross normally comes before the part for which
eye-tracking data are collected. In its simplest form, a fixation cross is a small design
feature used to orient your participants’ attention to the task and let them know
a new trial is about to begin. Most researchers whose work is represented in this
book used a fixation cross this way, to signal the beginning of a new trial: see Tables
S5.6–S5.12 in online supplementary materials. As a fixation cross is normally placed
centrally on the screen, another potential advantage is that the cross equalizes the
distance participants need to travel to the different images on the screen. When all
participants start each trial in the same location (i.e., with their eyes on the
cross), saccade length is controlled for. Compare this with a looking-while-listening
experiment, where the participant might already have her eyes on the target image
before the relevant part in the spoken input unfolds; such a scenario can complicate
the data analysis. The two primary advantages of having a fixation cross, then, are
that it adds structure to your trials and can provide better experimental control.
(The conditions for the second advantage to apply will be discussed in more detail
in what follows.) On the flip side, adding a fixation cross will take away from a
study’s ecological validity. With a fixation cross built into the experiment, research-
ers can no longer claim they are measuring natural viewing patterns of a scene.
Likewise, although a fixation cross can eliminate baseline effects on the surface, hid-
den biases (i.e., participant preferences for certain pictures) will still exist and may
manifest themselves at a later point in the trial.
The majority of visual world studies included in Tables S5.6–S5.12 (see online
supplementary materials) had a fixation cross as a part of their trial sequence.
About a quarter (7 out of 31 studies) did not have or did not report having a
fixation cross, and most of these were referential processing studies involving sen-
tence verification or interpretation (e.g., Kim, Montrul, & Yoon, 2015; Sekerina &
Sauermann, 2015). As described previously, there are good arguments to be made
for and against fixation crosses; however, the bottom line is you need to report
which option you chose in your study and why. Furthermore, when the goal is to
study prediction or anticipatory processing (see Section 4.2.2), it is more common
to include a fixation cross in the trials.
Next, with regard to the place of the cross within the larger trial sequence,
the present review suggests greater variability. Assuming participants are allowed
to preview the images (see Section 6.3.1.4), the fixation cross can come either
before or after preview. Most research studies in Tables S5.6–S5.12 (see online
supplementary materials) had a fixation cross before image preview, right at the
outset of each trial. An example is Dijkgraaf et al. (2017), shown in Figure 6.21,
who opted for a classic fixation cross – image preview – audio + image sequence.
When used in this manner, the fixation cross will capture participants’ attention;
however, because of the following preview, the eyes can be at any place on the
screen when the audio begins. In short, with the fixation cross before the preview
phase, initial fixation location and saccade length will no longer be controlled.
The other option is to insert a fixation cross in between the preview and the
audio phase. To signal the beginning of a new trial, researchers could use a beep
sound instead of a cross. Although this sequence is less common, it affords better
experimental control over participants’ eye gaze at a critical point in the trial (i.e.,
right before the audio begins to play). Perhaps, then, this is a better use of fixation
crosses in experimental design. Figure 6.22 depicts the corresponding experi-
mental stages, using Tremblay (2011) as an example. This study was previously
described in Section 5.3. From the researcher’s perspective, each trial consisted of
three stages: a word preview, a fixation cross on a blank screen, and then the words
and audio presented together. (For present purposes, the participant’s mouse click
at the end of the sequence is not depicted, but see Section 5.4.) Given such a
design, participants needed equal time to plan and execute a saccade to any of
the four candidate nouns upon hearing le fameux élan, “the infamous swing”. Finally,

FIGURE 6.21 Three-stage trial: (1) fixation cross, (2) image preview, (3) audio + image
processing with eye-movement recording.
(Source: Dijkgraaf et al., 2017).
FIGURE 6.22 Three-stage trial: (1) word preview, (2) fixation cross, (3) audio + word
processing with eye-movement recording.
(Source: Tremblay, 2011).

in the work by Hopp, the fixation cross was presented together with the images,
rather than on a blank screen, and the display remained the same throughout the
trial (Hopp, 2013, 2016; Lemmerth & Hopp, 2018). Even though the screen did
not change, each trial still consisted of three functionally distinct stages, much like
Tremblay (2011). After preview, a beep sound indicated participants had to fixate
on the central cross and audio did not begin to play until participants were fixat-
ing on the cross. In effect, this amounted to an image preview – fixation cross –
audio + image sequence. Textbox 6.5 summarizes the main points regarding other
design features in visual world experiments.

TEXTBOX 6.5. HOW TO DESIGN A VISUAL WORLD EXPERIMENT: PREVIEW AND FIXATION CROSS

1. Select a preview time (typically between 1 and 4 seconds) based on what you want to study. Pilot the chosen length with a few participants and make sure you report the chosen preview time in your article.
2. If you decide to use a fixation cross, it will be most effective if you insert it after image preview. Make sure you describe your trial sequence in your article.
6.3.2 Preparing Audio Materials


Careful selection of visuals goes a long way in creating a sound visual world
experiment. However, the visual world paradigm would not be complete without
audio accompanying the visuals. Hence, in this section, we turn our attention to
the second key component of visual world design, the auditory materials.

6.3.2.1 Creating Audio Materials


Visual world studies are premised on participants’ ability to recognize the spoken
input and map their linguistic representations of the input onto their mental rep-
resentations of the visual display. For this linking process to work, clarity of input
is key (for guidelines on the visual input, see Sections 6.3.1.2 and 6.3.1.3). The
visual world paradigm is not a test of participants’ listening skills; instead, materials
should be created that will enable fast and easy recognition. This point deserves
special emphasis in the context of L2 research, given that L2 listening comprehen-
sion will be somewhat slower and more effortful than L1 listening comprehen-
sion even under optimal conditions. For this reason, and especially when working
with L2 speakers, you want to create quality audio files that will suit your study
participants. To do so, researchers will typically enlist the help of someone with a
clear and pleasant voice, often a colleague in the department. After a few studies,
researchers will know whom to contact, but if this is their first time recording, the
researchers may invite a couple of candidate speakers to the recording room and
then retain the most suitable one. Another possibility, which I did not see in the
synthetic review, is to use one of the many automatic speech synthesis programs,
also known as text-to-speech programs, that are available on the internet. These
programs are easy to use, often free, and will keep your recordings free of coar-
ticulation4 and prosodic cues (e.g., stress, pitch, intonation) that may otherwise
bias your results. The flip side of using synthetic speech is that researchers need to
ensure the recordings sound natural.
Given that speech comprehensibility is of special concern when working
with L2 listeners, it is good to consider what accents your participants are most
exposed to. The default in visual world research is still to use native speakers of
the target language, unless the study itself is on foreign accent (e.g., Ju & Luce,
2004, 2006; Lagrou, Hartsuiker, & Duyck, 2013). Interestingly, Dijkgraaf et al.
(2017) used a native speaker of Dutch majoring in English and Dutch to record
sentences in both languages (i.e., English and Dutch). Given their shared linguis-
tic background with the speaker, the Dutch-English bilingual participants in the
study were familiar with the speaker’s accent in English. Dijkgraaf and her col-
leagues also collected accentedness ratings for the English speech to support their
claim that the speaker had “a clear pronunciation in Dutch and English” (p. 923),
a judgment that held for the monolingual English listeners (the other participant
group) as well. Reporting
some of this information on why and how you selected a given speaker enhances
research transparency and will improve your readers’ confidence in the quality of
your materials. Taking transparency up a level, researchers can also upload their
audio recordings to open repositories such as IRIS (https://www.iris-database.
org) or the Open Science Framework (https://osf.io/).
Once you have identified your speaker, the recording session is next. Ideally,
recording should be conducted in a soundproof or sound attenuating booth with
professional audio equipment. Having a dedicated recording space will let you
avoid echoes and unintended background noise. Because the speakers are gener-
ally volunteers, not professional actors, it will be good if they can get some train-
ing prior to the actual recording. First, the speaker should practice speaking into the
microphone. To record at a steady volume, the distance between the speaker and
the microphone should be kept constant as much as possible. It is very important
that the speaker speak with a neutral intonation. Prosody carries a lot of infor-
mation in speech, yet prosodic effects are seldom the topic of visual world research
(for an exception, see Sekerina & Trueswell, 2011). Therefore, to minimize the
influence of prosody, it is better that the speaker does not know much about the
experiment so he or she does not add unintended stress or intonation patterns to
the recording. The speech should also be appropriately paced. For L2 speak-
ers this may mean speaking at a somewhat slower rate. For instance, in Hopp and
Lemmerth (2018), sentences were recorded “at a slow-to-moderate pace with
neutral intonation” (p. 182). Similarly, Ito, Corley, and Pickering (2018) used a
slow speech rate of 1.3 syllables per second with pauses between the phrases in
their sentences in order to “create optimal conditions for predictive eye move-
ments” (p. 253). The benefits of a slower speech rate for prediction need to be
weighed against the risk of inducing strategies in your participants; that is, if the
speech becomes too slow, participants may begin to realize what the experiment
is about and adjust their behavior. Finding the right speech rate, then, is a balance
between what sounds natural and yet, leaves participants time to fully deploy their
predictive abilities, if they have any.
Before you start recording, allow sufficient time for the speaker to familiarize
him- or herself with the materials. Using mono mode (as opposed to stereo) for
the recording typically results in a clearer voice which can be delivered through
both sides of the earphones equally. It is wise to keep a record of the recording
parameters, including the software used, the sample rate, and so on. This informa-
tion can be included later when you write up the methodology for publication.
For the recording itself, my best advice is to get it right the first time. Getting
it right is easier said than done of course, but if you succeed, you can avoid mix-
ing items from different recordings. Participants will be able to tell if you mix
items and so it might be better to redo the recording completely if you have to.
To increase their chances of success, many researchers will make three record-
ings of their sentences in a session. Consider recording the stimulus list from top
to bottom three times, rather than recording the same sentence three times in a
row. Stress patterns tend to carry over between successive recordings and so if you
cycle through the whole stimulus list first, there is a greater chance that the second
or third versions will not have the same issue as the first. Make sure you remind
your speaker periodically to keep talking at the same pace. Check the audio on
the spot during the recording session if you can. Finally, like your participants,
your speakers will get fatigued faster than you might think. Give them breaks and
treat them to a nice cup of coffee.
The final step in preparing the audio files is the editing. There is a lot of
audio editing software available for use free of charge. Two common software
programs are Audacity (Audacity Team, 2012) and Praat (Boersma & Weenink,
2018). Using such a program, you first want to normalize all the files, meaning
you adjust the volume of all the audio files to a similar level. Then, listen to each
file very carefully and pick the recording, for each sentence, that sounds clear
and neutral. Consider adjusting the pitch of certain parts of the audio if there is
any unintended stress. Length is another thing you could manipulate with the
software (see Section 6.3.2.2). Save the files in a format that is compatible with
the experimental software you will use. Make sure the file names are meaning-
ful to you. Using the item number is generally a good idea, and so is including
the critical word and/or the condition (e.g., 01_Constr_Reads_Book.wma or
06_graben_3sg.wav). Having this information in the file name will save you a lot
of time when you program your experiment.
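
To illustrate the normalization step programmatically, here is a minimal sketch using pydub, a Python audio library not mentioned in the main text but freely available. The folder names are placeholders, and you should still verify the output levels by ear.

```python
from pathlib import Path
from pydub import AudioSegment
from pydub.effects import normalize

# Hypothetical folder layout; adjust paths and file formats to your own recordings.
raw_dir, out_dir = Path("recordings_raw"), Path("recordings_normalized")
out_dir.mkdir(exist_ok=True)

for wav in sorted(raw_dir.glob("*.wav")):
    audio = AudioSegment.from_wav(str(wav))
    # Bring every file to a comparable peak level (here, 0.5 dB of headroom).
    leveled = normalize(audio, headroom=0.5)
    leveled.export(str(out_dir / wav.name), format="wav")
    print(f"{wav.name}: {audio.dBFS:.1f} dBFS -> {leveled.dBFS:.1f} dBFS")
```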

6.3.2.2 Defining Time Periods


In many visual world studies, researchers seek to measure eye fixations during one
or more time windows. Time windows are the periods for which researchers
will extract and analyze the eye-movement data they recorded. In other words,
eye movements are recorded continuously in the audio stage, but researchers are
typically going to focus on only a subset of all the data (see Section 8.5). The
spatial subset is defined by interest areas (see Section 6.1) and the temporal subset
is defined by time windows or time periods. As is the case with interest areas (see
Section 6.1), there is often more than one way to delimit time windows for a
study. To determine time windows for your study, you will want to think about
the auditory input and how different parts in the input relate to your research
questions.
Time windows are defined by a starting point (onset) and an end point (off-
set). The end point can be predetermined by the researcher or it can coincide with
the end of the trial, for instance when participants click on the mouse to select an
object (examples provided in what follows). In a prediction study, the time period
will typically be the window during which participants can predict the upcom-
ing target referent. I will refer to this as the prediction window. The prediction
window will typically range from the offset of the predictive cue (because listen-
ers need this cue to make a prediction) to the onset of the target referent (because
now there is no room for prediction anymore). Some researchers will adjust that
time window for the time it takes to plan and execute a saccade. In that case, the
temporal region of interest will be shifted rightward by 200 ms—from the offset
of the predictive cue + 200 ms to the onset of the target referent + 200 ms—
because 200 ms is how long it takes to plan and launch a language-mediated eye
movement (e.g., Matin, Shao, & Boff, 1993; Saslow, 1967). In contrast to the single
prediction windows found in most prediction studies, some researchers who study
referential processing use multiple time windows, representing the different key
constituents in the sentence. For instance, Sekerina and Sauermann (2015), in a
study on the interpretation of every in Russian and English, had four time win-
dows: the sentence subject with every, the verb, a locative prepositional phrase, and
a silence at the end of the sentence (see Section 4.2.3).
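
For concreteness, here is a minimal sketch of how the 200-ms-shifted prediction window described above can be computed; the item timings are hypothetical.

```python
# Hypothetical item timings in ms from trial onset, taken from your audio annotations.
items = [
    {"item": 1, "cue_offset": 1180, "target_onset": 1890},
    {"item": 2, "cue_offset": 1320, "target_onset": 2055},
]

SACCADE_DELAY = 200  # ms needed to plan and launch a language-mediated eye movement

for item in items:
    start = item["cue_offset"] + SACCADE_DELAY
    end = item["target_onset"] + SACCADE_DELAY
    print(f"Item {item['item']}: prediction window {start}-{end} ms")
```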
Once you have determined appropriate time periods for your own project,
the next step is to determine when exactly these time points occur in the audio
files. This is where you go back to your speech editing software to listen to your
audio recordings again (see Section 6.3.2.1). Listen carefully to each sentence.
When do the beginning and end of each time period occur? You will want to
mark the onset and offset latencies (in ms) relative to the beginning of the trial.
Figure 6.23 demonstrates how this works for the sentence Every alligator lies in a
bathtub, from Sekerina and Sauermann (2015). As previously mentioned, Sekerina
and Sauermann analyzed eye fixations in four time periods: every alligator — lies —
in a bathtub — (silence). For illustration purposes, my research assistant made a
new recording of the English sentence. I have presented it as Figure 6.23 in what
follows. In the recording, the four time periods had the following onset and offset
times: every alligator (0–1290 ms), lies (1290–2010 ms), in a bathtub (2010–3084
ms), silence (3084–end of trial). Researchers could now add this information
(i.e., 1290, 2010, and 3084 ms, start and end of trial) as time stamps in the
programming software of their eye-tracking experiment. With the help of these

FIGURE 6.23 Visualization of the spoken sentence Every alligator lies in a bathtub
in Audacity. Vertical lines mark the onsets and offsets of the four time
periods.
time stamps, they will be able to extract the eye-fixation data for each of the four
time windows separately and ignore all data that was recorded outside this time
window. This will greatly facilitate the analysis.
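
As a sketch of how such time stamps can be put to work, the snippet below assigns hypothetical fixation onsets to Sekerina and Sauermann's four time windows. Eye-tracking packages typically do this for you, but the underlying logic is no more than this.

```python
# Time-window boundaries (ms), as measured in the audio file in the example above.
windows = {
    "every alligator": (0, 1290),
    "lies": (1290, 2010),
    "in a bathtub": (2010, 3084),
    "silence": (3084, float("inf")),  # runs to the end of the trial
}

def window_for(fixation_onset_ms):
    """Return the time window in which a fixation began (hypothetical helper)."""
    for label, (onset, offset) in windows.items():
        if onset <= fixation_onset_ms < offset:
            return label
    return None

# Hypothetical fixation onsets from one trial:
for onset in [350, 1500, 2600, 3900]:
    print(onset, "->", window_for(onset))
```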
Like many other researchers, Sekerina and Sauermann used a number of
different sentences to test their research hypotheses. Therefore, the research-
ers had to measure onset and offset times for the critical time periods in each
sentence separately. Other studies may have a carrier phrase, an introductory
phrase that is identical across all the experimental items. Examples of English
carrier phrases are Pick up the …, Click on the …, Look at the …, and Find the
… . These phrases derive their name from the fact that they carry, or provide
a frame for, the following object, which is the target referent. If the researchers
use the same carrier phrase throughout the experiment, they can control for its
acoustic properties and will need to measure the onset of the first time window
only once. Specifically, the onset of the first time window will coincide with the
end of the carrier phrase. An example of this approach is Hopp’s (2013) study on
gender-based prediction in German. Participants listened to sentences such as
Wo ist der / die / das gelbe [Noun], “Where is the(MASC./FEM./NEUT.) yellow [Noun]?”
and clicked on the corresponding objects on the screen. To align onset times
across different sentences, Hopp took the carrier phrase Wo ist, the gender-
marked article, the adjective, and the noun, and put them together in a newly
formed audio file. The carrier phrase was exactly 1,103 ms long in each file.
This technique, which is known as splicing, is a way to control for the length
of different parts in the sentence and make sure time periods start at the same
point across different trials. With a user-friendly software, splicing should be as
easy as the copy-and-paste operation (see Figure 6.24); however, it is important
to check the naturalness of the spliced stimuli.
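
Programmatically, splicing is indeed little more than concatenation. Here is a minimal sketch, again with the pydub library; the file names are invented for illustration, and the resulting files should always be checked for naturalness.

```python
from pydub import AudioSegment

# Hypothetical source files: one carrier recording, reused across items,
# plus item-specific recordings of the article and the adjective + noun.
carrier = AudioSegment.from_wav("wo_ist.wav")        # e.g., exactly 1103 ms long
article = AudioSegment.from_wav("die.wav")
rest = AudioSegment.from_wav("gelbe_banane.wav")

# Concatenation keeps the carrier identical across all items.
spliced = carrier + article + rest
spliced.export("item01_die_gelbe_banane.wav", format="wav")

print(f"Carrier: {len(carrier)} ms; full item: {len(spliced)} ms")
```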
Other researchers do not splice their sentences but hand-edit the length of the
different constituents. For instance, Morales et al. (2016), in a study on gender-
based prediction in L2 Spanish, hand-edited the length of the carrier and the
definite article so they would be the same across all trials. Specifically, the carrier
encuentra, “find”, was edited to 800 ms, the definite articles el and la were 147
ms, and they were followed by 50 ms of silence before the target noun onset.
With these precise time points, the researchers were able to link eye fixations at
any point during the trial to the exact input a participant heard at that time (see
Figure 6.25). That is, any eye fixations to the left of the Y axis (i.e., negative times,
not shown in Figure 6.25) would have been for the carrier phrase encuentra. This
is where baseline effects in picture preferences would occur, if there are any, so it
could be informative to plot that data as well (see Osterhout, McLaughlin, Kim,
Greenwald, & Inoue, 2004; Steinhauer & Drury, 2012, for similar arguments for
event-related potentials). The article el or la plus silence covered 0–197 ms, and
the target noun came right after that. Accordingly, the authors’ temporal region of
interest extended from 200 ms post article onset (to account for saccade latency)
to 900 ms, when participants clicked on the corresponding target noun.
FIGURE 6.24 Splicing the target noun onto the carrier phrase.

FIGURE 6.25 Eye fixation patterns plotted against time. The Y axis and the black
vertical line mark the onsets of the two critical time periods in the study.
Note: shaded area represents significant differences between conditions,
based on Morales et al.’s (2016) analyses.
(Source: Morales et al., 2016, graph modified with permission from the author).
In sum, visual world researchers have a few different options to make their
audio files ready for use. Whether they edit their files by hand or choose to splice
them, ideally the different parts within a sentence (e.g., lead-in or carrier phrase—
predictive cue—target noun) should coincide across the different audio files. For
instance, if 300 ms marks the onset of a critical time window in a sentence, it
is best if it does so consistently across all sentences. Aligning your time periods
in this manner will produce cleaner data and facilitate data analysis. What those
eye-movement data are and how you can begin analyzing them will be the topic
of Chapters 7 and 8. Textbox 6.6 summarizes the key points for creating audio
materials in eye-tracking experiments.

TEXTBOX 6.6. HOW TO DESIGN A VISUAL WORLD EXPERIMENT: AUDIO

1. Choose a speaker with a clear and pleasant voice. Have the speaker practice talking at an appropriate, slow-to-moderate pace, with neutral prosody. Use a soundproof or sound-attenuated booth and professional recording equipment when possible. Jot down important parameters (e.g., sample rate) for later reporting.
2. Edit your sound files using audio editing software. Normalize volume across all files. Hand-edit the length of the different components if necessary.
3. Set time periods in a manner that will let you address your research questions. Visualize the speech stream in the audio editing software to identify precise time points. Use these time points to align the segments in your different sentences.

6.4 Conclusion
This chapter has provided detailed, practical guidelines for you as a researcher to
design your eye-tracking study. These guidelines should be read in tandem with
the general methodological guidelines provided in Chapter 5, as eye-tracking
research follows the same principles as other types of experimental research. A
key element in study design is setting interest areas for your research project (see
Section 6.1). Interest areas come in many shapes and forms, reflecting the diversity
of eye-tracking applications in our field. Text-based and visual-world eye-tracking
researchers have used four types of interest areas, namely word-based interest areas,
larger areas of text, images, and dynamic (moving) interest areas. To define interest
areas for your study, you will want to consider your research questions, your mate-
rials, and the spatial accuracy and precision of your eye-tracking system.
Next, we turned to the specifics of text-based and visual-world eye-tracking
design. Eye-tracker properties and the characteristics of the human eye will
impose spatial and artistic constraints on text presentation, such as how large
the text should be and what the minimal region of analysis should be (see
Sections 6.2.1 and 6.2.2). Likewise, linguistic factors related to how the human
mind processes language require experimental control. It is seldom a good
idea to put some material onscreen and simply look at what happens. A better
approach is to create different versions of the same material and relate dif-
ferences in eye movements to the changes you made. Factors that cannot be
controlled for in the experimental design should be controlled statistically (see
Section 6.2.3). The highlights of text-based eye-tracking design were summa-
rized in Textbox 6.2.
In visual world research, as in text-based eye tracking, a careful research design
is paramount to account for all the factors that could potentially influence a
participant’s eye gaze during the study (see Section 6.3.1.1). Careful experimen-
tal design enables researchers to control potential confounds resulting from vis-
ual (e.g., colors) and linguistic (e.g., image names) features of the materials (see
Sections 6.3.1.2 and 6.3.1.3). Normed databases are a useful resource for selecting
suitable visuals (see Table 6.2). For audio, the emphasis is on making clear, qual-
ity recordings that will be easy to comprehend (see Section 6.3.2.1). Audio files
come with their own regions of interest, called time periods or time windows (see
Section 6.3.2.2). Researchers may need to edit their recordings a bit to ensure
that time periods start at the same time across their different sentences. The key
considerations for creating images and audio were summarized in Textboxes 6.4,
6.5, and 6.6.
At the end of the day, designing a sound experiment is not unlike gardening.
Careful planning up front will ensure that you get the most fruitful results down
the road. Even if you find yourself down in the weeds at some point, it is impor-
tant to keep sight of the coming seasons in your research cycle. Quality research
data is the best harvest! For that, make sure you know the properties of your
materials and take the time to design them carefully. Always run a pilot first. In
a well-designed study, different design features will tie in nicely with the overall
goals and research questions of a study.

Notes
1 Exceptions include some kinds of writing research, face-to-face interaction, and areas of
computer-assisted language learning where participants are interacting with software
programs.
2 The current approach relies on null hypothesis significance testing (NHST), which is
the default statistical approach in SLA and bilingualism, but which poses some con-
ceptual difficulties for the interpretation of null results. Alternatives to NHST that can
also be used for matching purposes are equivalence tests (Godfroid & Spino, 2015) and
Bayesian statistics (Dienes, 2014).
3 This section has benefited from a long discussion about the visual world paradigm with
Geri Altmann. I thank Professor Altmann for his input and suggestions.
4 Coarticulation is the change in articulation of one sound because of neighboring
sounds. For example, a word-initial vowel is typically linked to the offset consonant of
the preceding word.
7
EYE-TRACKING MEASURES

Eye-movement measures define how researchers will look at their data. They are
a way to carve up the large amount of information in an eye-movement record,
by allowing the researcher to focus on particular events, such as fixations, saccades,
or a combination of fixations and saccades, and measure particular properties of
these events. Eye-movement measures are typically extracted for specific regions
on the screen, termed interest areas (see Section 6.1) and may be categorized
in terms of their temporal properties (when the event happened relative to other
events in and out of the same interest area). Eye-movement measures function as
a dependent variable in most statistical analyses, but they have also been used as an
independent variable to study the relationship between online processing patterns
and learning (e.g., Cintrón-Valentín & Ellis, 2015; Godfroid et al., 2018; Godfroid,
Boers, & Housen, 2013; Godfroid & Uggen, 2013; Indrarathne & Kormos, 2017;
Mohamed, 2018; Montero Perez, Peters, & DeSmet, 2015; Pellicer-Sánchez, 2016;
Winke, 2013).
Although eye-movement measures can be calculated by the eye-tracking soft-
ware only after the data are collected, it is a good idea to think ahead about what
measures you will use in your study. This chapter can prepare you to do just that.
It provides a comprehensive overview of the measures that have been used in
SLA and bilingualism research to date, drawing on the substantive review of eye-
tracking literature in Chapters 3 and 4. When deciding what measures to include,
it certainly helps to know the most commonly used eye-tracking measures in the
field. Additionally, researchers may wish to familiarize themselves with measures
that are perhaps less common but typical of their particular subfields. I hope that
the breadth of measures reviewed in this chapter will entice researchers to sample
widely from across the spectrum of eye-tracking measures. In fact, most research-
ers do already include multiple eye-movement measures in their analyses but they
still rely heavily on measures of eye fixation duration. This chapter is an invitation
to look beyond this fixed set of duration measures. Lastly, eye-tracking research
and data analysis also have room for innovation. To showcase ongoing develop-
ments, the present chapter includes several examples of researchers who have sup-
plemented existing measures with their own, custom-made variables.
In Section 7.1, I will provide a broad overview of the different measures that
have been used in contemporary eye-tracking research in SLA and bilingualism. I
have categorized the different measures into three superordinate categories—fix-
ations, regressions, and eye-movement patterns—and four subtypes for the
fixation-based measures. Each of these categories, and the measures they subsume,
will be discussed in detail in Section 7.2: (1) fixation counts, probabilities, and
proportions (Section 7.2.1.1), (2) fixation duration (Section 7.2.1.2), (3) fixation
latency (Section 7.2.1.3), and (4) fixation location (Section 7.2.1.4), followed by
(5) regressions (Section 7.2.2), and (6) integrated measures or eye-movement pat-
terns (Section 7.2.3). Extensive examples from both text-based and visual world
eye tracking will illustrate how researchers use and interpret these measures. To
conclude, I will revisit the question of what measures, of the many options avail-
able, researchers could select for their particular studies (Section 7.3).

7.1 Eye-Tracking Measures in Text-Based and Visual World Research
The synthetic review of eye-tracking literature yielded a total of 52 text-based
studies and 32 visual world studies published in SLA and bilingualism (for infor-
mation on inclusion criteria, see Sections 3.2 and 4.2). With the help of two
research assistants, I identified which eye-movement measures were analyzed in
each study and whether the analyses revealed significant or non-significant dif-
ferences for a given measure. The two tree diagrams that follow summarize all the
eye-tracking measures that were attested across this body of work. Figure 7.1
captures the analytic diversity in text-based eye-tracking research. Combined, the
52 text-based studies used as many as 25 different eye-tracking measures, span-
ning six different categories (fixation counts/probabilities/proportions, fixation
durations, fixation latency, fixation location, regressions and saccades, and eye-
movement patterns). Figure 7.2 presents the comparatively smaller set of eye-
movement measures used in visual world research. Visual world researchers appear
more homogeneous in their choice of dependent variables. Across 32 visual world
studies, they analyzed a total of four different eye-movement measures, which
came from three main categories (fixation counts/probabilities/proportions, fixa-
tion durations, and fixation latency). When it comes to eye-tracking measures,
then, it appears that text-based researchers have enjoyed more analytic flexibility,
at least in choosing what variables to analyze. In visual world research, on the
other hand, analytic flexibility is greater at the level of the statistical analysis. As
we will see in Chapter 8, visual world researchers have adopted many different
FIGURE 7.1 Taxonomy of eye-tracking measures used in text-based studies.

FIGURE 7.2 Taxonomy of eye-tracking measures used in visual world studies.

statistical techniques to analyze their eye-tracking data, which differ considerably
in their levels of sophistication (see Sections 8.3 and 8.5).
Given so many options to choose from (and the additional possibility to cre-
ate your own, custom-made measures), which measures should be prioritized in
a study? A basic principle in eye-tracking research is that it is better to include
a range of different measures in order to obtain a more complete picture of
participants’ viewing behavior. In our field, eye-tracking researchers in a single
study have analyzed an average of 3.37 measures (SD = 1.60, range = 1–6) in the
case of text-based studies and 1.38 measures (SD = 0.49, range = 1–2) in visual
world research. The advantages of including multiple dependent variables are that
the different measures could provide converging evidence for or against your
hypothesis or, in rarer cases, could reveal that an effect only manifests itself at some
points during processing (e.g., only in later stages with L2 speakers or only when
readers initially encounter a word or phrase). A potential drawback to analyzing
multiple eye-tracking measures is that these measures may be correlated, for
instance because one measure is a subset of the other measure. When this happens,
the resulting statistical tests will no longer be independent and the false positive
rate (i.e., finding a significant difference that is not a true effect) will increase (Von
der Malsburg & Angele, 2017). To safeguard the independence of the statistical
tests, researchers should sample a range of measures that do not overlap in their
temporal properties (for an example, see Figures 7.20 and 7.21). This will become
clearer as you learn about the different measures; however, the key idea is to pick
your measures like puppets for a puppet show, rather than Russian nesting dolls—
you want them to be distinctly different.
Figure 7.3 gives an overview of the distribution of eye-tracking measures
in text-based and visual world eye-tracking research. The graph focuses on
broad categories of eye-movement measures, which describe different proper-
ties of fixations and (regressive) saccades, the two main events in eye-move-
ment data (see Section 2.2). In text-based research, nearly all authors (94%)
analyze at least one fixation duration measure in their studies. The synthetic
review revealed 12 different fixation duration measures in the SLA and bilin-
gualism literature (see Figure 7.1), which makes fixation duration not only
the most frequent but also the largest category of measures in text-based eye-
tracking research.
In the visual world paradigm, fixation counts and derived measures such as
fixation proportions are the most common dependent variable. They are reported
in 63% of all visual world studies, compared to 38% of text-based research. Other
variables are more paradigm specific; for instance, fixation latency is analyzed in
visual world research (28% of studies) but hardly ever in text-based research (4%
of studies). Conversely, one-fifth of text-based eye-tracking studies (21%) ana-
lyze regressions, which are a reading-specific type of eye-movement behavior.
In recent years, text-based eye-tracking studies have also seen the emergence of
integrated measures in the form of eye-movement patterns (6%). In the follow-
ing sections, the different measures that fall under each of these categories will be
introduced one by one. At the beginning of each section, a pie chart will sum-
marize which specific measures have been the most common (see Figures 7.5, 7.6,
7.11, and 7.13).This will help you develop a sense of common and less commonly
used measures in the field.

FIGURE 7.3 Types of eye-tracking measures used in text-based and visual world studies.
7.2 Eye-Movement Measures
7.2.1 Fixations and Skips
7.2.1.1 Counts, Probabilities, and Proportions
To illustrate many of the primary eye-movement measures found in SLA and
bilingualism, I will use two data trials from a novel-reading study (Godfroid et al.,
2018). You will want to refer back to Figure 7.4 as you learn about each measure.
In Godfroid et al. (2018), L1 and L2 English speakers read five chapters of the
authentic English novel A Thousand Splendid Suns (Hosseini, 2007). The novel is
set in Afghanistan and contains a number of Dari words (the Farsi dialect spoken
in Afghanistan) to convey the foreign setting of the novel. Figure 7.4 depicts two
participants’ responses to an unfamiliar word, tahamul, that occurred in the novel.
Figure 7.5 and Table 7.1 present a summary of the different count measures
in SLA and bilingualism research, illustrated with data from Figure 7.4. Fixation

FIGURE 7.4 Two different reading patterns for an unfamiliar word, tahamul, embedded
in context. Note: this sentence was part of an exchange between the
protagonist Mariam, a 15-year-old girl, her mother, and a tutor about
whether Mariam would be allowed to go to school. Mariam’s mother is
opposed to the idea. She previously uttered the sentence, “There is only
one, only one skill a woman like you and me needs in life, and they don’t
teach it in school. Look at me.” (Hosseini, 2007, p. 17). As seen in the
figure, that skill is tahamul—the Dari word for endure.
(Source: Godfroid et al., 2018).

FIGURE 7.5 Count measures in eye tracking in SLA and bilingualism (used in 20 out
of 52 studies with text).

TABLE 7.1 Definitions and examples of count measures

Fixation count
  Definition: the total number of fixations made in an interest area.
  Example (from Figure 7.4a): 4 [fixations 6, 10, 11, 12].

Consecutive fixation count
  Definition: the number of fixations made during a single visit to an interest area.
  Example (from Figure 7.4a): first pass: 1 [fixation 6]; second pass: 3 [fixations 10, 11, 12].

Visit count
  Definition: the total number of visits made to an interest area.
  Example (from Figure 7.4a): 2 (as seen in the row above, there were two passes).

Skip count
  Definition: the total number of times an interest area was skipped.
  Example (from Figure 7.4a): 0.

counts and the measures we derive from them (i.e., probabilities and proportions)
have informed a variety of questions in our field. Count measures are useful
when the region of analysis is a larger area on the screen, as in reading assess-
ment (Bax, 2013; McCray & Brunfaut, 2018), subtitle processing research (Bisson,
Van Heuven, Conklin, & Tunney, 2014; Muñoz, 2017), and, importantly, in visual
world research. For example, Bisson et al. (2014) used consecutive fixation
counts in an attempt to distinguish participants’ reading behavior for different
types of subtitles (native and foreign language subtitles). They found no differ-
ences in the number of fixations across subtitle conditions; instead, they found
a rather regular reading pattern of the subtitles even when the subtitles were
in a language that the participants did not know (see Section 3.2.4). Analyses
of fixation counts are also common in various types of sentence-processing
research with lexical interest areas (Carrol & Conklin, 2015; Pellicer-Sánchez,
2016; Philipp & Huestegge, 2015; Siyanova-Chanturia, Conklin, & Schmitt,
2011; Sonbul, 2015) or grammatical interest areas (Philipp & Huestegge, 2015;
Winke, 2013). In this case, the analysis of fixation counts supplements the analysis
of fixation duration measures. Analyses of fixation counts and fixation durations
often provide converging evidence: effects tend to be present in both counts and
durations or absent from both (but see Philipp & Huestegge, 2015; Van Assche,
Duyck, & Brysbaert, 2013). The reason is fixation counts correlate with fixation
durations (Godfroid, 2012). As people fixate more often in the same area, aggre-
gate duration measures for that area will increase.
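
To make the definitions in Table 7.1 concrete, the sketch below derives the four count measures from an ordered fixation sequence. The sequence and interest-area labels are hypothetical and mirror the tahamul example in Figure 7.4a.

```python
from itertools import groupby

# Hypothetical fixation record for one trial: the interest area of each
# fixation, in chronological order ("target" mimics tahamul in Figure 7.4a).
fixation_sequence = ["IA1", "target", "IA2", "IA2", "target", "target", "target"]

fixation_count = fixation_sequence.count("target")  # total fixations on target: 4

# Collapse consecutive fixations on the same area into visits (passes).
visits = [(area, len(list(run))) for area, run in groupby(fixation_sequence)]
target_visits = [n for area, n in visits if area == "target"]

visit_count = len(target_visits)     # 2 visits
consecutive_counts = target_visits   # 1 fixation in the first pass, 3 in the second
skipped = fixation_count == 0        # tallied over trials, this yields the skip count

print(fixation_count, visit_count, consecutive_counts, skipped)
```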
Measuring fixations, and the lack thereof, is the bread and butter of visual world
researchers, who analyze fixation proportions more than any other dependent
variable (see Figure 7.3). Did the participant look at a picture at a given moment
in time, yes or no? This is what the majority of visual world researchers record and
analyze in a raw or aggregate form. The analysis of fixation proportions is common
in prediction research, from word-level processing (e.g., Marian & Spivey, 2003a,
2003b; Mercier, Pivneva, & Titone, 2014, 2016; Tremblay, 2011), to morphosyn-
tax (e.g., Mitsugi, 2017; Morales et al., 2016; Suzuki, 2017; Trenkic, Mirković, &
Altmann, 2014) and semantics (Dijkgraaf, Hartsuiker, & Duyck, 2017; Kohlstedt &
Mani, 2018), all the way up to research at the discourse-syntax-prosody interface
(Sekerina & Trueswell, 2011). Fixation proportion has also been used in a study
on the interpretation of overt subject pronouns (Cunnings, Fotiadou, & Tsimpli,
2017). Researchers want to know for what proportion of participants and trials the eye tracker recorded a fixation on a given image. Many also want to know
how this proportion of looks changes over time, for different groups of partici-
pants, and different experimental conditions (see Section 8.5 for further details).
When a study includes both L1 and L2 (bilingual) speakers, the focus is often on
whether L2 speakers or bilinguals exhibit the same type of predictive processing
or interpretive preferences as L1 speakers do (e.g., Cunnings et al., 2017; Dijkgraaf
et al., 2017; Mitsugi, 2017; Sekerina & Trueswell, 2011; Tremblay, 2011; Trenkic
et al., 2014). For instance, Mitsugi (2017) compared L1 and L2 Japanese use of
case markings to anticipate the voice (i.e., active or passive) of the verb (Japanese
is a verb-final language). Participants saw two scenes on the screen, for instance a
woman hitting a man (active) and a woman hit by a man (passive). Mitsugi (2017)
found the L1 Japanese speakers increasingly looked to the correct scene in the
1200 ms following the case-marked nouns and before the verb, suggesting these
participants could use the case markings predictively. The college-level L2 learn-
ers, on the other hand, did not show similar anticipatory behavior (see Section
4.2.2.3). Lastly, the analysis of fixations also has a place in face-to-face interaction
research (McDonough, Crowther, Kielstra, & Trofimovich, 2015; McDonough,
Trofimovich, Dao, & Dion, 2017). McDonough et al. (2015), for instance, created
a new variable, which they termed mutual eye gaze. A mutual eye gaze occurred
during interaction when the L1 and L2 speaker looked at each other during
a feedback episode. The researchers found that the L2 speaker was more likely
to reformulate her initial utterance correctly when feedback coincided with a
mutual eye gaze (see Section 4.2.4).
The previous review highlights the many ways in which fixations can be ana-
lyzed, namely, as binary, yes-or-no events (e.g., McDonough et al., 2015, 2017),
counts (e.g., Bisson et al., 2014), proportions (e.g., Mitsugi, 2017), and probabili-
ties (e.g., Marian & Spivey, 2003a, 2003b). Binary fixation data and count data
are available directly from the eye tracker after the event detection algorithm
has processed the raw data (for further details on data preprocessing, see Sections
8.1 and 9.1.3). The binary data can be converted easily into proportions (0–1)
or probabilities (%). To convert binary fixation data into proportions, one simply
divides the number of trials where an event occurred by the overall number of
trials. This will give a proportion, which is a number between 0 and 1. To express it as a percentage, one multiplies the number by 100 (for a detailed description, see
Section 8.5.2.1). These variables are useful in visual world research, where many
studies revolve around the analysis of fixation proportions or probabilities. For
instance, in Marian and Spivey’s (2003a) study, Russian-English bilinguals looked
at the between-language competitor (e.g., spichki, “matches”, given the target
word speaker) in 15% of all trials. This percentage was derived from a total of 20
critical trials per participant and each trial was coded individually for whether or
not the participant looked at the between-language competitor.
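As a minimal illustration of this conversion (the trial codes below are invented for the example, not Marian and Spivey’s data), binary per-trial codes can be averaged into a proportion, scaled to a percentage, and inverted to obtain the skipping probability discussed later in this section:

```python
# Minimal sketch: converting binary fixation data (1 = fixated the
# competitor on this trial, 0 = did not) into a proportion and a
# percentage. The 20 toy trials below are invented.
trial_codes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
               0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

proportion = sum(trial_codes) / len(trial_codes)   # 0-1 scale
percentage = proportion * 100                      # % scale
skipping_probability = 100 - percentage            # skips are the inverse

print(proportion, percentage, skipping_probability)   # 0.2 20.0 80.0
```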
The previous discussion has focused on how to calculate proportions and
probabilities for eye fixations. The same principles apply to other binary events in
eye-tracking data, such as skips and regressions. Here, we focus on skips, which are
the opposite of fixations, or “zero fixations” (for an overview of regressions, see
Section 7.2.2). Looking at something or not looking at it (i.e., skipping it) are the
only ways in which participants can engage visually with information. Therefore,
skipping probabilities and fixation probabilities are each other’s inverse. If the
fixation probability for a given condition is, say, 68%, then the skipping prob-
ability is 32%. In a study on language switching, Philipp and Huestegge (2015)
reported skipping probabilities for the first word of a sentence in two-sentence
trials. The researchers found that L1 German–L2 English speakers were some-
what less likely to skip the first word in the sentence of a language switch trial
(i.e., German–English or English–German) than a language repetition trial (i.e.,
German–German or English–English). They argued this finding showed their
participants adopted a “careful reading strategy” (p. 662) following a language
switch, which was relatively short-lived and distinct from any longer-term effects
on sentence comprehension.
In summary, count measures are eye-movement measures that tell us how often
something occurred. The eye-tracking software can count different things—fixa-
tions, skips, visits, and regressions. Researchers can report these measures as raw
counts, binary yes/no events, probabilities (%), or proportions (0–1). Following
Radach and Kennedy (2004), we can think of counts as spatial eye-movement
measures that pertain to a given area on the screen and are distinct from tem-
poral eye-movement measures, which are discussed next. Because fixation
counts subsume all activity over all visits or passes in a given area, they are a
late eye-movement measure (see Section 7.2.1.2.2, for more information on the
early/late distinction). Fixation counts are also an aggregate measure, similar to
total reading time. Conversely, skipping probability is best conceived as an early
measure (Conklin & Pellicer-Sánchez, 2016) because the decision to skip a word
must be made when the word is still in parafoveal vision; that is, before the eyes
have landed on it.

7.2.1.2 Fixation Duration
Duration measures are the largest category of dependent variables in SLA and
bilingualism research. The synthetic review of eye-tracking literature from 15 SLA
journals and Language Testing (see Chapter 3) returned 12 different measures of

FIGURE 7.6 The ‘big four’ durational measures (upper panel) and other durational measures (lower panel) in eye tracking in SLA and bilingualism (used in 49 out of 52 studies with text). Note: The numbers in the lower panel do not add up to 16% due to rounding.
TABLE 7.2 Definitions and examples of duration measures

First fixation duration
  Definition: The duration of the first fixation made in an interest area.
  Example (from Figure 7.7a): [6]

Single fixation duration
  Definition: The duration of the first fixation made in an interest area when that interest area receives exactly one fixation.
  Example: n/a (there is more than one fixation)

Refixation duration
  Definition: The difference between gaze duration (or first pass reading time) and first fixation duration.
  Example: In Figure 7.7a: 0; in Figure 7.7b: [7] + [8] (remember, [6] is first fixation duration)

Gaze duration
  Definition: The sum of all fixations recorded for a single-word interest area up to the point when the eyes leave the interest area.
  Example: In Figure 7.7a: [6]; in Figure 7.7b: [6] + [7] + [8]

First pass reading time
  Definition: The sum of all fixations recorded for a multi-word interest area up to the point when the eyes leave the interest area.
  Example: See gaze duration (the interest area is one word)

First subgaze
  Definition: The sum of all non-final fixations before a participant makes an overt response, such as pressing a button.
  Example: n/a (there was no overt response required in this experiment)

Regression path duration (Go-past time)
  Definition: The sum of all the fixations made from the first entry in an interest area until the eyes exit the area to the right, including any fixations made following a regression to a previous part of the sentence or text.
  Example: [6] + [7] + [8] + [9] + [10] + [11] + [12] (even though [7], [8], and [9] are fixations outside the interest area, they follow regressions out of the interest area and are thus included in the overall processing time)

Second pass time
  Definition: The summed duration of all fixations that are made within an interest area when the eyes visit the area for the second time; this includes cases where the interest area was originally skipped.
  Example: [10] + [11] + [12]

Rereading time
  Definition: The summed duration of all fixations in an interest area except for those fixations made during first pass.
  Example: [10] + [11] + [12]

Last fixation duration
  Definition: The duration of the last fixation before a participant makes an overt response.
  Example: n/a (there is no overt response)

Total time
  Definition: The total sum of all fixation durations recorded for an interest area.
  Example: [6] + [10] + [11] + [12]

Total visit duration
  Definition: The summed duration of all visits to a particular interest area.
  Example: [6] + [10] + [10 → 11 saccade duration] + [11] + [11 → 12 saccade duration] + [12]

Expected fixation duration
  Definition: The expected time a participant will spend in a given interest area if she distributes her attention evenly across all the interest areas on the screen.
  Example: Syllable-based expected fixation duration: the sentence has 12 syllables, of which tahamul has 3 (25%), so the expected fixation duration is ([1] + [2] + [3] + [4] + [5] + [6] + [7] + [8] + [9] + [10] + [11] + [12] + [13]) ÷ 4

Observed fixation duration
  Definition: Fixation duration in an interest area as recorded by the eye tracker.
  Example: [6] + [10] + [11] + [12]

Difference between observed and expected fixation duration (ΔOE)
  Definition: The extent to which a participant’s processing time in a given interest area deviates from what is expected under an equal-attention assumption.
  Example: ([6] + [10] + [11] + [12]) − (([1] + [2] + [3] + [4] + [5] + [6] + [7] + [8] + [9] + [10] + [11] + [12] + [13]) ÷ 4)

Note: Measures are represented as a period of time, typically in ms, that corresponds to the length of
the individual fixations.

fixation duration (see Figure 7.6 and Table 7.2). Some of these, such as first fixa-
tion duration, gaze duration, regression path duration, and total time, are now
standard in the text-based eye-tracking literature (see Figure 7.6, top panel).
Other measures, such as refixation duration, second pass time, and rereading time,
are related but have not been reported quite as frequently, perhaps due to the
statistical properties of these variables (i.e., they are often zero). Finally, the area of
durational eye-movement measures has also enjoyed its share of innovation, with
individual researchers and research teams developing new measures (e.g., first sub-
gaze, last fixation duration, delta [Δ] total time) to respond to their research needs
(Hoversten & Traxler, 2016; Indrarathne & Kormos, 2017, 2018; Miwa, Dijkstra,
Bolger, & Baayen, 2014).
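Before turning to the individual measures, a minimal computational sketch may help fix the definitions in Table 7.2. The snippet below uses invented toy data and, as a simplification, ignores interest areas that were skipped on first pass:

```python
# Minimal sketch: core duration measures of Table 7.2, computed from a
# chronological fixation log of (interest_area, duration_ms) tuples.
from itertools import groupby

fixations = [("pre", 210), ("target", 250),      # first pass: one fixation
             ("pre", 180),                       # eyes leave the target IA
             ("target", 220), ("target", 190),   # second pass
             ("post", 205)]

def duration_measures(log, ia):
    durations = [d for a, d in log if a == ia]
    grouped = groupby(log, key=lambda f: f[0])
    passes = [[d for _, d in g] for a, g in grouped if a == ia]
    first_pass = passes[0] if passes else []
    return {
        "first_fixation": first_pass[0] if first_pass else 0,
        "gaze_duration": sum(first_pass),       # = first pass reading time
        "refixation": sum(first_pass[1:]),      # gaze minus first fixation
        "second_pass": sum(passes[1]) if len(passes) > 1 else 0,
        "rereading": sum(durations) - sum(first_pass),
        "total_time": sum(durations),
    }

print(duration_measures(fixations, "target"))
# -> first_fixation 250, gaze_duration 250, refixation 0,
#    second_pass 410, rereading 410, total_time 660
# (rereading equals second pass here because there are only two passes)
```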

7.2.1.2.1 Early versus late eye-movement measures


Of the different kinds of eye-movement measures, temporal or durational meas-
ures lend themselves most readily to being categorized as early or late. Early and

FIGURE 7.7 Two different reading patterns for an unfamiliar word, tahamul, embedded
in context.
(Source: Godfroid et al., 2018).

late measures roughly coincide with the initial visit and any subsequent visits to
the region of interest, respectively. In Figure 7.1, early measures and late meas-
ures are represented as two separate branches in the fixation duration category.
The early measures (i.e., first fixation duration, single fixation duration, refixation
duration, gaze duration or first-pass reading time, first subgaze, and regression-
path duration or go-past time) may index “processes that occur in the initial
stages of sentence processing” (Clifton, Staub, & Rayner, 2007, p. 349), such as
word recognition or lexical access. The other measures (e.g., second pass time,
rereading time, last fixation duration, total time, total visit duration) signal com-
paratively late stages of processing and may signal an interruption to the normal
reading process. Although these characteristics hold true in general, exactly how
the distinction between early and late measures plays out in language research will
depend on the research area and the topic under investigation (see examples fol-
lowing). The coding work that went into the current synthetic review revealed it
is much more common for an effect to show up in multiple measures (i.e., both
early and late). When this happens, researchers may feel more confident in their
findings because different measures essentially provide converging evidence that
an effect is real. Though it is rarer, an effect showing up in one (set of) temporal
measure(s) but not in the others could be theoretically more interesting, provided
the finding can be replicated. In what follows, I will present two exemplary studies
from different research areas to illustrate this point. In each case, the authors used
their findings to make specific theoretical claims—about the nature of L1 and L2
parsing (Felser, Cunnings, Batterham, & Clahsen, 2012) or about the time course
of lexical activation (Taylor & Perfetti, 2016).
Felser et al. (2012) investigated how L1 and L2 English speakers compute wh
dependencies (filler-gap dependencies for relative clauses) during reading (also see
Section 3.2.1). Participants read complex English sentences with relative clauses.
In half the conditions, the sentences contained an additional relative clause (dou-
ble embedding), as in (1c) and (1d). These doubly embedded relative clauses are
“extraction islands” (p. 67), meaning the relative pronoun of the first relative clause
(e.g., thatᵢ) cannot originate from there (example from Experiment 2 in Felser et al., p. 87, with layout added; eᵢ is the empty category signaling the base extraction site).

(1) There are all sorts of magazines on the market.

    (a) No constraint, gap
        Everyone liked the magazine thatᵢ the hairdresser read quickly and yet
        extremely thoroughly about eᵢ before going to the beauty salon.
    (b) No constraint, filled gap
        Everyone liked the magazine thatᵢ the hairdresser read articles with
        such strong conclusions about eᵢ before going to the beauty salon.
    (c) Island constraint, gap
        Everyone liked the magazine thatᵢ the hairdresser who read quickly and
        yet extremely thoroughly bought eᵢ before going to the beauty salon.
    (d) Island constraint, filled gap
        Everyone liked the magazine thatᵢ the hairdresser who read articles with
        such strong conclusions bought eᵢ before going to the beauty salon.

Felser and colleagues conducted two experiments designed to disentangle the roles of semantic and structural information in sentence processing. Overall, they
found that both L1 and L2 readers were sensitive to island constraints: semantic
or structural manipulations did not affect the reading when the manipulations
occurred inside a second relative clause ([1d] versus [1c]). Importantly, the tim-
ing of participants’ sensitivity to manipulations outside relative clause islands dif-
fered. The advanced L2 readers detected local semantic implausibilities before L1
readers, as shown in their first-pass reading times (Experiment 1); however, their
sensitivity to structural effects (as in [1b] versus [1a]) was delayed (Experiment 2).
Unlike for L1 speakers, structural effects did not show until the L2 speakers reread
the spillover region (underlined in example [1]); that is, it was delayed in both
time and space. This was shown by triangulating early and late eye-movement
measures for two different parts of the sentence. One possible theoretical implica-
tion, advanced by Felser and her colleagues, is that the data lent support to the
Shallow Structure Hypothesis (Clahsen & Felser, 2006a, 2006b). On this account,
L2 grammatical knowledge is either “incomplete, divergent, or of a form that
makes it unsuitable for parsing” (Clahsen & Felser, 2006a, p. 117, as cited in Felser
et al., 2012) and hence may give rise to delayed sensitivity to structural manipulations as in (1b) versus (1a).
Second, Taylor and Perfetti (2016), in a word-learning experiment, looked at
the effects of individual differences and lexical knowledge on L1 reading behav-
ior. In their second experiment, 35 native English speakers were trained on 180
rare English words using one of the following combinations of orthographic (O),
phonological (P), and meaning (M) information: O, P, OP, OM, PM, and OPM.
Participants saw each word one, three, or five times. After completing the word-
training paradigm, the participants read sentences embedded with the new words.
This was the part for which eye movements were recorded. Of interest was how
the processing of words in sentences would differ as a function of partial word
knowledge (O, P, and/or M; number of exposures) that participants had obtained
from training.
Taylor and Perfetti found temporally distributed effects of different types of
partial word knowledge. In general, increasing orthographic exposures during
training had an effect on early processing measures. Meaning training, in interac-
tion with reading expertise, affected a late processing measure and phonological
training, in a three-way interaction with number of exposures and a participant’s
lexical knowledge, affected both early and late measures. Taylor and Perfetti’s
study is noteworthy for including seven eye-movement measures (plus another
two skipping measures in Experiment 1), covering both early and late durational
measures as well as probabilities. In so doing, the authors could uncover the tem-
poral dynamics of different aspects of word knowledge, with effects of meaning
crucially appearing after effects of form.
The previous examples show how eye-movement recordings enable a more
nuanced understanding of what types of information are used during language
processing. This includes, but is not limited to, semantic versus structural cues
(Felser et al., 2012) and form versus meaning (Taylor & Perfetti, 2016). Eye-
movement recordings thus offer the prospect of deconstructing general processes
such as word recognition, lexical access, or reanalysis into qualitatively dis-
tinct component processes, as was shown most convincingly in Taylor and Perfetti’s
word-learning experiment. Eye-tracking research is also emerging as a valuable
addition to research on implicit and explicit processing and knowledge (Andringa
& Curcic, 2015; Godfroid, Loewen, Jung, Park, Gass, & Ellis, 2015; Godfroid &
Winke, 2015; Suzuki, 2017; Suzuki & DeKeyser, 2017), where the focus is on
controlled versus automatic processing (Godfroid et al., 2015; Godfroid & Winke,
2015) or the real-time retrieval of linguistic knowledge (Andringa & Curcic,
2015; Suzuki, 2017; Suzuki & DeKeyser, 2017). As the number of applications of
eye tracking in L2 and bilingualism research continues to diversify, we can expect
to see more work that pursues the timing aspects of eye behavior.
We now turn to an overview of the different eye-movement measures that
have been identified in the present review of eye-tracking literature. This section
will reflect the early-late distinction and be structured accordingly: from early, to
intermediate, and finally late eye-movement measures.

7.2.1.2.2 Overview of Durational Measures


7.2.1.2.2.1 Early Measures
First fixation duration is the duration of the first fixation made in an interest area
(for an example, see Table 7.2 and Figure 7.4, reproduced here as Figure 7.7, for the
reader’s convenience). The earliest of all temporal measures, first fixation duration
is typically taken to signal the ease with which a reader can retrieve the meaning
of a word; that is, first fixation duration is commonly taken as a measure of lexical
access. First fixation duration reflects word-level processing, as opposed to sentence
integration, and is susceptible to early effects. This includes effects of (visual) input
enhancement (Alsadoon & Heift, 2015), orthographic overlap between cognates
(Cop, Dirix, Van Assche, Drieghe, & Duyck, 2017), word frequency (Godfroid et
al., 2013), and frequency of exposure in a text (Elgort, Brysbaert, Stevens, & Van
Assche, 2018; Mohamed, 2018; Pellicer-Sánchez, 2016). First fixation duration also
showed an effect in two grammar studies on the processing of grammatical and
ungrammatical compounds (Clahsen, Balkhair, Schutter, & Cunnings, 2013) and
reflexive pronouns with structurally accessible and inaccessible antecedents (Felser
& Cunnings, 2012).Therefore, first fixation duration is well represented in research
on lexical processing, including cognate processing, and grammar acquisition. It is
furthermore a key measure in studies that look at single-word reading (i.e., reading
of words presented in isolation on the screen). This includes a lexical decision task
with eye tracking (Miwa et al., 2014) and a reading aloud experiment investigating
the effect of orthographic depth (opacity of a language’s writing system) on read-
ing strategies (De León Rodríguez et al., 2016).
Different from the count measures discussed in Section 7.2.1.1, first fixation
duration is not commonly used when the interest areas are larger areas on the
screen, as is the case with multiword units, subtitles research (but see Muñoz,
2017), and language assessment research. Of note, first fixation duration often
does not show an effect of what the researcher is studying. I find the frequent lack
of effects in first fixation duration in SLA and bilingualism intriguing, especially
since research with L1/monolingual populations often does yield significant dif-
ferences in first fixation duration. Perhaps differences in L1-L2 processing speed
render first fixation duration a less informative measure when working with L2/
bilingual participants because readers must be very fast to show an effect in first
fixation duration. The present conjecture could be tested by building on the cur-
rent review and contrasting the findings of the L2/bilingual literature, reviewed
here, with those from L1/monolingual eye-tracking studies.
Single fixation duration is the duration of a fixation in an interest area
when the area was fixated exactly once (see Table 7.2). Therefore, single fixation
duration is a subset of all observations for first fixation duration, namely the cases
where there was one and only one fixation. Because many words are fixated more
than once, analyzing single fixation duration entails a significant loss of data. This
is why first fixation duration is generally preferred as an early measure (Rayner,
1998). That being said, single fixation duration can be used to address the same
questions as first fixation duration, namely on word-level lexical processing, gram-
mar acquisition, and single-word reading. In SLA and bilingualism, only Cop,
Drieghe, and Duyck’s (2015) novel-reading study included single fixation duration
as a dependent variable. Unlike first fixation duration, which showed an effect of
orthographic overlap between words, single fixation duration did not reveal any
significant differences in this study.
Gaze duration is the sum of all the fixations made in an interest area until
the eyes leave the area (see Table 7.2 and Figure 7.7, for an example). In reading
research, the saccade that takes the eyes out of the area could be either forward
or back; it doesn’t matter for gaze duration. For interest areas that consist of more
than one word, for instance idioms or collocations, larger grammatical construc-
tions, or subtitles, the same measure is called first pass reading time. Gaze
duration and first pass reading time are therefore calculated in the same manner,
but gaze duration applies to single-word regions whereas first pass reading time
applies to larger areas.
Of all the standard durational measures, gaze duration is perhaps the most
important and the most widely reported one. A large number of studies, all gram-
mar or vocabulary research, report both gaze duration and first fixation duration
as early measures (e.g., Balling, 2013; Carrol & Conklin, 2015; Clahsen et al.,
2013; Felser & Cunnings, 2012; Godfroid et al., 2013). An equal number of stud-
ies, however, report only gaze duration and not first fixation duration. These are
predominantly grammar studies (e.g., Felser et al., 2012; Sagarra & Ellis, 2013;
Spinner, Gass, & Behney, 2013; Vainio, Pajunen, & Hyönä, 2016) but there are
examples from idiom and collocation processing (Siyanova-Chanturia, Conklin,
& Schmitt, 2011; Sonbul, 2015), caption processing (Montero Perez et al., 2015),
and input enhancement (Winke, 2013) as well.
When the region of interest contains multiple words, as is the case in some of
these studies, it is worth thinking about what type of information first fixation dura-
tion might provide and how valuable that is. Readers will often need to make more
than one fixation to take in larger areas of text and so the duration of (just) the first
fixation may not tell us all that much. For instance, Sonbul (2015) analyzed first
pass reading time (and not first fixation duration) for adjective-noun collocations
such as fatal mistake, citing previous work by Siyanova-Chanturia and colleagues
(Siyanova-Chanturia, Conklin, & Schmitt, 2011; Siyanova-Chanturia, Conklin, &
van Heuven, 2011) as a rationale for doing so. Vainio et al. (2016), in a study on
modifier–noun case agreement, also limited their choice of early measures to gaze
duration because, the authors reported, first fixation duration did not show an effect.
In many other studies, interest areas do consist of single words, yet research-
ers still include only gaze duration as an early measure. Oftentimes, they will
not explain why they did not analyze first fixation duration, but a few potential
reasons come to mind. One reason is that first fixation duration is subsumed in
gaze duration (gaze duration is the duration of the first fixation plus any other
first-pass fixations) and so the two measures are not independent. This has impli-
cations for statistical testing, which eye-tracking researchers have mostly ignored,
until recently (Von der Malsburg & Angele, 2017). Another, substantive reason is
that researchers may be keen on capturing early reading processes (see Section
7.2.1.2.2.1), as reflected in first fixation duration and gaze duration, because they
believe early processes are most likely to reflect automatic, non-strategic read-
ing or parsing procedures (Godfroid & Winke, 2015). Beyond this focus on early
processes, however, the finer distinction between the first fixation and any addi-
tional first-pass fixations may be less important for specific research questions. For
instance, Sagarra and Ellis (2013), in a study on temporal cues in sentence process-
ing, reported only gaze duration and second pass duration as summary measures
of early and late processing, respectively. These two measures combined capture
most viewing activity on a word.
In cases when researchers do wish to distinguish between the first fixation and
any additional first-pass fixations in an area, refixation duration offers that possibil-
ity. Refixation duration (not to be confused with rereading time, which is a late
measure) is the difference between gaze duration (or first-pass reading time) and
first fixation duration (see Table 7.2 and Figure 7.7, for an example). Refixation
duration is therefore independent of first fixation duration, unlike gaze duration,
which subsumes both. In SLA, only one study has reported refixation duration to
date. Alsadoon and Heift (2015) found Arabic ESL learners had longer refixation
durations (and also longer first fixations) on enhanced than unenhanced English
words when reading sentences.
First subgaze is the summed duration of all non-final fixations in an interest area before the participant presses a button or makes some other type of overt response. First subgaze
is an example of a custom-made measure that a team of researchers developed
specifically for the purposes of their study. Miwa and colleagues (2014) designed
a lexical decision task with eye tracking to study L1 influence on L2 word pro-
cessing. Besides first fixation duration, their data analysis included two new meas-
ures, first subgaze duration and last fixation duration (described later), which
take into account the task-specific properties of lexical decision. In particular,
participants in a lexical decision task are asked to indicate by means of a button
press whether a string of letters presented on the screen is a word or not (yes/
no response). Because pressing a button is a type of conscious, overt behavior, the
researchers argued that first subgaze is “less contaminated by conscious lexical
decision response strategies than the last fixation, which was ended with a but-
ton press” (p. 452). More generally, Miwa and colleagues included eye tracking in
their lexical decision task in order to obtain more insight into the time course of
lexical processes that lead up to a lexical decision, which is hypothesized to be the
outcome of a series of events. The Japanese-English speakers’ eye-tracking data
showed an influence of the lexical, phonological, and semantic properties of L1
Japanese translations of the words in the English lexical decision task, even though
they never saw any Japanese, only English, in the experiment. This finding was
suggestive of the co-activation of both languages during the task (also see Section
3.2.2). First subgaze could also be informative in other tasks that include an overt
decision component, including grammaticality judgment tests, translation tests,
sentence-picture matching tasks, and written language assessment tasks.
Regression path duration, or go-past time, is a more complex duration
measure that considers the sequence of eye movements in addition to their dura-
tion. Regression path duration integrates the sum of all the fixations made from
the first entry in an interest area until the eyes exit the area to the right. For
instance, for the trial depicted in Figure 7.7a, this includes the fixation sequence
[7] + [8] + [9] outside of the interest area tahamul, as well as the fixations [6], [10],
[11], and [12] on tahamul itself, because the reader has not moved beyond the
critical interest area yet. Therefore, if there is a regression during the initial visit,
regression path duration will include the fixations made to a previous part of the
sentence or text; that is, outside the interest area proper.
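Computationally, the definition amounts to summing every fixation from the first entry into the interest area until the eyes first land on a later region. A minimal sketch with invented toy data (regions indexed in reading order):

```python
# Minimal sketch: regression path duration (go-past time). Each fixation
# is (region_index, duration_ms); the target region has index 2.
fixations = [(1, 210),             # before the target region
             (2, 250),             # first entry: the clock starts
             (1, 180), (0, 150),   # regression to earlier regions
             (2, 220), (2, 190),   # back on the target
             (3, 205)]             # first exit to the right: the clock stops

def go_past_time(log, target):
    total, entered = 0, False
    for region, duration in log:
        if region == target:
            entered = True
        if entered:
            if region > target:    # exiting to the right ends the measure
                break
            total += duration      # regressive fixations count as well
    return total

print(go_past_time(fixations, 2))  # -> 990 (250 + 180 + 150 + 220 + 190)
```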
Regression path duration has been argued to represent the time it takes to
overcome a processing difficulty (Radach & Kennedy, 2004; Rayner & Pollatsek,
2006; Reilly & Radach, 2006). On this view, a regression out of an interest area
is indicative of processing difficulty and the movement of the eyes past the inter-
est area represents the successful resolution of this difficulty. It follows from this
problem-and-repair sequence that regression path duration is a hybrid measure.
Some researchers have interpreted regression path duration as an early measure
(e.g., Chamorro, Sorace, & Sturt, 2016), because of the (early) first-pass regression
that characterizes it.1 Other researchers have taken regression path duration to be
a late processing measure (e.g., Van Assche et al., 2013) because of the reanalysis
that follows the initial regression out of the region. Both are right, in that regres-
sion path duration combines features of early and late measures alike. This earns it
an intermediary status (Conklin & Pellicer-Sánchez, 2016) as the latest of all early
measures or the earliest of all late measures.
Regression path duration is a frequently used measure in sentence- and
text-processing research. Researchers analyze regression path duration in gram-
mar research (Clahsen et al., 2013; Felser & Cunnings, 2012; Felser et al., 2012;
Felser, Sato, & Bertenshaw, 2009; Lim & Christianson, 2015) and all substrands
of vocabulary, including contextual word learning (Elgort et al., 2018), idiom
processing (Carrol & Conklin, 2017; Carrol, Conklin, & Gyllstad, 2016), and the
bilingual lexicon (Hoversten & Traxler, 2016;Van Assche et al., 2013). Regression
path duration was one of six measures in Elgort et al.’s (2018) study on contextual
word learning. The authors charted the learning trajectory for lower-frequency
L2 words (presumed unknown) compared to high-frequency control words that
all occurred between 8 and 64 times in a general-academic textbook. Similarly to
the other eye-movement measures analyzed in the study, regression path duration
for lower-frequency words gradually approximated regression path duration for
control words (suggestive of learning of word form). Different from some of
the other eye-movement measures, however, regression path duration remained
elevated until the end of the reading experiment (see Section 3.2.2). This example
shows how regression path duration, as a measure of word-to-text integration
(Elgort et al., 2018), can complement information about lexical access afforded by
first fixation duration and gaze duration. This makes the regression path duration
a valuable addition to eye-tracking researchers’ analytical toolkits.

7.2.1.2.2.2 Late Measures
Second pass time is the summed duration of all fixations made in an interest
area when the eyes visit the interest area a second time or after the eyes initially
skipped that area (see Table 7.2 and Figure 7.7, for an example). It is similar
to, but different from, rereading time, which includes any non-first-pass fixa-
tions (see below). Second pass time assumes an intermediate position in terms
of reporting frequency. It is a well-established measure in sentence- and text-
processing research, including grammar (Felser et al., 2009; Hopp & León Arriaga,
2016; Roberts, Gullberg, & Indefrey, 2008; Sagarra & Ellis, 2013) and vocabu-
lary research (Godfroid et al., 2013; Montero Perez et al., 2015), yet second pass
time is reported less frequently than first fixation duration, gaze duration, or total
time. Although researchers seldom explain why they did not include a particular
measure in their analyses, I suspect the many 0 values obtained for second pass
time play a role. Second pass time will be 0 when participants finish processing
a word in first pass or skip the area altogether. Many 0s in a variable will cause
that variable to be non-normally distributed (bimodal and skewed), which will
require researchers to transform the data and/or perform a different test.2 Even
though second pass time may require some additional data preparation, there are
good reasons for including this measure in the statistical analysis. Of note, second
pass time captures reanalysis following an initial processing difficulty and is a
pure late-processing measure, unlike total time and total visit duration (see later).
Roberts et al. (2008) compared L1 and L2 Dutch processing of personal pronouns
that referred either to a sentence-internal or a sentence-external antecedent. The
late measures, including second pass time, were more informative than the early
measures in this study because differential attachment preferences in the L2 Dutch
groups, compared to the L1 Dutch speakers, only surfaced later in the reading
process. Second pass time was also a useful measure in Sagarra and Ellis (2013),
who used it alongside gaze duration to obtain a full picture of the reading process
(gaze duration + second pass time ≈ total time). By using second pass time, rather
than total time, the authors were able to distinguish late from early processing
more clearly.
Rereading time (not to be confused with refixation duration, which is an
early measure) is the difference in reading time between total time and gaze
duration or first pass reading time (see Table 7.2 and Figure 7.7, for an example).
Therefore, rereading time is the sum of all fixations in an interest area except for
those fixations made during first-pass reading. Because visiting the same interest
area more than twice is relatively rare, rereading time and second pass time will
often yield the same values. Rereading time shares the same general properties as
second pass time; that is, a high occurrence of 0 values and a skewed distribution.3
Rereading time has been used in a handful of grammar studies (Boxell & Felser,
2017; Felser & Cunnings, 2012; Felser et al., 2012) and a study on the effects of
input enhancement on reducing vowel blindness in L1 Arabic–L2 English speak-
ers (Alsadoon & Heift, 2015). The work by Felser and her colleagues deals with
the resolution of long-distance dependencies, which are often studied using struc-
turally complex sentences (see example [1] in Section 7.2.1.2.1 for discussion, and
Section 3.2.1, for review). Like second pass time, rereading time is a suitable meas-
ure for capturing the amount of reanalysis in which L1 and L2 speakers engage
when trying to parse a sentence.
It should be noted that the distinction between rereading time and second
pass time is not always clear in empirical studies. Specifically, second pass time is
sometimes defined as any rereading or refixations of an interest area without fur-
ther reference to when the rereading or refixations occurred (i.e., during second
pass or beyond). Strictly speaking, this renders the measure an index of rereading
time, because second pass time refers to second-pass fixations only. Several studies
in the synthetic review had ambiguous definitions of second pass time like this,
which leads me to believe rereading time is a more widespread measure than
the numbers in Figure 7.6 suggest. To improve terminological precision, future
researchers should include clear definitions of their measures and, in the case of
second pass time versus rereading time, specify whether refixations beyond the
second pass are included.
Last fixation duration is the duration of the last eye fixation before a par-
ticipant makes a response. This measure was introduced in the field of bilingual-
ism by Miwa et al. (2014) in a lexical decision study with eye tracking involving
Japanese-English bilinguals. Last fixation duration was the duration of the last eye
fixation on the letter string before the participants pressed a button to make their
lexical decision. Miwa and colleagues argued that last fixation duration is “more
dedicated to response planning and execution” (p. 455). It stands in contrast with
the early measure of first subgaze, described previously, which can reflect “lexi-
cal effects in the word identification system not affected by conscious response
strategies” (ibid.). Although last fixation duration is a new measure for the fields
of SLA and bilingualism, it could also be informative in other tasks that require
participants to make an overt response, including grammaticality judgment tests
(GJTs), translation tests, sentence-picture matching tasks, and written language
assessment tasks.
Total time is the sum of all fixations made in an interest area (see Table 7.2
and Figure 7.7, for an example). It is the most frequently reported eye-tracking
measure in SLA and bilingualism, represented across all five strands of text-based
eye-tracking research. In a study on incidental vocabulary acquisition from reading, Godfroid et al. (2018) chose to focus primarily on total time in their analyses

(i) because [total time] is the variable that is of most pedagogical interest,
(ii) because it has yielded the strongest associations with learning in previ-
ous studies, and (iii) because it encapsulates first fixation duration and gaze
duration (see Von der Malsburg & Angele, 2017, for caveats on multiple
testing in eye-movement research).
(Godfroid et al., 2018, p. 568)

In other words, total time is the go-to measure whenever global effects are of pri-
mary interest, although, in general, this should not stop researchers from analyzing
additional measures as well.
Effects are likely to surface in total time, because it combines all the viewing
activity that took place in a given area. In L2 assessment, Bax (2013) investi-
gated the cognitive validity of two IELTS reading tasks (sentence completion
and matching) with a total of 11 test items. In five out of the 11 items, total time
differentiated between successful and unsuccessful test takers (i.e., those who did
and did not answer the item correctly). These findings were echoed in Bax’s other
dependent variables—namely, visit duration, visit count, and fixation count. In
instructed second language acquisition, Indrarathne and Kormos (2017, 2018)
compared the effectiveness of different instructional conditions for teaching and
learning the causative had construction (e.g., he had the house painted). The authors
analyzed mean total time for 21 different occurrences of causative had and related
this to L2 English learners’ gains on two separate pre- and post-tests (Indrarathne
& Kormos, 2017). Within the strand of subtitles research, Bisson et al. (2014)
asked L1 English speakers to watch four chapters of the SpongeBob Square Pants
movie with either English, Dutch, or no subtitles. The movie soundtrack also
varied between Dutch and English. Bisson and colleagues compared total time
in the subtitle region for these different conditions and normalized their measure
for the time each subtitle was shown on the screen. In grammar research, Ellis,
Hafeez, Martin, Chen, Boland, and Sagarra (2014) analyzed total time as the sole
indicator in a study on learned attentional biases in processing temporality in
L2 Latin. Finally, Godfroid and colleagues (2018), whose vocabulary study was
introduced at the beginning of this section, also chose to focus their analyses
on total time, given that the authors’ primary aim was to uncover associations
between overt attention and incidental vocabulary learning.
Although it is often described as a late measure, total time is actually a hybrid
measure that conflates both early and late stages of processing (i.e., gaze dura-
tion + rereading time). In that regard, total time bears some similarity to regres-
sion path duration (described above), which conflates early processing and the
time it takes to overcome a processing difficulty. The hybrid nature of total time
could be a reason some researchers choose not to include total time in their
analyses. As mentioned, total time aggregates several other measures and there-
fore these measures (e.g., gaze duration) tend to be correlated with total time
(for a visual representation, see Figure 7.20). When researchers analyze multiple,
correlated eye-movement measures for the same set of eye-tracking data, they
run several non-independent statistical comparisons. To control the Type I error
rate, Von der Malsburg and Angele (2017) suggested lowering the significance
level α by applying a Bonferroni correction or using a rule of thumb whereby
at least two eye-tracking measures need to be significant for an effect to be
considered reliable. A third possibility, suggested here, would be to analyze a
set of measures that are not correlated with each other, such as first fixation
duration, refixation duration, and rereading time (for a visual representation, see
Figure 7.21). In that case, no additional steps are necessary to ensure the validity
of one’s statistical results.
Total visit duration is the summed duration of all visits to a particular inter-
est area (see Table 7.2 and Figure 7.7, for an example). A visit is defined as “the
time interval between the first fixation [in an interest area] and the end of the last
fixation within the same [interest area] when there have been no fixations out-
side the [interest area]” (Tobii Studio User’s Manual v. 3.4.5, p. 110). Therefore, a
visit is similar to the concept of a pass in reading, but the term visit is used more
broadly in other areas of eye-tracking research as well. Total visit duration is a less
frequently used measure, reported only in Bax (2013) so far. Total visit duration
is very similar to total time, discussed previously, and therefore, it may not be
clear what additional information total visit duration can provide.4 In a reading
assessment study, Bax (2013) sought to validate items from the IELTS reading
test as measures of careful local reading and expeditious (i.e., quick and selec-
tive) local reading, respectively. Bax analyzed total visit duration along with three
other measures to gauge how readers attend to specific portions of a text when
answering specific test items. Successful test takers (i.e., those who solved an item
correctly) differed in their total visit durations and other eye-tracking measures
from unsuccessful test takers on a subset of all test items in a manner Bax argued
supported the cognitive validity of the test.
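The difference between a visit and the fixations it contains can be made concrete in a short sketch (structure and numbers invented): because a visit runs from the start of its first fixation to the end of its last, the saccades inside a visit count toward total visit duration but not toward total time:

```python
# Minimal sketch: total visit duration versus total time, from a log of
# (interest_area, start_ms, end_ms) fixations in chronological order.
from itertools import groupby

fixations = [("target", 0, 250),     # visit 1: a single fixation
             ("pre", 280, 460),
             ("target", 500, 720),   # visit 2, fixation 1
             ("target", 760, 950)]   # visit 2, fixation 2

def visit_vs_total(log, ia):
    grouped = groupby(log, key=lambda f: f[0])
    visits = [list(g) for a, g in grouped if a == ia]
    total_visit = sum(v[-1][2] - v[0][1] for v in visits)  # saccades included
    total_time = sum(end - start for a, start, end in log if a == ia)
    return total_visit, total_time

print(visit_vs_total(fixations, "target"))
# -> (700, 660): the 40 ms saccade within visit 2 counts toward total
#    visit duration but not toward total time
```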
Screen displays in assessment research like Bax’s study will often feature func-
tionally distinct regions on the screen, such as test prompts or questions, answer
options, and reading texts, images, or videos. This may warrant highlighting the
concept of a visit, and hence analyzing total visit duration, because a visit to a par-
ticular area is a functionally meaningful event in language assessment. Generalizing
a bit, total visit duration would seem a useful measure in any research study that
includes large interest areas on the screen (see Section 6.1.2). This includes subti-
tles research and some areas of instructed second language acquisition, in addition
to assessment research. Reporting both total visit duration and total time seems
like overkill, because the two measures are so similar. Researchers who are not
sure which measure to include can ask what the role of the different areas on the
screen is. When interest areas are all the same (e.g., all words rather than words and
images together), total time is the default option.
Expected fixation duration is the expected time participants will spend
in a given region on the screen if they distribute their attention evenly across
all the information on the screen (see Table 7.2 and Figure 7.7, for an example).
Expected fixation duration is usually compared with observed fixation duration,
which is the time participants actually spend in the region, as measured by the eye
tracker. Therefore, the difference between observed and expected fixation
duration, or ΔOE, indicates whether participants spent a proportionally greater
or smaller amount of their time in a given area than would be expected if they
processed all information with equal depth. This renders ΔOE the quantitative
equivalent of a color patch on a heatmap (see Section 7.2.3.1): positive values
indicate more attention (warmer colors in a heatmap) while negative values indi-
cate less attention (cooler colors).
Calculations of ΔOE can be letter-, syllable-, or word-based (for arguments
in favor of a syllable-based measure, see Indrarathne & Kormos, 2017, 2018). For
instance, say a participant reads a short, 80-syllable text in 8s or 8,000 ms. This cor-
responds to a mean reading time of 100 ms per syllable. If the target structure in
the text is five syllables long, the expected fixation duration for the structure is 500
ms. If the participant actually spent 625 ms reading the target structure, then ΔOE
is 125 ms. Indrarathne and Kormos (2017, 2018) calculated ΔOE in this manner to
quantify attentional processing (also see Godfroid & Uggen, 2013). They used total
time as a basis for their calculations, but the same formula could in principle be
applied to any fixation duration measure. When using ΔOE, attention is measured
in reference to the participant’s performance on the task itself, rather than vis-à-vis
a control condition, and so researchers may not need to worry about the compara-
bility of their experimental and control conditions (Indrarathne & Kormos, 2017).
That said, in Indrarathne and Kormos (2017, 2018), the results for ΔOE and total
time (a traditional measure) were highly similar, so more research is needed to show
that calculating attention in this new manner makes a difference.
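For readers who prefer the arithmetic spelled out, here is the worked example above as a minimal sketch:

```python
# Minimal sketch: syllable-based delta-OE, using the numbers from the
# worked example above (80-syllable text, 8,000 ms, 5-syllable target).
text_syllables = 80
total_reading_time_ms = 8000
target_syllables = 5
observed_ms = 625                                 # total time on the target

rate_ms = total_reading_time_ms / text_syllables  # 100 ms per syllable
expected_ms = rate_ms * target_syllables          # 500 ms: equal attention
delta_oe = observed_ms - expected_ms              # +125 ms extra attention

print(rate_ms, expected_ms, delta_oe)             # 100.0 500.0 125.0
```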
At a conceptual level, ΔOE is an interesting measure when groups differ in
their time on task, as in Indrarathne and Kormos’s studies. In theory, ΔOE should
be sensitive to target-form-specific differences in attention after controlling for
differences in time on task. ΔOE could also be useful to quantify changes in
attention that result from repeating the same task. For instance, some empirical
studies on the Output Hypothesis (Swain, 1985) have used an input–output–input
sequence (i.e., reading–writing–reading) in their experimental designs (e.g., Izumi
& Bigelow, 2000; Izumi, Bigelow, Fujiwara, & Fearnow, 1999; Song & Suh, 2008).
In a replication with eye tracking (He & Li, 2018), ΔOE could show whether
attention to the target structure increases disproportionately during the second
reading task, as the noticing function of output would predict (Swain, 1985), or
whether readers just generally speed up or slow down in round two of the task.

7.2.1.3 Fixation Latency
First fixation latency, or time to first fixation, is the time it takes for a par-
ticipant to look at a particular interest area, as measured from a prespecified point
in the trial (see Table 7.3). First fixation latency differs from first fixation duration,
described previously, in that first fixation latency is not about how long the initial
fixation lasted, but rather, how long it took for the fixation to happen. This makes
first fixation latency a “one-point measure” (Andersson, Nyström, & Holmqvist,
2010, p. 3). Only the beginning of the first fixation matters for calculating first
fixation latency.
By nature, first fixation latency needs to be measured relative to some other
event in the trial. The simplest case is measuring first fixation latency from trial
onset (i.e., from the beginning of the trial). In other studies, including visual
world experiments, it may make more sense to start measuring from a later time
point, for instance the onset (beginning) or offset (end) of a linguistic cue that
is embedded within the spoken or written input (e.g., Encuentra la pelota, “Find
the ball”), where the feminine article la cues the feminine noun pelota (Grüter,
Lew-Williams, & Fernald, 2012). When measurement is to begin later in the trial,
researchers need to mark the onset of measurement by inserting a time stamp
in the eye-tracking software. This procedure is demonstrated in Section 6.3.2.2,
using the auditory stimuli for a visual world study as an example. The idea is to
measure when, exactly, a critical word begins or ends and to enter that informa-
tion into the programming software. Conceptually, adding a time stamp is like
programming a stopwatch, so it starts timing the “race” for an eye fixation at the
right moment in the trial.
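The stopwatch logic translates into a very small computation. In the sketch below (field names and toy data invented), first fixation latency is the start time of the first qualifying fixation on the interest area minus the cue onset marked by the time stamp:

```python
# Minimal sketch: first fixation latency relative to a time stamp. Each
# fixation has a start time (ms from trial onset) and an interest area.
fixations = [{"start": 120, "ia": "distractor"},
             {"start": 430, "ia": "distractor"},
             {"start": 710, "ia": "target"},
             {"start": 980, "ia": "target"}]

cue_onset_ms = 500   # the time stamp, e.g., the onset of the article "la"

def first_fixation_latency(log, ia, onset):
    # only the beginning of the first fixation matters (one-point measure);
    # looks that started before the cue are ignored
    starts = [f["start"] for f in log if f["ia"] == ia and f["start"] >= onset]
    return min(starts) - onset if starts else None  # None: IA never fixated

print(first_fixation_latency(fixations, "target", cue_onset_ms))   # -> 210
```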
First fixation latency is the second most widely used measure in the visual world
paradigm, after fixation proportion and fixation probability (see Figure 7.3). Latency
measures have been reported in 25% of all visual world studies in the present review
(Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, 2013; Grüter et al., 2012; Hopp,
2013, 2016; Hopp & Lemmerth, 2018), including two language production stud-
ies (Flecken, 2011; Kaushanskaya & Marian, 2007). First fixation latency has also
proven useful when working with specific tasks such as picture-word interference
(Kaushanskaya & Marian, 2007) and a modified Stroop task (Singh & Mishra, 2012).
In comparison, only one print study (De León Rodríguez et al., 2016) included first

TABLE 7.3 Definition and example of first fixation latency

First fixation latency (Time to first fixation)
  Definition: The time it takes for a participant to look in a given interest area, as measured from a prespecified point in the trial.
  Example (from Figure 7.8): Time between [1] and [2]
fixation latency in order to gauge the time it took participants to fixate on a word
in a single-word reading task (see Figure 7.8).
For their Stroop task, Singh and Mishra (2012) instructed participants to select
the ink color of the print word, while ignoring the word’s meaning, by looking
at one of four color patches on the edges of the screen (see Figure 7.9). This use
of first fixation latency in eye tracking is akin to measuring reaction times in a
button-press experiment (also see Section 4.2.1). In an elegant production study,
Flecken (2011) used first fixation latency to gain insight into how early bilinguals
conceptualize and describe events. She found that a higher use of progressive
aspect (aan het V in Dutch, V-ing in English) correlated with faster looks to the
action region in the video (e.g., the hands of a man folding a paper airplane),
which was the region that contained information about the ongoing nature of
the event. Flecken argued participants’ looks to the action region showed that
they were extracting information about the ongoing status of the event, which
in turn was linked to their use of the progressive aspect (see Section 4.2.4). First
fixation latency has also played an important role in the prediction strand of visual
world research (Dussias et al., 2013; Grüter et al., 2012; Hopp, 2013, 2016; Hopp
& Lemmerth, 2018). Here, latency has been analyzed to uncover anticipatory
processing; that is, looks to a referent on screen that is yet to be mentioned in the
auditory input (see Section 4.2.2). It is expected that listeners will look faster to
the target image on trials that contain an informative cue that allows for predic-
tion, compared to trials that do not. First fixation latency can tell us whether
listeners do, in fact, look faster in the prediction trials.
To illustrate, first fixation latency has been the measure of choice in a series of
visual world experiments on grammatical gender by Holger Hopp and colleagues
(Hopp, 2013, 2016; Hopp & Lemmerth, 2018). In the 2016 study, Hopp examined

FIGURE 7.8 First fixation latency in single-word reading. First fixation latency was the time it took participants to make an eye movement from the left cross (stage B) to the word or pseudoword on the right (stage C).
(Source: De León Rodríguez et al., 2016).

FIGURE 7.9 First fixation latency in an oculomotor Stroop task. The participants saw
a color term in their L1 (e.g., hara, “green”) and needed to make an eye
movement to the color patch that matched the ink color (here, red)
while ignoring the meaning of the word.
(Source: Singh & Mishra, 2012).

the effects of lexical training on L1 English–L2 German speakers’ use of gender cues during real-time listening. Using a pretest–treatment–posttest design, with eye tracking as the pre- and posttests, Hopp showed that intermediate-level learners could use gender predictively only at posttest, after they had practiced the determiner–noun combinations used in the experiment (see Section 4.2.2.5).
Hopp supplemented his statistical analysis of fixation latency with graphs of fixa-
tion proportion over time. Of note, Hopp did not analyze the data in these graphs
statistically, but relied entirely on the analysis of first fixation latency to support his claims of prediction and lack of prediction. First fixation latency,
then, offers a simpler, albeit less fine-grained, alternative for analyzing participant
looks in a visual world experiment.

7.2.1.4 Fixation Location
First fixation location represents the landing position of the eye, expressed as
a percentage of the total length of the interest area (see Table 7.4). For instance,
if a participant initially lands on the fourth letter s in the eight-letter word
amassale (see Figure 7.10), his or her first fixation location will be 50%. Fixation
location is still a new measure in L2 and bilingualism research. So far, only De
León Rodríguez et al. (2016) have used this measure (alongside first fixation duration
and first fixation latency) to uncover crosslinguistic influences on reading strate-
gies. De León Rodríguez and his colleagues recruited balanced French-German
bilinguals, who read out loud French or German words and pseudowords

TABLE 7.4 Definition and example of first fixation location

First fixation location
  Definition: The location of the first fixation made in an interest area, expressed as a percentage of the area’s total length.
  Example (from Figure 7.10): 50% (s is the fourth letter in the eight-letter word amassale)

FIGURE 7.10 First fixation location in single-word reading.
(Source: De León Rodríguez et al., 2016).

(see Figure 7.10). The researchers wanted to know if differences in the two
languages’ orthographies, French being a more opaque language than German,
would influence the bilinguals’ first fixations. The researchers found no effects
of language on first fixation latency, discussed previously, but an effect on first
fixation location, which they attributed to differences in French and German
orthography. First fixation location, then, can help researchers understand the
fine, sublexical details of the reading process. Future researchers may find in first
fixation location a useful measure to study the eyes’ landing site (see Section
2.4) and how this landing site may shift during reading development.
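
As a simple illustration of the computation, consider the following R sketch, which assumes hypothetical pixel coordinates for the first fixation (fix_x) and for the interest area boundaries (ia_left, ia_right):

```r
# One first fixation on an eight-letter word; all values in pixels, invented
first_fix <- data.frame(fix_x = 520, ia_left = 480, ia_right = 560)

# Landing position as a percentage of the interest area's total length
first_fix$location_pct <- 100 * (first_fix$fix_x - first_fix$ia_left) /
                                (first_fix$ia_right - first_fix$ia_left)
first_fix$location_pct   # 50, i.e., the middle of the word (cf. Table 7.4)
```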

7.2.2 Regressions
Regressions are eye movements that transport the eye opposite to the reading
direction, for instance right-to-left eye movements in English and left-to-right
eye movements in Arabic. Inherent in regressions is the notion of a task order;
that is, a clear sequence for the eyes to follow, from interest area 1, to interest
area 2, to interest area 3, and so on. When the task sequence is interrupted and
the eyes move back to an earlier interest area on the screen, the movement is
defined as a regression. Regressions are most meaningful in reading and closely
related tasks, precisely because there is a default manner to complete the task
(e.g., word-by-word, left-to-right eye movements in English). Other tasks such as
watching subtitled videos or describing a picture or a video are less constrained in
terms of how participants will engage with the materials. In such tasks, eye move-
ments in between interest areas are described as transitions, rather than forward
or regressive eye movements. Transitions have yet to be analyzed in L2 research
(see Section 9.3.1, research idea #9); however, the visits to which transitions give
rise are an important source of data in assessment research (Bax, 2013; McCray &
Brunfaut, 2018; Suvorov, 2015).
Whether or not regressions can be measured is a distinguishing feature of differ-
ent manufacturers’ eye trackers. Therefore, researchers may want to consider ahead
of time whether regressions are an important measure in their line of research (for
further considerations when choosing an eye tracker, see Chapter 9). At the time
of writing this book, a major eye-tracking manufacturer did not allow for the
measurement of regressions. When the eye tracker does not measure regressions,
researchers can either code regressions manually or exclude any regression-based
measures from their analyses. Another possibility is to team up with a colleague
who has programming skills and is able to write a script to extract regression-
based measures from the raw eye-tracking data.
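
As an illustration of what such a script might look like, the following R sketch classifies regressive saccades from a fixation report. The data frame fix and its columns (trial, fix_index, and ia_index, the serial position of the fixated interest area in the text) are hypothetical, and a production-ready script would also need to handle blinks, skips, and trial boundaries.

```r
# Order fixations chronologically within each trial
fix <- fix[order(fix$trial, fix$fix_index), ]

# For each fixation, how far the eyes moved relative to the previously
# fixated interest area; NA for the first fixation of a trial
step <- ave(fix$ia_index, fix$trial, FUN = function(x) c(NA, diff(x)))

# A saccade is regressive when it lands in an earlier interest area
is_regr <- !is.na(step) & step < 0
fix$regr_in_ia  <- ifelse(is_regr, fix$ia_index,        NA)  # landing site
fix$regr_out_ia <- ifelse(is_regr, fix$ia_index - step, NA)  # launch site

# Regression counts per interest area, e.g., regressions in:
table(fix$regr_in_ia)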
Regressions are generally considered a measure of reanalysis. They reflect a
participant’s need to further process a specific area on the screen after the eyes
have left it. Because of this, regressions have been linked with different types of
processing difficulty—lexical, syntactic, and discourse (i.e., text integration) difficul-
ties. Other regressions reflect an attempt to correct an oculomotor error. These are
regressions that bring back the eyes to their intended target after the reader over-
shot (i.e., landed past) it initially (see Section 2.4). It is important to note that even
in skilled, adult L1 reading, about 10–15% of saccades are regressions (Rayner, 1998,
2009), which suggests regressions are part and parcel of a normal reading process. In
child L1 readers, the regression rate is higher (Reichle et al., 2013). The diminishing
regression rate over the course of L1 reading development points to an association
between reading skill and regressions that may extend to L2 readers as well.
Reichle, Warren, and McConnell (2009) expanded the E-Z Reader model, a
prominent model of eye-movement control during reading, with a regression
component (also see Section 2.6). They hypothesized that regressions may occur
when there is a problem with postlexical language processing, which they
defined as the process of linking a word with a higher-order (syntactic, semantic,
and discourse-level) representation. During normal comprehension, the effects of
postlexical processing cannot be discerned in the eye-movement record; however,
when word integration fails, a regression or refixation of the same word is likely
to ensue. Thus, the computer simulations reported in Reichle et al. (2009) lent
support to the view that regressions are tied to comprehension difficulty.
Just like fixation-based measures, regressions are a cover term for a number of
saccade types that bring the eyes back to an earlier location in the text. A key dis-
tinction in categorizing regressions is whether researchers take the starting point
or the destination of a regression as a reference. Regressions in are regressive
eye movements that land in a predefined interest area. In comparison, regres-
sions out are regressive eye movements that are launched from a given interest
area. One and the same interest area can therefore be the recipient and origin
of regressive eye movements and these two types of regressions (regressions in
and regressions out) should not be conflated, even though in practice this often
happens. Researchers will often simply state in their papers that they analyzed ‘regres-
sions’, leaving it up to the reader to infer whether regressions in or regressions
out were meant.
Regression-based measures have been included in only 21% of L2 eye-
tracking studies to date (see Figure 7.11), compared to 38% of studies with fixa-
tion count measures and as many as 94% with durational measures. The most
commonly analyzed type of regression is the regression in, which, as stated pre-
viously, refers to a backward eye movement that lands in a predefined interest area
(for an example, see Table 7.5 and Figure 7.4, reproduced here as Figure 7.12, for
the reader’s convenience). Regressions-in analyses are found in grammar research
(Felser et al., 2009; Keating, 2009; Roberts et al., 2008; Spinner et al., 2013; Vainio
et al., 2016) and more recently also in L2 vocabulary studies (Elgort et al., 2018;
Mohamed, 2018). Keating (2009), in a study on noun-adjective gender agree-
ment, wanted to know what part of the sentence L1 and L2 Spanish speakers
go back to if they launch a regression from the adjective or one of the following
words. He found that participants do not go back from the adjective to the noun
to check its gender features, unless the noun and the adjective are adjacent (see
Section 3.2.1). This finding is consistent with Von der Malsburg and Vasishth’s
(2011) conclusion, obtained for L1 readers, that the landing sites of regressions are
not linguistically determined. In contrast, Elgort and her colleagues (2018) sur-
mised that regressions back into unknown vocabulary words during reading may
reflect readers’ initial misreading of these words (e.g., misreading succor as soccer).

FIGURE 7.11 Regression measures in eye tracking in SLA and bilingualism (used in
11 out of 52 studies with text).

Perhaps surprisingly, the researchers found that the regression measure patterned
with first fixation duration and showed a rapid decrease over the first five encoun-
ters with unfamiliar words in a text (see Section 3.2.2). There is, then, no consensus
yet as to whether the landing sites of regressions reflect a strong linguistic
influence or are mostly spatially determined.
Just as what goes up must come down, what regresses in must regress out.
Regressions out represent movements from a given interest area that take the
eyes against the direction of reading (see Table 7.5 and Figure 7.12, for an example).
While regressions in are delayed by nature (you must first move past a word in order
to be able to return to it later), regressions out can occur at different stages, or passes,
in the reading process. A first-pass regression out is a regression that is launched
upon the initial visit of an interest area. Example studies that have looked at first-
pass regressions are Keating (2009), Lim and Christianson (2015), and Mohamed
(2018). Delayed regressions out are regressions that ensue following a revisit of
an interest area (temporally delayed regressions) or regressions that are launched
from a word after the primary interest area, such as the spillover region (spatially
delayed regressions). First-pass and delayed regressions combined make up total
regressions out. Researchers have generally analyzed a single regression measure,
alongside one or more temporal eye-movement measures (i.e., measures of eye
fixation duration, see Section 7.2.1.2). A notable exception is Keating (2009), who
differentiated between the different types of regressions out (i.e., first pass, spatially
delayed, and total) in his analyses. Keating found that only advanced L2 Spanish
speakers picked up on ungrammatical noun-adjective agreement during natural
reading and even then, only when the adjective immediately followed the noun
(see Section 3.2.1). The results were strongest for total regressions (more regressions
when there was a noun-adjective gender mismatch), but trended in the same way
when first-pass and delayed regressions were analyzed separately.

FIGURE 7.12 Two different reading patterns for an unfamiliar word, tahamul,
embedded in context.
(Source: Godfroid et al., 2018).
Because saccades are so fast (see Section 2.2), researchers care more about
whether or not a regressive eye movement took place than about how long the
regression lasted. Accordingly, analyses of regressions focus on regression counts
and measures derived from counts; that is, regression proportions (expressed as a
number between 0 and 1) and regression probabilities (expressed as a percent-
age, ranging from 0 to 100). Readers who need a refresher on these concepts are
referred to Section 7.2.1, which dealt with counts, probabilities, and proportions
in the context of fixations and skips. Regression rate is a more general term
researchers use to refer to either the proportion or probability of regression. In
sum, a regression is seldom simply a regression (see Table 7.5). As readers, we must
figure out what types of regressions are meant (in or out), for which region and
which time window in the analysis (initial, delayed, total), as well as how the little
mavericks were measured (counts, proportions, or probabilities).

TABLE 7.5 Definitions and examples of regression measures

Measure               Definition                             Example (from Figure 7.12a and b)
Regression            An eye movement opposite the           In Figure 7.12a: [6] → [7]
                      normal reading direction, for          In Figure 7.12b: [11] → [12]
                      instance a right-to-left eye
                      movement in English
Regression in         A regressive eye movement that         In Figure 7.12a: none for tahamul
                      lands in a predefined interest area    In Figure 7.12b: [11] → [12]
Regression out        A regressive eye movement that is      In Figure 7.12a: [6] → [7]
                      launched from a given interest area    In Figure 7.12b: none for tahamul
First pass            A regressive eye movement that is      In Figure 7.12a: [6] → [7]
regression out        launched from a given interest area    In Figure 7.12b: none for tahamul
                      before the reader moves forward to
                      the next region
(Spatially) delayed   A regressive eye movement that is      In Figure 7.12b: [11] → [12] for
regression out        launched from a region past the        tahamul
                      primary interest area
(Temporally) delayed  A regressive eye movement that is      In Figure 7.12a: none
regression out        launched from a given interest area    In Figure 7.12b: none
(Second pass          when the reader revisits that
regression out)       interest area
Total regression out  The total number or frequency of       In Figure 7.12a: [6] → [7]
                      first-pass and delayed regressions     In Figure 7.12b: [11] → [12]
                      out

Note: All measures can be expressed as counts, proportions (0–1), or probabilities (%).

7.2.3 Integrated Eye-Tracking Measures


Integrated eye-tracking measures are measures that combine multiple events (e.g., fix-
ations and saccades) into a single, qualitative or quantitative representation. Heatmaps,
gaze plots, and scanpaths are all integrated measures that are used to visualize the
spatial distribution and duration of eye fixations. They have been used primarily
in assessment research (see Figure 7.13), where they can provide convenient tools to
make sense of the data. Convenient as they are, however, heatmaps, gaze plots, and
scanpaths are also descriptive in nature. These visualizations do not, in them-
selves, constitute data analysis. For heatmaps in particular, the appearance can change
based on the settings you choose (Bojko, 2009; Holmqvist et al., 2011), so heatmaps
may or may not be a faithful representation of your data (Bojko, 2009). This is another
reason researchers should be cautious when basing their claims solely on visual repre-
sentations of data, without statistical support to go with it. Later in this section, I will
discuss a grammar study (Godfroid et al., 2015) in which the authors corroborated
their visual scanpath analysis using inferential statistics. Indeed, heatmaps and gaze plots
can be subjected to data analysis (Holmqvist et al., 2011), but to date few researchers in
SLA and bilingualism have attempted to do so. In light of these facts, the focus in this
section will be on what can be inferred from heatmaps, gaze plots, and scanpaths and
how these tools can best represent the findings of a given study.

FIGURE 7.13 Integrated measures in eye tracking in SLA and bilingualism (used in
three out of 52 studies).

7.2.3.1 Heatmaps, Luminance Maps, and Gaze Plots

In simple terms, a heatmap is a visualization of the screen display, overlaid with a
smooth landscape of fixation data represented in different colors, for instance red,
yellow, and green (though colors can be customized). Warmer hues indicate more
viewing activity. Depending on the type of heatmap, this could mean the
region attracted either longer fixations or more fixations; in practice, these two
will often correlate. To create a heatmap, researchers first need to select the partici-
pants and trials they want to include. Because heatmaps capture the distribution
of eye fixations in space, it is usually better not to aggregate data across different
trials unless the trials have the exact same spatial layout (i.e., the same background
image or sentence). Heatmaps can be produced for a single participant (e.g., Bax,
2013) or groups of participants, such as native and non-native English-speaking
children completing the same task (e.g., Lee & Winke, 2018). Figure 7.14 repre-
sents two heatmaps of group-level data from Lee and Winke (2018) (for a color
version of these images, see Lee and Winke’s article).
Once you have selected the data subset to include in the heatmap, the eye-
tracking software will plot all the data samples (i.e., individual data points recorded
by the eye tracker) against the background image. Following a scaling process, in
which fixations are compared against the full range of values in the data set,
every eye fixation will receive a color that reflects its weight. Areas with more or
longer eye fixations will be painted in warmer hues (see Holmqvist et al., 2011,
for technical details). The color legend, which is now standard in major eye-
tracking software, summarizes the outcome of this process (see Figure 7.15a).
Older software versions did not provide a legend, which prevented readers from
accurately interpreting heatmaps included in published studies. Because colors
are relative to the data collected, the color legend should always be included as
a part of research articles. Using the software settings, researchers can further
customize the scale of their heatmaps by lowering or raising the cutoff for the
maximum value (e.g., what counts as red). This is recommended when the goal
is to compare participant groups that differ in their overall viewing activity (e.g.,
native and non-native speakers) because without such an adjustment, the colors
in the two heatmaps will not mean the same thing. Once the algorithm has worked its
magic, the software will produce a smooth fixation landscape, where all fixation
activity in the display is rendered on a color scale, from warm (most activity) to
cold (least activity).
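
Schematically, and without claiming to reproduce any manufacturer's actual algorithm, the deposit-and-smooth logic behind a fixation duration heatmap might be sketched in R as follows. The screen dimensions, grid resolution, kernel width, and column names are all illustrative assumptions.

```r
# fix: data frame with hypothetical columns fix_x, fix_y (pixels), dur (ms)
make_heat_grid <- function(fix, width = 1024, height = 768,
                           cell = 16, sigma = 32) {
  nx <- ceiling(width / cell); ny <- ceiling(height / cell)
  grid <- matrix(0, nrow = ny, ncol = nx)
  ix <- pmin(pmax(ceiling(fix$fix_x / cell), 1), nx)
  iy <- pmin(pmax(ceiling(fix$fix_y / cell), 1), ny)
  for (k in seq_along(ix))                      # deposit fixation durations
    grid[iy[k], ix[k]] <- grid[iy[k], ix[k]] + fix$dur[k]
  # Smooth with a separable Gaussian kernel to create the fixation landscape
  r <- ceiling(3 * sigma / cell)
  kern <- dnorm(-r:r, sd = sigma / cell); kern <- kern / sum(kern)
  grid <- apply(grid, 2, function(v) as.numeric(stats::filter(v, kern)))
  grid <- t(apply(grid, 1, function(v) as.numeric(stats::filter(v, kern))))
  grid[is.na(grid)] <- 0
  grid / max(grid)  # scale to [0, 1]; this range is then mapped onto colors
}
```

A fixation count heatmap would deposit a constant 1 instead of dur, and a relative heatmap would first divide each duration by the trial duration (see Table 7.6).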
The first thing to decide when creating a heatmap is whether you would like
to use fixation counts or fixation durations as a dependent variable for the heat-
map (see Table 7.6). Make sure you report this information in a publication to
help readers interpret the figures accurately. In a fixation count heatmap, every
fixation will be assigned the same weight, regardless of its duration. Fixation
duration heatmaps, on the other hand, will weigh fixations differently depend-
ing on their length (longer fixations will be painted in warmer colors). Two major
eye-tracking manufacturers further provide the option of creating heatmaps for
relative, rather than absolute, fixation behavior. Relative-duration and relative-
count heatmaps are useful for multi-participant and/or multi-trial heatmaps
in which the recordings for the participating subjects or trials differ in length.
Without adjusting for trial length from individual participants or trials, the weight
of looks in shorter trials will generally be underestimated. For instance, a 250 ms
fixation in a 5000 ms trial and a 100 ms fixation in a 2000 ms trial both make up
5% of the recording, even if the fixations themselves differ in length. Therefore,
the two fixations will have the same color in a relative-duration heatmap, but in
an absolute-duration heatmap, the 250 ms fixation will have the warmer of the
two colors.

FIGURE 7.14 Heatmaps of fixation behavior during an English speaking test: L1
English children (top) and English language learners (bottom).
(Source: Reprinted from Lee, S., & Winke, P., 2018. Young learners’ response processes when taking
computerized tasks for speaking assessment. Language Testing, 35(2), 239–269, with permission from
Sage. © 2017 The Authors © 2013 Educational Testing Service (ETS). Sample task from TOEFL
Primary® Speaking Test reprinted by permission of ETS, the copyright owner).

FIGURE 7.15 Two visual representations of essay-rating data: (a) heatmap and (b)
luminance map. Note: the figures represent eye-fixation durations from ten raters
using an analytic rubric to rate essays. Data are shown for the entire trial period
(36 minutes) using the default scale.
(Source: Data supplied by Dr. Laura Ballard, ETS, and Dr. Paula Winke, Michigan State University;
Ballard, 2017).

TABLE 7.6 Four types of heatmap

Measure             Definition
Fixation duration   Color-coded representation of the duration of individual
heatmap             fixations on a background image. Longer fixations are
                    represented with warmer colors.
Fixation duration   A fixation duration heatmap that expresses fixation duration
heatmap (relative)  as a proportion of trial length (useful when there are
                    multiple trials and trials differ in length)
Fixation count      Color-coded representation of the number of fixations on a
heatmap             background image. A higher fixation density is represented
                    with warmer colors.
Fixation count      A fixation count heatmap that expresses fixation count as a
heatmap (relative)  proportion of the total number of fixations in a trial (useful
                    when there are multiple trials and trials differ in length)
Although heatmaps are the most common type of data visualization, a poten-
tial drawback is that the parts of the screen that enjoyed the most attention are
covered in color, which makes the underlying display hard for readers to see.
Researchers can work around
this by increasing the image transparency, as shown in Lee and Winke’s (2018)
example (see Figure 7.14). Another possibility is to produce a luminance map
(Holmqvist et al., 2011) or see-through map (SR Research, 2017) instead.
Luminance maps are the negative of a heatmap, in grayscale (see Figure 7.15b).
The areas on screen that attract the most fixations or the longest fixations are left
clear, while the rest is shaded. As a result, readers can see at a glance which parts
of the screen participants dwelled on the most because the readers’ eye gaze is
immediately directed to these very areas. Heatmaps and luminance maps are mir-
ror images of each other; thus, it is up to individual researchers to decide which
representation more aptly captures their data.
Lastly, a visual that has individual fixation data superimposed on a background
image (without any further data aggregation) is a gaze plot (see Figure 7.16).
Gaze plots have been presented side by side with color heatmaps in L2 assess-
ment research (Bax, 2013; Lee & Winke, 2018). These plots add information about
individual fixations and fixation sequences to the general picture of participants’
viewing activity, which can be rendered in a heatmap or luminance map. In gaze

plots, every fixation is plotted as a separate dot. The size of the dots may or may
not be proportional to fixation duration: when there are size differences, larger
dots will signify longer fixations. The dots are connected by lines (saccades) and
together, they represent the fixation sequence, or scanpath, for a given task. Thus,
gaze plots are one type of visual representation of a scanpath. They are used for
descriptive purposes only. In the next section, we will consider how scanpaths can
be quantified and submitted to statistical analysis, in what is known as a scanpath
analysis.
When plotting individual fixations separately, as is the case in a gaze plot,
it may be better to limit the amount of data to a short time period (e.g., eye
movements from one participant or one short trial) lest displays get crowded
and become difficult to interpret (see Bax, 2013). In some cases, it may be more
meaningful to zoom in on a particular time segment within a trial. For instance,
Lee and Winke (2018) juxtaposed a native and a non-native English-speaking
child’s gaze plots at a similar point in a timed speaking task (17–19 seconds left).
They showed how the two children interacted differently with the onscreen
timer and, following extensive data triangulation, recommended that the timer
be removed or made more child-friendly in a revision of the speaking test (see
Section 3.2.5). The gaze plots, then, were one of multiple sources of information
on which the researchers based their claims. This underscores the point that gaze
plots should bring clear added value to a study to warrant their inclusion along-
side other data sources in a research paper and will, in general, require triangula-
tion with other measures.

7.2.3.2 Scanpaths
Scanpaths are visual or numeric representations of eye-movement patterns that
show a sequence of fixations and saccades in time and space. Compared to, say,
eye fixation durations or regressions, scanpaths capture eye-movement behavior
over a larger time window and a greater area of space. This earns scanpaths their
status as an integrated measure. In eye-tracking software, scanpaths are commonly
represented as gaze plots (for an example, see Figure 7.16) and in this form,
they can be used for descriptive purposes. Thus, one function of scanpaths is
descriptive analysis based on scanpath visualizations (e.g., gaze plots). Scanpaths
can also be used to check recording quality, especially when working with text-
based stimuli (see Section 8.1.2). When a scanpath representation of reading data
floats systematically above or below a line of text (for an example, see Figure 8.5),
this indicates there was a vertical offset in the data recording and action may be
required. Lastly, scanpaths distinguish themselves from other integrated measures
such as heatmaps and gaze plots, in that scanpaths can be subjected to statistical
analysis (see Godfroid et al., 2015, for an example). Thus, when used appropriately,
scanpaths may combine the appeal of a more holistic, integrated measure with the
rigor of a statistical data analysis.

FIGURE 7.16 Gaze plots during an English speaking test: L1 English child participant
(top) and English language learner (bottom).
(Source: Reprinted from Lee, S., & Winke, P. 2018. Young learners’ response processes when taking
computerized tasks for speaking assessment. Language Testing, 35(2), 239–269, with permission from
Sage. © 2017 The Authors © 2013 Educational Testing Service (ETS). Sample task from TOEFL
Primary® Speaking Test reprinted by permission of ETS, the copyright owner).

Scanpaths are still a new measure in L2 research. In neighboring disciplines
such as education, linguistics, and psychology, scanpaths have been analyzed to
address questions that are pertinent to our field as well. Sample questions include:

•• What global reading strategies do adult L1 readers use when reading exposi-
tory text? (Hyönä, Lorch, & Kaakinen, 2002).
•• How does background sound influence the gaze scanpath of people watch-
ing a film clip? (Vilaró et al., 2012).
•• How do expert and novice school teachers from different cultural back-
grounds use their eye gaze in real-world classrooms? (McIntyre & Foulsham,
2018).
•• Can participants meaningfully interpret their own and other people’s static
and dynamic gaze displays? (Van Wermeskerken, Litchfield, & Van Gog, 2018).

These questions are relevant to many areas of L2 studies, including reading
research, subtitles and captions research, classroom-based studies, as well as work
that relies on participants’ introspection by means of stimulated recall (Gass &
Mackey, 2017) for which the participants’ own eye-tracking data might provide
a memory support. In other words, there is no shortage of potential applications
of scanpath analysis in SLA and bilingualism (for a potential application, see
Figure 7.17). Readers may wish to consider whether and how scanpaths can inform
their own research.
In order to proceed with a scanpath analysis, researchers need to consider
how they will represent the scanpath for further analysis. One approach is to
represent scanpaths by means of symbol strings (e.g., A, B, C, D, and their combi-
nations), whereby each symbol represents a functionally meaningful area on the
screen. Researchers define the areas based on their understanding of the stimuli,
the participants’ primary task, and the research questions (see Chapters 5 and 6).
For instance, in a captions project, the image area could be one region and the

FIGURE 7.17 Sequence of teacher scanpaths. The teacher’s eye gaze alternated
between a student, teacher material, and the classroom at large.
(Source: Reproduced from McIntyre and Foulsham (2018) under a Creative Commons Attribution
4.0 International License © 2018 The Authors, http://creativecommons.org/licenses/by/4.0/).

FIGURE 7.18 Functional regions for four grammatical structures in a grammaticality
judgment test. Note: A = sentence-initial region; B = primary interest
area; C = spillover region; D = sentence-final region.
(Source: Godfroid et al., 2015).

subtitle area could be another region. Or, in a modified version of the same pro-
ject, the image area could be one region and the subtitle area could be subdivided
into several, smaller regions. It all depends on what the researcher wants to study.
In a study on written GJTs, Godfroid et al. (2015) discerned four functional
regions in their test sentences, which centered around the grammatical violation
(see Figure 7.18). The error in the ungrammatical version of the sentence was
labeled the primary interest area (B), because this is where participants are
first expected to slow down (see Section 3.2.1). The primary interest area was
preceded by a sentence-initial region (A) and followed by a spillover region
(C) and a sentence-final region (D). In essence, functional regions like A–D
are interest areas (see Section 6.1) that are custom-drawn by the researcher for
his or her analysis. Because this entails a level of subjectivity, it is a good idea to
cross-check your partitioning of space with that of a colleague. If done well, the
resulting segmentation will be a simpler representation of the stimulus (i.e., a
visual display or text) that retains the information that is important for answer-
ing the research questions.
Once the segmentation is in place, researchers can represent the observed eye-
movement patterns by means of symbol strings. To do so, each region is denoted
by a symbol (see Figure 7.18), and each eye fixation or visit to the region is rep-
resented with the corresponding symbol. Thus, the fixation sequence shown in
Figure 7.19 can be represented as AAAAAABBCDDD, at the level of individual
fixations, or it can be further condensed into ABCD, if only visits are of interest.

FIGURE 7.19 Sentence reading pattern with different functional regions
superimposed. This eye-fixation sequence can be represented as
AAAAAABBCDDD, at the level of individual fixations, or ABCD, at
the visit level.
This approach provides a common metric for describing eye-movement patterns
on otherwise distinct sentences. Godfroid and colleagues (2015) found non-native
speakers produced fewer scanpaths with regressions when performing a GJT with
time pressure than without. The researchers argued that the drop in scanpaths
with regressions in the L2 group signaled a reduction in controlled processing
(e.g., Clifton et al., 2007; Reichle et al., 2009) as a result of the time restriction.
Put differently, adding time pressure to a GJT may make it more difficult for
L2 speakers to engage in controlled processing, necessary to access their explicit
knowledge and may force them to rely more on implicit (Ellis, 2005) or automa-
tized explicit (Suzuki & DeKeyser, 2017) knowledge instead. These findings, then,
underscore the importance of timing in the measurement of implicit, automatized
explicit, and explicit knowledge.
Godfroid et al.’s study demonstrated how researchers can use scanpaths to under-
stand the impact of certain task conditions (e.g., timed vs. untimed test conditions)
on participants’ task performance.
Future researchers may wish to expand on this approach by comparing the
scanpath similarity of different trials (symbol strings) directly. Scanpaths are
more similar if they require fewer edits (e.g., fewer insertions, deletions, or sub-
stitutions of symbols) to be matched and, hence, carry a lower transformational
cost to be made equal. String-edit methods such as the Levenshtein metric
(Levenshtein, 1966) can be used to match strings automatically or semi-auto-
matically. The outcome of a string-edit comparison will be a large number of
pairwise (string-string) similarity values, which represent the cost involved in
matching two strings. These similarity values can then be analyzed statistically.
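
To make the string-edit procedure concrete, here is a small R sketch; the two trial sequences are invented, and the base R function adist() computes the (generalized) Levenshtein distance.

```r
# Fixated regions (A-D, cf. Figure 7.18) in chronological order, per trial
trial1 <- c("A","A","A","A","A","A","B","B","C","D","D","D")
trial2 <- c("A","B","A","B","C","D")

to_string <- function(regions, visits_only = FALSE) {
  if (visits_only) regions <- rle(regions)$values  # condense AAB... to AB...
  paste(regions, collapse = "")
}

s1 <- to_string(trial1, visits_only = TRUE)  # "ABCD"
s2 <- to_string(trial2, visits_only = TRUE)  # "ABABCD"

# Pairwise string-edit (Levenshtein) distances: the insertions, deletions,
# and substitutions needed to turn one scanpath into the other
d <- adist(c(s1, s2))

# Optionally normalize by the longer string so values range from 0 to 1
d_norm <- d / outer(nchar(c(s1, s2)), nchar(c(s1, s2)), pmax)
```

The resulting (normalized) distances can serve as the similarity values described above, ready for clustering or other statistical analysis.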
Exemplifying this procedure, Von der Malsburg and Vasishth (2011) per-
formed a cluster analysis on the output of their own scanpath similarity algo-
rithm, which they applied to existing reading data from Meseguer, Carreiras,
and Clifton (2002). Using their novel procedure,Von der Malsburg and Vasishth
observed three representative “scanpath signatures” (p. 109) for the reading
of temporarily ambiguous sentences. Interestingly, only one of these scan-
path signatures (i.e., a regression to the beginning of the sentence followed by

rereading) was associated specifically with syntactic reanalysis. Von der Malsburg
and Vasishth concluded L1 readers may be less selective in reanalyzing sentences
than was previously proposed by Meseguer and his colleagues (2002). They
attributed the difference in results to the loss of information that occurs when
researchers perform separate analyses on individual parts of the sentence (i.e.,
analyze eye-movement data for word-based interest areas) rather than consider
the pattern of eye movements across an entire sentence.
In sum, four steps are necessary when performing a scanpath analysis using
symbol strings: (1) segment the visual display into functional regions and assign
a different symbol to each region, (2) represent fixations or visits through the
corresponding symbols, (3) calculate the similarity between different scanpaths
or categorize scanpaths into groups, (4) analyze the data statistically using either
the similarity score or scanpath category membership as a dependent variable. If
done systematically, scanpath analysis has the potential to enrich a range of L2
and bilingualism topics, with displays ranging from simple sentences to highly
complex visual materials.

7.3 Conclusion: What Measures Should I Use?


The comprehensive description of eye-tracking measures in this chapter speaks to
the versatility of eye-tracking methodology (see Section 1.1.3). It is undoubtedly
one of the strengths of this technique that eye-movement records can be analyzed
in many different ways. At the same time, when faced with so many choices, the
question arises as to which measures, of the many options available, one should
use in one’s own study. In many cases, there may be more than one set of measures
that researchers can draw on to address their research questions. Even so, different
options have pros and cons that merit careful consideration in order to develop a
thoughtful approach to data analysis. To conclude, I revisit a number of factors that
can guide researchers in their decision process.
For visual world research, a focus on fixations, as either simple yes/no
events or aggregate proportion data, is standard (see Section 7.2.1.1). Fixation
latency, as measured from a critical point in the audio, provides an additional
way of looking at fixation data (see Section 7.2.1.3). The two variables (binary
events or proportions vs. latencies) make different demands on statistical analy-
sis. Specifically, binary or aggregate fixation events may call for a growth curve
analysis (see Section 8.5.2), although other statistical options are also available (see
Section 8.5.1), whereas fixation latencies can be analyzed with common statistical
techniques, such as t tests or analysis of variance (see Section 8.3). This is a factor
worth considering when selecting measures. A third measure, fixation dura-
tion (see Section 7.2.1.2), is specific to work on referential processing involving
sentence verification or interpretation (see Section 4.2.3), as well as production
research (see Section 4.2.4); in contrast, fixation duration is rarely found in pre-
diction research, which makes up as much as half of the research in the visual
world paradigm

(see Section 4.2.2). Combined, these three fixation-based measures—events or
proportions, latency, and duration—can answer most research questions that have
shaped the field of visual world and production research.
For text-based research, the list of potential dependent variables is larger (see
Figure 7.1) and, hence, researchers’ analytic flexibility increases manifold. In the
current synthetic review, four durational measures stood out for their frequency.
The “big four” durational measures were first fixation duration, gaze dura-
tion, total time, and regression path duration (see Section 7.2.1.2). The advan-
tage of using these measures is that there is ample precedent in the literature.
Researchers are more likely to be familiar with them, although you should still
define your measures clearly. While working on this review, I encountered a fair
number of studies in which authors either did not define their measures or pro-
vided definitions that were ambiguous (e.g., second pass versus rereading time).
Improving on this aspect of reporting practices, by including clear and systematic
definitions as provided in the tables above, will help standardize methodological
practices in the field.
By using the same measures as in previous research, researchers can make their
study easier to situate in the literature and more comparable to previous findings.
On the downside, the “big four” measures are like Russian nesting dolls—they
are not independent from each other (see Figure 7.20). When researchers analyze
multiple, correlated eye-movement measures, they are at a higher risk of finding
false-positive results (i.e., significant effects that are not true effects).To control the
false-positive rate, researchers could consider applying a Bonferroni correction to
their tests (Von der Malsburg & Angele, 2017). For instance, when analyzing the
“big four” measures for the same data set, and with α set at .05, the Bonferroni-
adjusted significance level would be α = .05/4 = .0125.
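
In R, this adjustment can be carried out with the base function p.adjust(); the four p values below are invented for illustration.

```r
p_raw <- c(first_fix = .021, gaze = .013, total = .002, regr_path = .048)

# Bonferroni: each p value is multiplied by the number of tests (capped at 1),
# which is equivalent to testing each raw p against alpha = .05/4 = .0125
p.adjust(p_raw, method = "bonferroni")
```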
Another possibility is to steer clear of the non-independence issue by selecting
durational measures that are statistically independent from each other. Examples

FIGURE 7.20 Overlap (non-independence) between three common durational
measures. First fixation duration is a part of gaze duration and gaze
duration is a part of total time. A fourth measure, regression path duration
(not shown here), also correlates with gaze duration.

FIGURE 7.21 Alternative decomposition of a viewing episode into statistically
independent, durational measures. The values for first fixation duration,
refixation duration, and rereading time are unrelated (top panel). First
fixation duration and refixation duration can be subsumed under gaze
duration (bottom panel) if the “early” vs. “late” processing distinction is
of primary interest.

of independent durational measures are first fixation duration, refixation duration,
and rereading time (cf. Alsadoon & Heift, 2015) or gaze duration and rereading time
(cf. Sagarra & Ellis, 2013, for a close alternative). None of these measures overlap
in their temporal properties (see Figure 7.21). Therefore, they can be presented in
stacked bar graphs in research papers or conference presentations (for an example,
see Alsadoon & Heift, 2015), which provide a nice breakdown of total time into
its different subcomponents. When used together, then, these measures will address
when an effect occurred—early, late, or both early and late (see Section 7.2.1.2.2)—
and in so doing, they will shed light on the time course of a process. On the down-
side, refixation duration and, to some extent, rereading time are not in common
usage yet in L2 and bilingualism research. More explanation and interpretation may
then be needed in your writing. Refixation duration and rereading time may also
include a large number of 0s (when interest areas are fixated only once or visited
only during first pass), so it is extra important that researchers check their data dis-
tributions before they run any statistical analyses (see Section 8.2.2).
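
If your software exports only the overlapping measures, the independent ones can be derived by simple subtraction, following the decomposition in Figures 7.20 and 7.21. The data frame em and its column names in this sketch are hypothetical.

```r
# Derive statistically independent measures from the overlapping ones
em$refix_dur  <- em$gaze_dur   - em$first_fix_dur  # 0 if fixated only once
em$reread_dur <- em$total_time - em$gaze_dur       # 0 if never revisited

# Check how many 0s there are before running any statistical analyses
mean(em$refix_dur == 0); mean(em$reread_dur == 0)
```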
Finally, it is my hope that the wealth of measures reviewed in this chapter
will entice researchers to try some of them out. Most researchers already include

multiple measures in their analyses, but they still rely heavily on eye fixation dura-
tions. Researchers could explore the value of skips, visits, and regression measures
for their own projects. Heatmaps, gaze plots, and scanpaths may also be informative
in specific research contexts when properly used. A recurring theme in this chap-
ter is that the various measures provide different, and complementary, information
on participants’ processing behavior. For example, proportion of skips could tell
you whether or not focal attention is paid to an interest area at all (see Section
7.2.1.1). A regression count could reveal additional processing or difficulty in
processing (see Section 7.2.2). A mix of commonly used measures and some less
commonly used ones may be a good starting point.

Notes
1 It is possible to compute regression path duration when readers do not regress out of
an interest area upon first pass, but in that case, regression path duration will be the
same as gaze duration.
2 The most common approach to normalizing second pass data is to do a logarithmic
transformation (see Section 8.2.2). In some cases, the high number of 0s in the original
data set remains a concern, even after transformation, and special regression models
such as negative binomial regression (Godfroid et al., 2018), zero-inflated regression, or
gamma regression (Mohamed, 2018) can offer a solution.
3 The most common approach to normalizing rereading times is to do a logarithmic
transformation (see Section 8.2.2). In some cases, the high number of 0s in the original
data set remains a concern, even after transformation, and special regression models
such as negative binomial regression (Godfroid et al., 2018), zero-inflated regression, or
gamma regression (Mohamed, 2018) can offer a solution.
4 Compared to total time, total visit duration additionally includes the durations of any
saccades that were made in between fixations; however, given that saccades are so short,
this typically does not increase values much.
8
DATA CLEANING AND ANALYSIS

This chapter covers the steps between data collection and the reporting of results.
Most eye-tracking researchers in SLA and bilingualism perform inferential sta-
tistical analyses on their data, but to do so, some preparation is necessary. We will
consider data cleaning (Section 8.1) and outlier treatment (Section 8.2) as two
necessary steps in preparing data for analysis. Next, I will provide an overview of
common statistical practices in current eye-tracking research (see Section 8.3).
This overview will set the stage for the remainder of this chapter—an introduc-
tion to two fairly new inferential statistical techniques.
In Section 8.4, I introduce linear mixed-effects models, the fastest growing ana-
lytical technique in L2 and bilingual eye-tracking research. Section 8.5 is devoted
to the time course analysis of eye-tracking data. It includes an extensive introduc-
tion to growth curve analysis (Mirman, 2014). Sections 8.4 and 8.5 will be most
helpful to you if you are already familiar with multiple regression (for general
introductions, see Field, 2018; Gries, 2013; Jeon, 2015; Larson-Hall, 2016; Plonsky
& Ghanbar, 2018; Plonsky & Oswald, 2017). Even so, the general ideas in these
sections are meant to be useful and accessible to all readers. Each analysis section
ends with a concrete example analysis of real eye-tracking data (see Sections 8.4.4
and 8.5.2.5) and a model for how to report the results (see Sections 8.4.5 and
8.5.2.6). To conclude, a roadmap is provided to guide eye-tracking researchers in
their choice of a statistical method (see Section 8.6), taking into account the dif-
ferent possible types of eye-tracking measures.

8.1 Data Cleaning
Data cleaning refers to the steps researchers take in between data collection and
analysis. Data are not clean (i.e., they are “messy” or “noisy”) when they reflect the

influence of outside factors. As a researcher, you can reduce the noise in your data
by creating a well-designed experiment (see Chapters 5 and 6) and by following
best practices for data collection (see Chapter 9). You should always do your best
to collect the cleanest data possible. Clean data are good data or, to put it more
precisely, data quality is a strong indicator of overall study quality.
Even so, some degree of noise will be inevitable in behavioral research, including
eye tracking. This noise will come from various sources, including technical error
(i.e., from the eye tracker or other equipment) and human error (participant error,
your own error). So given that you cannot get rid of noise completely, how do
you deal with it?
To perform the data cleaning procedure, researchers will likely use one or
more software programs (see Section 8.1.1). With the help of this special soft-
ware, researchers typically (i) inspect individual participant records and trials (see
Section 8.1.2) and (ii) correct them for drift (optional) (see Section 8.1.3), before
they start inspecting their data set for outliers (see Section 8.2). It is important to
keep a master copy of your data throughout this process; that is, keep the original
recordings intact and make sure you work on a copy of your data set.

8.1.1 Data Cleaning Software


Eye-tracking records contain a wealth of information about participant behav-
ior. Managing all that data, however, could be a challenge without the help of
special software programs. To date, most L2 researchers rely on dedicated soft-
ware provided by eye-tracking manufacturers to visualize, clean, and export their
data. SR Research (www.sr-research.com), the manufacturer of the EyeLink, has
developed the DataViewer software program for this purpose. Tobii eye track-
ers (www.tobii.com) come with the software Tobii Pro Studio, which will let
you handle the programming, collecting, and analyzing of your data all in one
place. Lastly, SensoMotoric Instruments, now purchased by Apple, had analysis
software called BeGaze designed to get data ready for statistical analysis. All these
software programs have excellent manuals that will walk readers through some of
the more technical aspects of data cleaning. Expert technical support is also avail-
able, either free of charge or at a license fee, for EyeLink and Tobii eye trackers
(see Section 9.2.1).
In recent years, eye-tracking solutions have emerged that are not tied to any par-
ticular manufacturer. A number of researchers have developed data cleaning
packages in R, a free statistical software environment, to conduct some of the
same cleaning procedures. A list of all the different R packages, compiled by
Braze (2018), can be found at https://github.com/davebraze/FDBeye/wiki/
Researcher-Contributed-Eye-Tracking-Tools.
Some of these packages will require raw eye-tracking data as input. You can
export the raw data (i.e., sample data as recorded by the eye tracker) from con-
temporary eye-tracking software as csv, txt, or xls files. Look for the sample report
in DataViewer, or export timestamp data in Tobii Pro Studio or gaze data (rather
than event data) in BeGaze.
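
Loading such an export into R is straightforward, as the following sketch shows; the file name and column names are hypothetical and will vary by manufacturer and export settings.

```r
# Read a tab-delimited sample report; one row per recorded sample
samples <- read.delim("sample_report.txt", na.strings = c("NA", "."))
str(samples)  # inspect columns, e.g., trial, timestamp, gaze_x, gaze_y
```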
If you are proficient in one program (e.g., R), you may well be able to clean
your raw data entirely with it. For many researchers, however, it will make more
sense to use a combination of programs (e.g., dedicated software and R, SPSS,
or Excel) to conduct different aspects of the data handling and analysis. The key
is to know all the steps that are necessary and determine how you can do them
most efficiently given your experience with the different software programs. In
this section, I intend to provide you with such an overview so you can tackle data
cleaning with confidence.

8.1.2 Inspecting Individual Participant Records and Trials


Soon after data collection is complete, you will want to take a look at your pre-
cious new data set. If you kept a log book during data collection (see Section
9.3.2.2.2), it will prove useful now. First, remove any participants from the data
set that you marked as problematic during the recording. At this stage, you are
looking for gross abnormalities in the data—participants who are going off task,
falling asleep, under the influence, or engaged with their cell phone while doing
the study. In most cases, it should already have been clear during data collection
that this person’s data would not be usable. After the initial screening, you are
ready to inspect the remaining participants’ recordings on a trial-by-trial basis.
Here, you are looking for individual trials in which something went wrong (e.g.,
button pressed too soon, trial interrupted) or things seem off (i.e., an atypical
recording). The following are some examples of trials one should flag during this
stage (Figures 8.1 and 8.3). All examples are authentic data (uncleaned) from a
novel-reading study by Godfroid et al. (2018) (for a review, see Section 3.2.2).
To check data quality, researchers can estimate the amount of track loss in
a trial and in the participant’s data overall. Track loss (i.e., when the eye-tracking
camera loses track of the participant’s eye gaze) is important because, if too
much information in a recording is lost, the data are best discarded. The trial in
Figure 8.1 is a case of suspected track loss. From lines three and four onwards, the
reader’s saccades became very long. This raises the question of whether the eye
tracker was losing the participant’s eye gaze (and hence stopped recording fixa-
tions) or whether the participant was skimming, and not reading, the text. Tobii
will automatically report the percentage of track loss for a given participant.1
In DataViewer, track loss can be spotted visually by plotting the raw data in a
temporal graph (see Figure 8.2). In a temporal graph, the top horizontal axis
represents time, and the plotted lines show x and y coordinates of the eye gaze in
pixels (i.e., fixation location). Contrary to our initial hypothesis, the graph shows a
normal eye-tracking recording with no blinks or track loss (no thick vertical bars).
Compare this with Figure 8.4 that follows, which depicts data from a problematic
trial. It seems safe to conclude, then, that in the present example, the participant
was simply skimming the text. Small amounts of skimming may be a part of
natural reading, especially when reading longer texts. Therefore, whether this trial
(or participant) ought to be excluded will depend on the goals of the study and
how pervasive the skimming behavior was.

FIGURE 8.1 Text skimming. The fact that there are no interruptions in the saccade
lines, except at the bottom of the screen, suggests the recording was normal (no
track loss) but the participant was skimming the text.
(Source: Authentic, uncleaned data from Godfroid et al., 2018).

FIGURE 8.2 Temporal graph of the raw data in Figure 8.1 (0–4.8 sec). The eye-
movement recording shows two continuous traces of position information in
screen pixels that are uninterrupted by blinks (compare with Figure 8.4).
A different picture emerges from Figures 8.3 and 8.4, which show eye-tracking
data in a “noisy” trial. The data in Figure 8.3 show a lot of downward movement,
which is an atypical reading behavior. These vertical lines could be blinks, track
loss, or the eye tracker going wild because it detects two corneas (i.e., split cor-
nea, see Holmqvist et al. [2011]). Again, we need to look at the raw eye-tracking
data in a spreadsheet or a temporal graph to understand what is going on. Both
blinks and track loss will result in missing values for position information. A split
cornea, on the other hand, will produce inconsistent position values, but no miss-
ing data. The data visualization in Figure 8.4 strongly favors an account in terms
of blinks or track loss. In the first 4.8 seconds of the trial alone, there are four large
vertical bars, and this pattern will repeat itself throughout the trial (compare with
the previous Figure 8.2). The vertical bars are time intervals when the position
information was not available. Because the interruptions are fairly short (< 100 ms),
these were likely blinks and not track loss, although from a technical perspective,
the distinction does not matter because both result in missing information.
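
Because blinks and track loss both surface as missing position samples, a simple screening step is to compute the percentage of missing samples per trial, as in the following R sketch. The column names and the 10% cutoff are illustrative assumptions rather than field standards.

```r
# Flag samples with missing gaze coordinates (blinks or track loss)
samples$missing <- is.na(samples$gaze_x) | is.na(samples$gaze_y)

# Percentage of missing samples per trial
pct_missing <- tapply(samples$missing, samples$trial, mean) * 100

# Flag trials exceeding an (arbitrary) 10% threshold for closer inspection
pct_missing[pct_missing > 10]
```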
Unlike in some studies with event-related potentials, participants in eye-
tracking experiments are not commonly instructed to suppress their blinks.

FIGURE 8.3 Blinks in an eye-movement record. Every blink is flanked by two
signature saccades (downward and upward lines) as the eyelids first cover and
then uncover more and more of the pupil.
(Source: Authentic, uncleaned data from Godfroid et al., 2018, not used for analysis).

FIGURE 8.4  Temporal graph of the raw data in Figure 8.3 (0–4.8 sec). The eye-
movement recording shows two traces of eye position information in
screen pixels that are frequently interrupted by blinks (compare with
Figure 8.2).

Asking participants to suppress blinks during reading may actually have the oppo-
site effect and cause participants to start blinking more. In any event, blinks are an
artifact in eye-movement data; their impact on data quality merits careful assess-
ment (see Section 9.3.2.2 for tips to minimize blinking during data collection). If
there are many blinks, as in the current sample trial, it is better to discard that trial.
In other cases, blinks can be deleted from the record. The good news is that blinks
will not affect your calculation of fixation duration measures (see Section 7.2.1.2)
because blinks are enclosed between two artificial saccades (the downward and
upward lines in Figure 8.3). Therefore, the manufacturer software will automatically filter
the blinks out of any calculations of eye fixation duration.
In sum, researchers can perform different checks to ensure eye data qual-
ity (Mulvey et al., 2018), including checking for track loss. Proper training and
practice operating the eye tracker (see Section 9.3.2.2.2) will go a long way in
reducing track loss. Similarly, track loss can be preempted by proper experimental
design. Mixed, computer- and paper-based designs, in which participants need
to look away from the screen, for instance to read or write something on paper,
are not recommended, because these designs will inherently induce track loss.
Researchers can further take ownership of their data quality by inspecting their
raw data, as illustrated in this section. Holmqvist et al. (2011) reported a typical
data loss of 2–5% with trained eye-tracker operators for an average population of
Europeans who were not prescreened. These levels will vary as a result of techni-
cal, human (participant and operator), and task design factors (for further discus-
sion, see Section 9.3.2.2).
For greater transparency, the amount of missing data and participant exclusions
ought to be reported in research articles, and this information should be broken
down by participant group and condition. To maintain adequate statistical power
(see Section 5.5), researchers will need to recruit new participants to replace any
excluded individuals’ data. As these practices become ingrained in our field, the
eye-tracking community will be able to evaluate what typical and acceptable
levels of track loss are in SLA and bilingualism. Most importantly, researchers and read-
ers will be more assured that the collected eye-tracking data are valid and complete.

8.1.3 Correcting for Drift


After inspecting the data in a trial-by-trial manner, the second step in data clean-
ing is to identify trials with drift in order to make adjustments or discard those
trials. Drift refers to a systematic offset between the recorded eye gaze location
and a participant’s true eye gaze location. If the participant did not look at the
recorded position, this presents an obvious threat to the internal validity of a study.
Proper camera set-up and calibration are essential to minimize drift and, therefore,
the first approach to drift correction should be to preempt it by honing your
data collection skills (see Section 9.3.2.2). Vertical drift can be easy to detect in
reading research when eye fixations are floating systematically above or below a
line of text, because readers tend to look at words, and not above or below them,
when they read. Spotting vertical drift will be easier if you have double- or triple-
spaced your text, following the guidelines for text-based eye-tracking research
(see Section 6.2). When using single-spaced text or complex visual displays (e.g.,
website interfaces, online dictionary entries, drawings), or when drift is severe,
researchers may not be able to identify the targets of fixations confidently. In that
case, the only safe option is to exclude trials with drift from further analysis (also
see Section 9.1.3).
Options to correct for drift range from manual to almost entirely automatic.
In DataViewer (the data processing program associated with the EyeLink), fixa-
tion locations can be adjusted manually or semi-automatically, for one or multiple
fixations at a time. Open software with similar functionalities is available from the
University of Massachusetts Eye-Tracking Lab (http://blogs.umass.edu/eyelab/
software/). Because drift is a systematic problem, multiple fixations will typically
be corrected at once. Figure 8.5 shows a trial before and after drift correction
was performed in three ways—manually, with the in-built Drift Correct function
in DataViewer, or using a combination of both. For each line of text, I selected
the corresponding eye fixations. I then pulled the fixations down manually (see
Figure 8.5b) or used the in-built Drift Correct function (see Figure 8.5c). Drift
Correct will align all fixations vertically to their average pixel height in screen
coordinates.

FIGURE 8.5 (a) Vertical drift in eye-movement recording and corresponding data
set after data cleaning: (b) manual adjustment, (c) semi-automatic cleaning with
Drift Correct, and (d) Drift Correct followed by manual adjustment. The cleaned
data are assumed to reflect the true locations of the eye more closely.

In the current example, the average position is above the text line;
therefore, in Figure 8.5d I brought them down further by hand. With manual
correction, all fixations are typically corrected the same amount, so that small
fluctuations in pixel height may persist, but the researcher can determine how far
fixations are moved up or down.
For a more efficient approach that does not involve human judgment, Cohen
(2013) wrote an R function, Fix_Align.R, that can do the data cleaning for you.
If you have large amounts of data to clean, this program could be a life-saver.
Fix_Align.R uses linear regression to assign individual fixations to a text line
(in a multi-line experiment) and removes or flags outliers and ambiguous fixa-
tions based on the regression analysis. The best-fitting regression line, and hence
the program’s cleaning solution, is the one that maximizes the likelihood of the
regression line, given the recorded eye fixation locations. Cohen (2013) reported
near-perfect classification agreement (99.78% agreement) between the software
and an experienced eye-tracking researcher who cleaned the same data manually.
To evaluate the quality of the automated cleaning procedure for their own data, researchers can use the trial_plots argument of the Fix_Align.R function.
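To make the logic concrete, here is a minimal R sketch of regression-based line assignment. It illustrates the general idea rather than Cohen's (2013) actual implementation, and the data frame fix, its columns x and y, and the vector line_y are all assumed names:

# A minimal sketch of regression-based line assignment, assuming a data
# frame 'fix' with fixation coordinates in pixels (columns x and y) and a
# vector 'line_y' holding the y-coordinate of each line of text. This
# illustrates the idea behind Fix_Align.R, not Cohen's implementation.
assign_lines <- function(fix, line_y, slope = 0, max_dist = 40) {
  # Vertical distance from every fixation to a candidate regression line
  # through each text line (a zero slope means a perfectly horizontal line)
  dist <- sapply(line_y, function(y0) abs(fix$y - (y0 + slope * fix$x)))
  fix$line <- apply(dist, 1, which.min)            # nearest text line
  fix$ambiguous <- apply(dist, 1, min) > max_dist  # too far from any line
  fix
}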
To minimize the need for post-hoc adjustments, it is good practice to make
your interest areas large enough so they can absorb small amounts of drift (see
Section 6.1). Specifically, by including an extra buffer in interest areas, you will
be able to account for human and technical (eye tracker) error in eye-move-
ment registration. Buffers around images or objects in images are a case in point:
see Figures 6.9, 6.11, and 6.12, for examples from visual world and production
research. In text-based research, work on glosses (i.e., translations or paraphrases of
difficult words in a text) offers a particularly salient example of why interest areas
matter. Marginal glosses are a clear, stand-alone target during reading. As can be
seen in Figure 8.6, the glosses are reached from the text via long-distance saccades,
which tend to be more error prone. Therefore, any saccades directed at the general
area of the gloss are likely intended for the gloss, even though the eye tracker may
not register them as actually landing on the gloss. These various sources of error,
then, can be accounted for in the design or analysis of the study, by making the
interest areas around the glosses a bit larger.
In sum, the availability of partially and fully automatic data cleaning procedures
has made this step of the research process more engaging, or even fun. Researchers
can now choose to correct drift manually, by moving fixations up or down, or
automatically, with the help of special software functions or code. Open source
software may offer an alternative for researchers whose default software program
lacks a drift correction function. Information on drift correction—whether and
how you fixed small amounts of drift in your data—should be a part of your
research article.

FIGURE 8.6 Reading data for an L2 English speaker reading a text with marginal glosses. The slightly larger interest areas (boxes) around the glosses can account for technical or human error in recorded eye fixations. (Source: Data supplied by Dr. Carolina Bernales, Pontificia Universidad Católica de Valparaíso, Chile.)

8.2 Dealing with Outliers


With the quality of the data recording ascertained, researchers can move on to
the next stage of data cleaning, which is to deal with outlier fixations. Outliers
are abnormal values that are unrelated to the phenomenon of interest in a given
study (Barnett & Lewis, 1994; Lachaud & Renaud, 2011; Ratcliff, 1993). In
statistical terms, outliers do not belong to the same data distribution as the
remainder of the data collected for a study. Outliers can be short or long, small
or large. It may be possible to define outliers based on their absolute value (i.e.,
relative to a prespecified cutoff value) or because the data points fall at the tail
end of a distribution. In most cases, researchers will want to do something about
the outliers in their data set because outliers may undermine the conceptual and
statistical validity of the study. In this section, I will walk you through a four-
stage procedure for dealing with outliers, which is represented visually
in Figure 8.7. Screening your data for outliers, as described in this section, will
improve the fit of your statistical model. As a result, you will be able to interpret
your statistical results with more confidence, knowing they are based on a well-
fitting statistical model.

FIGURE 8.7 A four-stage procedure for dealing with outliers.



8.2.1 Dealing with Overly Short and Long Fixations


Some fixations require special attention because they have biologically implau-
sible values. For example, it is believed that 50 ms is the eye-to-brain lag; that
is, the minimal duration a reader must look at a stimulus to extract useful visual
information for processing (Inhoff & Radach, 1998). Likewise, mean fixation
durations in skilled, L1 reading range from 225 ms to 250 ms (Rayner, 2009),
with comparable benchmark values for L2 reading shifted upwards (though by
how much is an important area for future research). These numbers, then, pro-
vide us with some general estimates of what plausible or typical eye fixation
durations are.
In a frequency analysis of L1 reading data, Rayner (1998) found that only 3%
of all fixations were between 50 and 100 ms and less than 1% of fixations were
longer than 500 ms. For the lower end, researchers tend to agree that short fixa-
tions do not reflect cognitive processing, but other events, such as the presence of
a microsaccade (see Section 2.2), a truncated fixation at the beginning or end of a
trial, or a blink or other artifact in the eye-movement data (see Section 8.1.2). On
this view, it is better to merge or remove short fixations in applied research,
because they do not reflect what most applied eye-tracking researchers are inter-
ested in (i.e., cognition).
The precise steps for merging or deleting fixations are detailed in manufacturer
manuals (see Section 8.1.1). The general idea is that a very short fixation will be
merged with another, longer fixation that is in its vicinity (e.g., within 0.5° of
visual angle) or else, will be deleted. Short fixations can be handled automati-
cally as a part of data parsing algorithms (Tobii) or cleaned through a four-stage,
automatic cleaning procedure in DataViewer (SR Research). Both manufacturers
provide default duration thresholds for merging and deleting, which researchers
can customize for their own research projects.
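The sketch below illustrates this merge-or-delete logic in R. It is a rough illustration under assumed names (a data frame fix with columns dur, x, and y), not a reproduction of any manufacturer's parsing algorithm:

# A rough illustration of short-fixation cleaning; not a manufacturer's
# algorithm. Assumes a data frame 'fix', ordered by onset time, with
# columns dur (ms), x, and y (pixels). Depending on the set-up, 0.5
# degrees of visual angle is often somewhere around 15-20 pixels.
clean_short_fixations <- function(fix, min_dur = 80, max_dist_px = 18) {
  repeat {
    i <- which(fix$dur < min_dur)[1]
    if (is.na(i)) break
    # Euclidean distance to the preceding and following fixations
    d_prev <- if (i > 1) sqrt((fix$x[i] - fix$x[i - 1])^2 +
                              (fix$y[i] - fix$y[i - 1])^2) else Inf
    d_next <- if (i < nrow(fix)) sqrt((fix$x[i] - fix$x[i + 1])^2 +
                                      (fix$y[i] - fix$y[i + 1])^2) else Inf
    j <- if (d_prev <= d_next) i - 1 else i + 1
    if (min(d_prev, d_next) <= max_dist_px)
      fix$dur[j] <- fix$dur[j] + fix$dur[i]  # merge into nearest neighbor
    fix <- fix[-i, ]                         # then drop the short fixation
  }
  fix
}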
As regards very long fixations, deletion (but not merger) is also common.
Eye-tracking researchers generally remove fixations longer than 800 ms, as rec-
ommended by Conklin and Pellicer-Sánchez (2016) and Keating (2014), because
these fixations are believed to signal a lapse in attention. The practice of remov-
ing very long fixations was adopted from L1 reading research. Given that there
are no benchmarks for L2 reading yet, it is worth reevaluating the validity of the
proposed cutoff value (i.e., > 800 ms) in an L2 context. After all, L2 readers tend to read more slowly than L1 readers; therefore, the distribution of their fixation times may well extend further to the right.
To visualize the data distribution in a study, researchers can inspect the his-
tograms of fixation durations in their project. If a study has multiple tasks or
participant groups, data from each task or group should be plotted in a sepa-
rate histogram. The procedure is illustrated here with data from two of my own
research collaborations. Godfroid et al.’s (2018) study compared data from native
and advanced non-native English speakers (see Section 3.2.2). Because of the
L2 speakers’ advanced proficiency level, differences between the native and non-
native speakers are expected to be smaller. In contrast, participants in Godfroid and
Uggen (2013) were beginning learners of German, who had had only 3.5 weeks
of college instruction (see Section 3.2.1). All groups engaged in natural reading
of level-appropriate materials—an English-language novel for the advanced and
native English speakers and simple sentences for the beginning German learners.
Figure 8.8 presents three histograms for these respective data sets.

FIGURE 8.8 Frequency distribution for fixation durations during sentence reading: (a) native speakers, (b) advanced L2 speakers, and (c) beginning L2 speakers.

What stands out in these histograms is the similarity in distributions. Mean fixation duration increases somewhat as proficiency level goes down, as would be expected, and the curves become slightly flatter. However, overall the shape remains very similar across the three groups. Of note, the right tail of the distribution does
not reveal a noticeable increase in the number of overly long fixations among L2
speakers. Fixations longer than 800 ms accounted for 0.1% of all native speakers’
fixations, 0.2% of all advanced non-native speakers’ data, and 1% of the begin-
ning learners’ data. These numbers align with Rayner’s (1998) previous estimates.
Therefore, the 800 ms cutoff value for normal viewing behavior may extend
to L2 reading research with participants at different L2 proficiency levels. To
assess the proposed cutoff in the context of their own studies, and to determine a
good lower duration threshold as well, researchers could plot the data from their
own studies in a similar manner.
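As a hedged illustration, the ggplot2 sketch below plots one histogram per group and computes the proportion of fixations above the 800 ms cutoff; the data frame dat and its columns dur and group are assumed names:

# Assumes a long-format data frame 'dat' with columns dur (fixation
# duration in ms) and group (e.g., "L1", "advanced L2", "beginning L2").
library(ggplot2)

ggplot(dat, aes(x = dur)) +
  geom_histogram(binwidth = 25, boundary = 0) +
  geom_vline(xintercept = 800, linetype = "dashed") +  # proposed cutoff
  facet_wrap(~ group, ncol = 1) +
  labs(x = "Fixation duration (ms)", y = "Count")

# Proportion of fixations longer than 800 ms, per group
aggregate(dur ~ group, data = dat, FUN = function(d) mean(d > 800))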

8.2.2 Data Transformation
Like the general class of reaction time (RT) data, eye-fixation durations and laten-
cies tend to be skewed. They are not normally distributed, but tend to have a long
tail on the right (for examples, see Figure 8.8), due to a small number of obser-
vations with relatively large values. As Whelan (2008) noted, “using hypothesis
tests on data that are skewed, contain outliers, are heteroscedastic [have unequal
variances], or have a combination of these characteristics … reduces the power
of these tests” (p. 477, my addition). Therefore, eye-tracking researchers, like RT
researchers, need to address skew in their data in order to satisfy the normality
assumption of parametric tests and safeguard statistical power.
A common way to address the skewness problem is by performing a logarith-
mic transformation on the data; that is, to create a new variable X’ that equals
the logarithm of the original variable: X′ = log(X). A logarithmic transformation will reduce high values more strongly than lower values (log_b(x) = y ⇔ b^y = x). And this is exactly what is needed with right-skewed data. Thus, the new, log-trans-
formed variable will approximate normality more closely and it is this variable,
log(X), that should be used for statistical analysis.2
Although the normality assumption (i.e., the assumption that the depend-
ent variable is normally distributed) is central to parametric statistics, it has not
always been checked consistently in eye-tracking research. To examine this issue
in more detail, I inspected all the studies that included eye fixation duration or
latency measures in my sample. Specifically, a research assistant and I coded (i)
whether the researchers reported transforming their variables and, if not, (ii)
whether they confirmed a data transformation was not necessary because the data
were normally distributed. Perhaps reassuringly, no authors reported their dura-
tion or latency measures were normally distributed (as mentioned, eye-tracking
data are generally skewed). In spite of the apparent non-normality of the data,
however, only 26% of researchers reported performing a logarithmic or other
transformation. It is possible that some of the remaining 74% of researchers actu-
ally transformed their data, but failed to report it, which would be less than ideal
in terms of research transparency and could potentially make interpretation of
findings more difficult. Even so, the present numbers suggest that violations of
normality are widespread in the field of eye-tracking research, with damaging
consequences for the validity of the statistical analysis (cf. Whelan, 2008). In the
remainder of this section, I explain how researchers can check the normality of
their data and do a log transformation, if necessary, to ensure that a key assumption
of parametric statistics is met.
The first step is to determine whether your data require a transformation. To
do that, you can plot your data in a histogram (see Figure 8.8) or a boxplot and
look for skew or asymmetry in the data distribution. Chances are that if you are
using a duration or a latency measure, your data will require a transformation. In
eye-tracking research, the logarithmic transformation is by far the most common
transformation, even though in general RT research the inverse transformation (X′ = 1/X) is used frequently as well (e.g., Baayen & Milin, 2010; Ratcliff, 1993;
Whelan, 2008). In either case, researchers need to be aware of observations that
have a value of zero (e.g., skips). Both the logarithm of zero (log 0) and one divided by zero (1/0) are undefined in mathematics and thus will yield missing values in the
data set. To solve this problem, researchers can add a small positive amount (e.g.,
+1) to all observations to get rid of the zeros. Another point to note is that the
logarithm of a negative number is also undefined. (Negative numbers can occur
in eye tracking when calculating difference scores or ΔOE, see Section 7.2.1.2)
Applying the same logic as before, researchers can first identify the smallest value
in the variable (e.g., −239) before adding “1” or another small amount plus the
absolute value of this observation (e.g., 1 + 239 = 240) to all rows. That way, the
smallest value in your data will be neither zero nor a negative value.
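A minimal sketch of this shift-then-log procedure in R, assuming a numeric vector tt of total reading times or another duration or latency measure (the name is illustrative):

m <- min(tt, na.rm = TRUE)
shift <- if (m <= 0) 1 + abs(m) else 0   # e.g., 1 + 239 = 240 when m = -239
log_tt <- log(tt + shift)

# Verify that the transformation had the desired effect
hist(log_tt)                     # should now look roughly bell-shaped
qqnorm(log_tt); qqline(log_tt)   # points should follow the diagonal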
After you do a logarithmic transformation, it is a good idea to check the new
data distribution to verify whether it had the desired effect. Following a success-
ful transformation, the data histogram will approximate a bell-shaped curve more
closely and the corresponding skewness and kurtosis values will be closer to zero.
Although visual inspection of graphs should be your primary source of evidence,
skewness values below 2 or even 1 typically indicate good symmetry (Larson-
Hall, 2016). Figure 8.9 presents a histogram before and after data transformation,
together with the corresponding Q–Q (quantile–quantile) plots for the raw and
transformed data. Whereas the raw eye-tracking data show the characteristic right
tail, indicative of positive skew, the log-transformed data are distributed more
symmetrically around the mean value. The same information is reflected in the
Q–Q plots, where the log-transformed data points follow the line of the lognor-
mal distribution closely.
In short, by taking the logarithm of a duration or a latency measure, research-
ers can take a first step toward addressing positive outliers in their data set.
Transforming data will also help satisfy the normality assumption of parametric
statistics and will thus open up possibilities for statistical analysis. Whether data
transformation was successful, however, needs to be verified independently for
each data set.

FIGURE 8.9 Typical distribution of total reading times before transformation (left panel) and after log transformation (right panel). The bottom row indicates fit of the two data sets to a hypothesized normal distribution and log normal distribution, respectively.

8.2.3 Accounting for Outliers: Model Criticism or Aggressive A Priori Screening?
Traditionally, outliers have been identified prior to data analysis, as a part of mak-
ing the data spreadsheet ready for analysis. Recall that outliers require special
treatment because it is believed that they do not reflect the phenomenon that is
of interest in a given study (Barnett & Lewis, 1994; Lachaud & Renaud, 2011;
Ratcliff, 1993). In Section 8.2.1, I presented guidelines for removing biologically
implausible values from eye-movement data. Such mild a priori data screening is
always a good idea. Here, we turn to the question of whether any additional data
screening is needed before data analysis other than to remove fixations < 50 ms or
< 100 ms and fixations > 800 ms.
Most researchers think it is important to perform additional aggressive data
cleaning before analysis. However, it is possible to wait until after analysis to iden-
tify outliers. Specifically, statistical analyses such as analysis of variance (ANOVA),
regression, and linear mixed-effects models (LMMs) provide researchers with tools
to identify outliers after a statistical model has been fit. This approach is known
as model criticism (Baayen & Milin, 2010). It has the advantage of potentially
leaving a larger portion of the data set intact, while still improving model fit. I will
discuss both options here.
Before attempting any outlier detection, researchers need to ensure that their
data are in the distribution they wish to use for their data analysis (see Section
8.2.2). An outlier could well be a “normal citizen” (Baayen & Milin, 2010, p.
16) after data transformation and this would make any further steps unnecessary.
Once any appropriate data transformations (e.g., logarithmic transformation)
have been performed, researchers who engage in a priori data cleaning can
choose from a range of different options to identify outliers (see Textbox 8.1).
These approaches differ in how many data points are affected and how the trim-
ming will affect the power of the statistical analysis (see Ratcliff, 1993). Common
practice in L2 and bilingualism research is for researchers to set a threshold—
typically 2, 2.5, or 3 standard deviations (SDs) above or below the mean—past
which an observation is considered an outlier. Because reading speed is highly
individual, and will differ both between individuals and between items, mean
values and SDs should be calculated at the level of individual participants and
items, rather than at the group level (Lachaud & Renaud, 2011). This, of course,
presupposes there are enough observations per participant and per item to calcu-
late a meaningful SD (see Section 5.5). If SDs are computed for the grand mean
(i.e., for all participants and items combined), data from slow individuals and
difficult items will be truncated disproportionately, while outliers in compara-
tively fast individuals and easier items will go undetected more often. Therefore,
it is better to avoid using the grand mean. An intermediary solution (if SDs for
individual participants and individual items are prohibitively large) is to use the
mean per condition instead. Lastly, it is worth noting that L2 eye-tracking data
are inherently more variable (i.e., L2 users tend to have larger SDs). Therefore,
even with cutoff values at mean ± 2 SD, a wide range of values will still be con-
sidered “normal”.
When outliers have been identified, researchers can either trim (delete, trun-
cate, eliminate) these observations or replace them by their corresponding cut-
off value. The process of replacing outliers is known as winsorization (Barnett
& Lewis, 1994; Wilcox, 2012). Compared to outlier deletion, which results in
a reduced data set, winsorization preserves more information. Researchers can
again set the window over which they want to winsorize their data, similarly to
what they would do for outlier deletion. For instance, in a 0.10 winsorization, all
values below the fifth percentile or above the 95th percentile are replaced by the
value at the fifth percentile or the 95th percentile, respectively.
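In R, both options amount to a couple of lines. The sketch below contrasts them for an assumed numeric vector x of (log-transformed) fixation durations from one participant; in a real analysis the cutoffs would be computed by participant and by item (or by condition), as discussed above:

lo <- mean(x) - 2.5 * sd(x)
hi <- mean(x) + 2.5 * sd(x)

x_trimmed    <- x[x >= lo & x <= hi]    # trimming: delete the outliers
x_winsorized <- pmin(pmax(x, lo), hi)   # winsorizing: pull them to the cutoff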

TEXTBOX 8.1. HOW TO DEAL WITH OUTLIERS IN A DATA SET
1. A priori data screening*
This is by far the most common approach, but it tends to result in more
data loss or data manipulation. To deal with outliers before analysis,
researchers can follow these three steps:
1. Calculate means and SDs
• Use means by participants and by items (default), or
• Use means by condition (if the number of observations per par-
ticipant or item is small)
2. Set range
• Mean ± 2 SD (most conservative, this will affect the most data
points)
• Mean ± 2.5 SD
• Mean ± 3 SD (most liberal, this will affect the fewest data points)
3. Delete or pull in (winsorize) outliers
• Outlier trimming: Delete observations beyond mean ± 2 / 2.5 /
3 SD
• Winsorizing: Replace observations beyond mean ± 2 / 2.5 / 3 SD
by the corresponding cutoff value
2. Model criticism combined with mild initial trimming (Baayen
& Milin, 2010)*
1. Fit a statistical model (e.g., ANOVA, regression, LMM).
2. Save the standardized residuals (z scores).
3. Identify observations for which |z| > 2, |z| > 2.5, or |z| > 3.
4. Remove those observations that have outlying residuals from the
data set.
5. Refit the same statistical model as in step #1 as the final analysis.

* Always remember to remove biologically implausible values first.

As previously mentioned, blindly deleting or replacing a set of values becomes
unnecessary if researchers opt for an a posteriori approach informed by model
criticism. Model criticism involves inspecting the residuals for a given statistical
model after the model has been fit (for other complementary measures of influ-
ence, see Nieuwenhuis, te Grotenhuis, & Pelzer, 2012). Residuals tell us which
observations are poorly predicted or accounted for by the statistical analysis. They
are the prediction error in the model for each observation. Even when data points
are far away from the mean (candidates for deletion in an a priori approach), a sta-
tistical analysis may still be able to account for them well (small residuals). In that
case, trimming or winsorizing was not necessary. Hence, model criticism dictates
that only the observations with large residuals should be trimmed.
Although Baayen and Milin (2010) discussed model criticism in the context
of LMMs, this approach is not specific to mixed-effects modeling. Researchers
can also save residuals in ANOVA or linear regression (for more information,
see Field, 2018) and then follow the same steps. To engage in model criticism,
researchers will first fit their statistical model (e.g., ANOVA, regression, LMM)
to the data and save the standardized residuals (for more information, see Field,
2018). With this information, researchers can identify outliers as those data points with absolute standardized residuals exceeding 2 SD (|z| > 2), 2.5 SD (|z| > 2.5), or 3 SD (|z| > 3); it is up to the individual researcher to decide how strict
or lenient she wants to be (see Textbox 8.1). It is assumed that the residuals of a
parametric statistical analysis will be normally distributed. Therefore, a suitable
model will have about 5% standardized residuals with absolute values larger than
2, 1.2% residuals with absolute values larger than 2.5, and 0.27% residuals with
absolute values larger than 3. Standardized residuals outside the proposed range
should be removed from analysis. For example, in the residual scatterplot in
Figure 8.10, any data points outside ± 2.5 SD will be deleted before running the
analysis again.

FIGURE 8.10 Residuals (error terms) for a reanalysis of Godfroid and Uggen’s (2013) data. In the current example, any data point beyond ± 2.5 SD from the mean (indicated by the gray dotted lines) is considered an outlier.
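In code, these steps are compact. The following is a hedged sketch, assuming a fitted model m (e.g., from lme4::lmer) and the data frame dat it was fit to, with no rows dropped during fitting; object names are illustrative:

z <- as.vector(scale(resid(m)))       # standardized residuals
dat_trim <- dat[abs(z) <= 2.5, ]      # keep observations with |z| <= 2.5

# Refit the same model to the trimmed data as the final analysis
m_final <- update(m, data = dat_trim)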

After researchers remove the outliers, they will rerun the same analysis on the
trimmed data set and compare the results with the original analysis. This part
(comparing the two analyses) is shared by all researchers—those who clean their
data a priori and those who engage in model criticism. The comparison is called a
sensitivity analysis (Lachaud & Renaud, 2011) because it is designed to test to
what extent the results of the statistical analysis are sensitive to the chosen clean-
ing procedure (Ratcliff, 1993).
The sensitivity analysis can reveal that results (i) remain the same, (ii) gain, or
(iii) lose statistical significance after data cleaning. Baayen and Milin (2010) argued
that in all three cases, the results from the model post criticism are the more reli-
able ones. In their words, only the final analysis reveals an effect (or the lack of an
effect) “that is actually supported by the majority of data points” (p. 26).
Ratcliff (1993), in a seminal overview of a priori cleaning procedures, con-
cluded that a result should be replicable across a range of different cutoff values.
When the results do not converge, further investigation of the original dataset is
necessary to understand what may be causing the observed differences (Lachaud
& Renaud, 2011). One possibility is that outlier treatment has truncated some
distributions (e.g., by condition) in the data. Specifically, if an experimental effect
is observed only in the longer fixation times (i.e., the “tail” of the distribution),
then outlier treatment may cause a true effect to disappear in the statistical analy-
sis (Ratcliff, 1993; Whelan, 2008). Again, these concerns apply only to a priori
cleaning, which is performed on the raw observations, rather than the residuals.
The takeaway point, then, is that researchers need to report their findings for the
sensitivity analysis and, for a priori approaches, describe potential actions taken to
handle discrepancies that may arise from different cleaning procedures (e.g., trim-
ming or winsorizing at 2, 2.5, or 3 SD).
To illustrate, I will demonstrate the impact of four outlier treatment strategies
on the statistical modeling of some previously published data (Godfroid & Uggen,
2013). I applied four broad strategies—no trimming, mild trimming of biologi-
cally implausible values only, mild trimming combined with model criticism,
and aggressive a priori trimming—to the original dataset. To perform a sensitivity
analysis, I evaluated each strategy, first, in terms of the number of observations that
needed to be removed and, second, overall model fit (R2 value, which represents
the amount of variance explained). A third index, the significance levels for the
independent variables in the model, will be presented in Table 8.4 that follows
after the general introduction to LMMs. Together, these three indices (number of
observations, model fit, and significance levels) enable us to assess the impact of
the different cleaning procedures.
The original dataset (i.e., no trimming) contained a total of 946 observa-
tions. When I fit a mixed-effects model (for detailed discussion of the model
specifications, see Section 8.4), the conditional R2 value was 0.086. These two
numbers will serve as the baseline for the present sensitivity analysis. Note that
the dependent variable for this analysis was a difference score (i.e., Total Time
difference), which was normally distributed. Hence, no data transformation was
needed, but this is seldom the case if you work with durational measures. In a
first step, I removed biologically implausible values from the data set (see Section
8.2.1). Short fixations were merged automatically in the eye-tracking software.
Additionally, I removed any overly long fixations (> 800 ms) manually. These 80 long reading times were an indication that other processes, unrelated to the learning processes Godfroid and Uggen wanted to study, may have been involved.
I then fit another model to this second data set (i.e., without the 80 long read-
ing times). As can be seen in Table 8.1, model fit improved, but not by a lot (i.e.,
the R2 value increased from 0.086 to 0.091). From this second model, I saved the
residual (prediction error) for each observation and standardized them. A scat-
terplot of the residuals, grouped by the three independent variables, was previ-
ously shown in Figure 8.10. As a third step, I decided to remove the observations
that had an absolute residual value larger than 2.5 SD (beyond the gray dotted
lines in Figure 8.10). This resulted in the removal of another 23 observations.
Removing these data points improved the model fit substantially, from R2 = 0.091
to R2 = 0.16. In a research paper, these are the results that you would report for the
statistical analysis. Finally, I also attempted the more traditional outlier treatment,
which was termed aggressive a priori trimming in Baayen and Milin’s (2010) paper.
In particular, from the mildly trimmed data set, I removed a total of 22 observa-
tions beyond 2.5 SD away from the item means and the participant means for
each condition separately. Although in the present example, a priori trimming and
model criticism resulted in a similar number of outlier deletions, suggesting the
two strategies were in fact equally aggressive, the larger R2 value for the a posteriori approach supported the use of model criticism. Figure 8.11 represents visually how the data increasingly fit a normal distribution as more and more outliers are removed from the statistical analysis.

TABLE 8.1 Comparison of different outlier treatment strategies

Treatment strategy              Specific actions taken                             No. of obs.   No. of obs.   Conditional
                                                                                   filtered      remaining     R2
None (full data set)            None                                               0             946           0.086
Mild trimming                   Observations < 100 ms and > 800 ms                 80            866           0.091
                                  were removed
Mild trimming followed by       Observations with a standardized residual          103           843           0.16
  removal based on model          larger than ±2.5 SD were removed
  criticism
Aggressive a priori trimming    Observations beyond 2.5 SD away from the           102           844           0.12
                                  item means and the participant means for
                                  each condition separately were removed

FIGURE 8.11 Q–Q (quantile–quantile) plots showing model fit following different outlier treatment strategies. The data points increasingly follow a straight line, reflecting a closer match with the normal distribution.

8.3 Overview of Statistical Practices in Current Eye-Tracking Research
When data have been cleaned and prepared for analysis, most eye-tracking
researchers will perform one or more statistical tests to evaluate the effects of
their experimental manipulation. In this section, I provide an overview of sta-
tistical procedures L2 and bilingualism researchers have performed on their eye-
tracking data. This summary of current statistical approaches is a starting point
for researchers seeking guidance on what analysis to use; that is, once you have
decided on your set of dependent variables (see Chapter 7), Figures 8.12, 8.14,
and 8.15 provide a menu of what statistical options are out there. The overview
is also helpful for discerning trends in statistical practices over time (see Figure
8.13). In particular, we can make predictions about what techniques are likely
to dominate the future based on the past and the present and, thus, invest some
time in learning those. The second half of this chapter will reflect these goals: two
more recent techniques—LMMs (see Section 8.4) and growth curve analysis (see
Section 8.5)—will be introduced and applied to real-world example studies.

When it comes to choosing a statistical procedure, the nature of your depend-
ent variable is key (for a refresher of the concepts of dependent and independ-
ent variable, see Section 5.1 and Textbox 5.1). The present overview focuses on
statistical procedures in which the eye-tracking measure served as the depend-
ent variable. This is by far the most common scenario, although eye-tracking
measures do sometimes function as independent variables as well (see especially
Godfroid & Spino, 2015; McCray & Brunfaut, 2018; McDonough, Crowther,
Kielstra, & Trofimovich, 2015; McDonough, Trofimovich, Dao, & Dion, 2017).
For present purposes, analyses are grouped together and presented by superordi-
nate measurement categories: fixations, regressions, and integrated measures (see
Chapter 7). Following the classification scheme in Chapter 7, I further divided
the fixation category into four subtypes: fixation count/probability/proportion,
fixation duration, fixation latency, and fixation location (see Figures 7.1 and 7.2).
Readers can refer back to different parts of Chapter 7 for detailed descriptions
of all of these measurement types. For the present overview, I recorded for each
study what eye-movement measure(s) were analyzed and what statistical tech-
niques were used to do so. If researchers analyzed multiple types of measures
(e.g., fixation durations and regressions), I counted each measure-analysis pair as
a separate instance. Therefore, one study could have multiple rows in the data set.
Figures 8.12, 8.14, and 8.15 present the results for text-based eye tracking and for
the visual world paradigm, including production research. Because some trends
are visible most clearly in the visual world paradigm, we will begin by reviewing those (see Figure 8.12). This will then set the stage for the larger and more varied body of eye-tracking studies with text (see Figures 8.14 and 8.15).

FIGURE 8.12 Statistical procedures in visual world and production research. Note: SEM = structural equation model; CFA = confirmatory factor analysis.

FIGURE 8.13 Changing trends in statistical practices in L2 and bilingualism eye-tracking research: (generalized) linear mixed-effects models are catching up with ANOVA.

FIGURE 8.14 Statistical procedures in text-based research. Note: analyses of eye-fixation duration measures are represented in a separate pie chart (see Figure 8.15) due to scale differences; SEM = structural equation modeling.
Over half of the visual world studies published to date have relied on ANOVA
for data analysis. This predilection for ANOVA holds true, regardless of the type
of outcome measure used. Studies with durational measures, latency measures, and
counts, proportions, and probabilities are all highly ANOVA-based, even though
this is now rapidly changing (see the following). In fact, the use of ANOVA for all
these measurement types is not without potential concern. As detailed in Section
8.4.1, counts, probabilities, and proportion data should not be subject to ANOVA
(Jaeger, 2008). Alternative techniques such as logistic and quasi-logistic regression
are better suited for capturing the properties of binary or proportional data. These techniques (logistic and quasi-logistic regression) will be described in Section 8.5.2.3. On the other hand, fixation duration and fixation latency measures are compatible with ANOVA, provided the data are normally distributed. This will usually require doing a logarithmic data transformation first to normalize the data (see Section 8.2.2), but unfortunately many researchers either omit information about data transformation or assumption checking from their papers or fail to transform their data altogether. Use of ANOVA in eye-tracking research, therefore, requires knowing the properties of your data first and making sure the data fulfill ANOVA’s statistical assumptions.

FIGURE 8.15 Analyses of eye-fixation duration measures in text-based eye-tracking research. Note: SEM = structural equation modeling.
The centrality of ANOVA notwithstanding, recent years have seen a sharp
increase in the use of linear mixed-effects models (LMM) and generalized
linear mixed-effects models (GLMM): see Figure 8.13. LMMs and GLMMs are
both extensions of the linear model (LM) that lies at the basis of linear regression and
ANOVA. LMMs and GLMMs owe their name—mixed-effects model—to the fact
that they can accommodate a mix of fixed and random variables (see Section 8.4.2).
If you are familiar with multiple regression (see Field, 2018; Jeon, 2015; Larson-Hall,
2016; Plonsky & Ghanbar, 2018; Plonsky & Oswald, 2017, for good introductions),
LMMs may be the logical next step in your statistics journey (see Section 8.4).
LMMs inherit all of the LM’s advantages, such as the simultaneous handling of
multiple independent variables, both continuous and categorical (see Section 8.4.2).
The main novelty lies in determining what random effects structure to use and
how to interpret these random effects (see Section 8.4.3). We will review different
approaches for fitting random effects structures in Section 8.4.3 and consider why it
may be worth investing in this new technique in Sections 8.4.1 and 8.4.2.
In text-based eye tracking, the variety of dependent variables goes hand in
hand with a variety of statistical approaches (see Figures 8.14 and 8.15). With
the exception of integrated measures (i.e., scanpaths and gazeplots), ANOVA is
the most commonly used technique across all the dependent variables, mirroring
results for the visual world paradigm. Additionally, text-based eye tracking has
been the site of more basic statistical approaches, including non-parametric statis-
tics (Kruskal-Wallis, Mann-Whitney U tests), t tests, correlations, or, in few cases,
descriptive analysis only. The rise of LMMs and GLMMs, noted for the visual
world paradigm, is also apparent here (see Figure 8.13). As previously noted, the
use of ANOVA with fixation counts, probabilities, proportions and non-trans-
formed measures of eye fixation duration is problematic on conceptual and sta-
tistical grounds (see Section 8.5.2.3). Importantly, the same argument extends to
regressive eye movements, for which the use of ANOVA, though quite common,
is not appropriate either. One solution, described in Section 8.5.2.3, is to perform
an empirical logit (elog) transformation that will turn the bounded eye-tracking
measure into an unbounded, continuous variable. This will open the door to
ANOVA, regression, or any other parametric analysis. Going one step further,
researchers could opt for a generalized linear model or generalized linear mixed-
effects model to analyze their binary or count data. An example is binary logistic
regression; this will be discussed in more detail in Section 8.5.2.3.
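To make the transformation concrete, here is a hedged R sketch using one common formulation of the empirical logit, log((k + .5) / (N − k + .5)), where k is the number of, for example, regressions into an interest area and N the number of opportunities; the function name is illustrative:

elog <- function(k, N) log((k + 0.5) / (N - k + 0.5))

elog(3, 10)   # 3 regressions out of 10 opportunities
elog(0, 10)   # still defined when k = 0, unlike log(0 / 10)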

8.4 Linear Mixed-Effects Models


8.4.1 What’s Wrong with Repeated-Measures ANOVA?
Traditionally, researchers have used analysis of variance (ANOVA) to detect dif-
ferences in mean values between conditions (see Figures 8.12, 8.14, and 8.15).
When running a repeated-measures ANOVA (i.e., an ANOVA with at least one
within-subjects variable), researchers typically organize their datasets in two ways,
by subject and by item (see Figure 8.16). This will let them conduct separate
analyses by subject (F1 analysis) and by item (F2 analysis). The idea behind
these analyses is that data are aggregated (averaged) over one of the two vari-
ables. As seen in Figure 8.16, in an F1 analysis every participant has one value per
condition, which represents the average for that participant over all items in that
condition. Similarly, in an F2 analysis, data are aggregated across participants, so
that every item has one value per participant group. Importantly, the F1 and F2
analyses take into account the variance associated with only one random effect.
The F1 analysis accounts for participant-related variance, while the F2 analysis
accounts for item-related variance.This allows significant results to be generalized
from your participants to the larger population (F1 analysis) or from your subset
of items to the target linguistic structure (F2 analysis).
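For concreteness, the R sketch below shows what this aggregation looks like, assuming a long-format data frame dat with columns subject, item, condition, and rt for the eye-tracking measure (names are illustrative):

f1 <- aggregate(rt ~ subject + condition, data = dat, FUN = mean)  # F1: by subject
f2 <- aggregate(rt ~ item + condition, data = dat, FUN = mean)     # F2: by item

# Separate repeated-measures ANOVAs would then be run on f1 and f2,
# each accounting for only one source of random variance.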
The trouble is what happens if results are significant in one analysis but not in
the other. Such results lead to inherent interpretation difficulties, because only one source of variance is accounted for in each set of analyses, making it impossible to say which result is “correct”. The fact is that neither a by-subject nor a by-item analysis alone produces accurate, robust results. This is because, as will be discussed later, researchers need to take all sources of variance into account when they build statistical models for their data in order to maximize generalizability. Given the early predominance of by-subject analyses, Clark (1973) noted that some research conclusions suffered from a language-as-a-fixed-effect fallacy—the phenomenon under investigation might not be generalizable from the subset of items to the target linguistic structure in question. Although Clark’s paper is almost 50 years old, the language-as-a-fixed-effect fallacy remains an issue in large areas of L2 and bilingualism research, which rely primarily on by-subject analyses for statistical evidence.

FIGURE 8.16 Three ways of laying out the same data set: for a mixed-effects regression analysis, an F1 by-subject ANOVA, and an F2 by-item ANOVA. The arrows represent the units over which the data are averaged and demonstrate the amount of information (variance) that is lost by averaging.

8.4.2 Introducing Linear Mixed-Effects Models


In the last ten years, linear mixed-effects models (LMMs) have been intro-
duced as a new and powerful statistical technique that will let researchers take
all relevant sources of variance into account (Baayen, 2008; Baayen, Davidson,
& Bates, 2008). Although LMMs are still a relatively new technique, they are
rapidly becoming the gold standard in L2 and bilingual eye-tracking research (see
Figure 8.13) and may increasingly replace ANOVAs in the next decade, follow-
ing similar developments in psychology (e.g., Barr, Levy, Scheepers, & Tily, 2013;
Matuschek, Kliegl,Vasishth, Baayen, & Bates, 2017).
LMMs offer the same flexibility as multivariate regression (Jeon, 2015; Plonsky
& Ghanbar, 2018; Plonsky & Oswald, 2017), but with the possibility of filtering
out additional sources of variance. In LMMs, each observation is treated as a sepa-
rate data point, represented on a separate line in a spreadsheet (see Figure 8.16, full
dataset). This enables the model to take full advantage of the information in the
data. To ensure robust and generalizable findings, the participant and item base in
a study will need to be sufficiently large for patterns to emerge as significant fixed
effects (Bell et al., 2010; Snijders & Bosker, 2012). Lastly, LMMs have drastically
changed the way “outliers” in one’s data set are addressed (for more information,
see Section 8.2.3). Therefore, opting for a LMM approach will influence every-
thing from study planning, to how the data set is organized, and whether and how
data are trimmed.
LMMs are an extension of LMs such as linear regression and ANOVA. They
derive their name from the fact that they can model a mix of fixed and ran-
dom variables, similarly to repeated-measures ANOVA.3 The fixed effects are
what is normally reported as the results of a statistical analysis in a research paper.
Random effects are variables in a study (e.g., participants, items) that exert an
influence on the outcome, without typically being the researcher’s primary focus
(for more information on fixed and random effects, see Textbox 8.2).

TEXTBOX 8.2. FIXED AND RANDOM EFFECTS


Fixed effects are the independent variables in a study that the researcher
manipulates, observes, or controls for. These variables are considered fixed,
because it is assumed that they will remain constant from one experiment to
the next (Barr et al., 2013).
Examples: Treatment and condition are typical fixed effects in eye-tracking
research.

Random effects are the independent variables that result from random
sampling from a population. They are likely to change from one experiment
to another because, for example, researchers might recruit different partici-
pants for a new experiment.
Examples: participants is a random effect because participants are randomly
sampled from the population every time you run an experiment. Likewise, the
items in a study represent a random effect because they are a random subset
of all possible expressions of the targeted linguistic phenomenon.
Unlike repeated-measures ANOVA, LMMs can model multiple random effects
for multiple independent variables in a single analysis. Thus, a first key advantage
of LMMs is that they can simultaneously model multiple fixed- and random-
effects variables. This feature addresses the discussed limitations of traditional
by-subject and by-item analyses in repeated-measures ANOVA (Baayen, 2008;
Baayen et al., 2008). Another, related advantage of LMMs is their ability to handle
non-independence between observations. In most eye-tracking studies, obser-
vations will not be independent. Individual participants will contribute multiple
observations to a data set; similarly, items will normally be seen by more than one
study participant. Through the random variance components, LMMs will estimate
the dependence between observations. This will make data aggregation unneces-
sary and result in more accurate estimates of the standard errors of the fixed effects
(Gelman & Hill, 2007); that is, more precise estimates of the effects that do not
differ due to participant or item idiosyncrasies. In the next paragraph, we will take
a look at the different forms that random variance components can take.
When reported in a journal article, a LMM looks similar to a linear regression,
but with the addition of a random effects structure (for an example, see Section
8.4.4).With regard to the fixed effects, researchers will still report the (fixed) inter-
cept, the main effects, and any interaction terms for their model. A nice feature of
regression is that it can handle both continuous and categorical independent vari-
ables, so there is no need to render a continuous variable categorical (e.g., by cre-
ating subgroups), which results in a loss of precious variance when researchers do.
The b regression coefficients indicate the change in the outcome variable when
the predictor variable increases by one unit. All of this is not new, and has also
been described in other L2-specific publications (Jeon, 2015; Plonsky & Ghanbar,
2018; Plonsky & Oswald, 2017). The new part of LMMs concerns the estimation
of random effects alongside the previously mentioned fixed effects.

8.4.3 Data-driven Versus Top-Down Approaches to Selecting a Random Effects Structure
Since the publication of Baayen’s influential book (2008), which introduced LMMs
to the field of psychology, researchers have debated what the composition of the
random effects structure should be (see Textbox 8.3). Current debate in psychol-
ogy centers on the tension between model fit (how well the model can explain
the data) and model complexity (how many variables there are in the model),
and how these two criteria ought to be weighted. Adding more random effects to
a model will improve model fit, but it may come at a cost of reduced power (fixed
effects may no longer be significant). On the other hand, a failure to include a
random effect that exerts a large influence on the outcome can cause false-positive
results in the fixed effects (Barr et al., 2013). Researchers, then, need to strike a
balance between model fit and model complexity. In Matuschek et al.’s words, “the
best model is the one providing maximal power while maintaining Type I error
rate at the nominal level (e.g., α = 0.05)” (Matuschek et al., 2017, p. 308).

TEXTBOX 8.3. CHOOSING A RANDOM EFFECTS STRUCTURE
Each random effect can be represented by up to three random effects
components:

1. A random intercept (e.g., random intercept by participants [shown here], random intercept by items):

A random intercept captures the variance among participants or items in the reference (baseline) condition. It is assumed there will be individual differences among participants or items at the intercept.

2. A random slope (e.g., random slope for condition by participants [shown here], random slope for condition by items):

A random slope captures the variance in how different participants or items respond or behave over the different experimental conditions. It is assumed that condition (or whichever fixed effect is of interest) will not affect all participants or items in the same way.

Adding random slopes is only meaningful in within-subject and/or within-item experimental designs. For example, to model individual differences in how participants respond to different treatments, each individual must experience all the different treatments.

3. A random intercept-slope correlation:

A random intercept-slope correlation captures the relationship between participants’ or items’ random intercept and their random slope. For instance, a negative correlation indicates that participants/items with a higher intercept tend to show a stronger decrease over the different experimental conditions.

Barr et al. (2013) argued that researchers engaged in confirmatory hypothesis
testing should adopt maximal random effects structures for their analyses. A
maximal random effects structure contains all possible random effects components
(i.e., intercept, slope, intercept-slope correlation) that are justified by the research
design. For every fixed effect (including interaction terms) in the model, research-
ers should attempt to enter all the corresponding random effects (see Textbox
8.3). Using computer simulations, the authors showed that random intercept-only
models are liable to produce spurious significance (i.e., high Type I error rates).
Because maximal models attempt to explain as many sources of random variance
as possible, these models maximize the generalizability of findings, while keeping
false positives in check. This is a desirable feature for confirmatory, theory-driven
research (also see Linck & Cunnings, 2015). However, maximally specified ran-
dom effects structures can quickly get very complex if models include more than
a few independent variables, and this may lead to a failure to converge, whereby
the algorithm cannot estimate all the model parameters. Results from a model
that did not converge should never be reported, because they cannot be trusted.
Matuschek et al. (2017) revisited the question of whether it is worth modeling
all possible sources of random variance, especially when some of the random
effects are small and the data set is not that large. Also using computer simula-
tions, these authors proposed that a backward model selection approach may
yield the best results. In this approach, researchers start off with the maximal ran-
dom effects structure (cf. Barr et al., 2013) and then progressively trim it down,
first, by attempting to remove random intercept-slope correlations and then, by
attempting to remove random slopes. At each step, a model selection criterion
(e.g., the likelihood ratio test or the Akaike Information criterion [AIC]) is used
to determine whether the simpler model fits the data as well as the more complex
model (see the following, for an example). It is worth emphasizing that backward
model selection, as promoted by Matuschek and colleagues, does not offer a one-
size-fits-all solution. Some models will have a maximal structure, others will have
by-subject and by-item random intercepts only, and others still will end up with
something in between. Psychologists nowadays, though, seem to agree that having
just a by-subject random intercept is never a good idea (because of high Type I
error rates), even if these models are still quite common in psychology and in L2
research. If random effects are added, the model should minimally have a by-item
random intercept as well. The point is to engage in model selection to obtain a
parsimonious, yet well-fitting model that is true to the data.

8.4.4 Worked Example
The aim of this section is to illustrate what LMMs look like when applied to L2
eye-tracking data. Specifically, I will adopt a backward model selection approach,
following current best practices in model selection (Matuschek et al., 2017), and
apply it to an L2 learning experiment (see Figure 8.17). The data for this demon-
stration are from Godfroid and Uggen (2013).4 The present example builds on and,
in some ways, precedes the discussion on outlier identification in Section 8.2.3.
Indeed, the aim of the current analyses is to find the best-fitting statistical model.
The residuals (error terms) of this model can then be subject to model criticism
to identify extreme observations as a part of outlier treatment (see Section 8.2.3).

FIGURE 8.17 Fitting a LMM for a fixed set of independent variables (fixed effects).
Godfroid and Uggen (2013) investigated to what extent beginning-level learn-
ers of German notice, or pay extra attention to, unfamiliar, irregular German verbs
during reading (for more information, see Section 3.2.1).The participants read 24
sets of critical sentences that contained either a regular verb, an irregular verb with
an e → i(e) vowel change, or an irregular verb with an a → ä vowel change. Thus,
there was one (fixed) independent variable, Condition. It had three levels: regular
verbs, irregular e → i(e) verbs, and irregular a → ä verbs. Following Matuschek
et al.’s (2017) guidelines, I started off with a maximally specified random effects
structure. I used the lmer () function in the lme4 package (Version 1.1–17) in R
(Version 3.4.2) with restricted maximum likelihood (REML) as the estimation
method. Degrees of freedom for the t test were calculated using Satterthwaite’s
method. Here is the formula used in the R code:

Delta Total Time ~ Condition + (1 | Verb) + (1 + Condition | Subject)

In this case, the maximal structure included a by-item random intercept (1|Verb), a
by-subject random intercept (1|Subject), a random slope for Condition by subject
(Condition|Subject), and the random intercept-slope correlation for Condition


by subject (1+Condition|Subject). Adding a random slope for Condition by item
(and the corresponding intercept-slope correlation) would not have been mean-
ingful because each item (i.e., each verb) is associated with one Condition only;
that is, German verbs are either regular or irregular.
I also ran a more parsimonious model, in which I removed the random slope
for Condition by subject and the random intercept-slope correlation.When trim-
ming down models, it is common to remove one random variance component at
a time, for instance first the correlation and then the slope.When a model includes
only categorical predictors, however, as in the present example, the rule does
not apply and correlations and slopes are removed together (Drieghe, personal
communication, September 27, 2018). This resulted in the following competitor
model with random intercepts only:

Delta Total Time ~ Condition + (1 | Verb) + (1 | Subject)

You may wonder about trimming down the model even further, for instance to
a by-subject intercept-only model or a by-item intercept-only model. Default
practice in psychology nowadays is to leave both intercepts in (Barr et al., 2013),
on the assumption that items, like participants, will differ among themselves in
the response behavior that they elicit. In line with this practice, Matuschek and
colleagues (2017) did not trim their models down beyond by-subject and by-item
intercept models and in the current example, I will follow the same approach.
Therefore, Model 2, which has a by-item and a by-subject random intercept, is
the most parsimonious candidate for model selection.
Table 8.2 presents the goodness-of-fit statistics for the two models. Both models
also contained a fixed effect for Condition; however, we are not looking at the
results for Condition yet, because we need to find the best-fitting model first. To
select the best-fitting model, I relied mostly on the likelihood ratio test5 (LRT),
combined with the AIC criterion, an index of model fit adjusted for model com-
plexity. Smaller AIC values indicate a better model fit (for model-fitting approaches
in exploratory research, see Gries, 2015). The LRT is a comparison of two mod-
els’ deviances (see Field, 2018, for details). Following Matuschek et al. (2017), I
adopted a significance level α of .20 for the LRT. Using this criterion, p < .20 indi-
cates that the simpler model fits the data significantly less well than the more com-
plex model—that is, the more complex model wins. If p ≥ .20, the simpler model
is preferred because it does not differ statistically from the more complex model in
how well it fits the data. In the current example, both model indices point toward
the simpler model being preferred: it has the smaller AIC value (Model 1: 15021,
Model 2: 15019) and the LRT is non-significant (χ2 = 6.64, df = 5, p = .25).
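Continuing the hypothetical code above, the comparison itself takes two lines; refit = FALSE keeps the REML fits when the models differ only in their random effects (see Note 5):

    anova(m2, m1, refit = FALSE)  # likelihood ratio test: chi-squared, df, p
    AIC(m1, m2)                   # absolute AIC values for both models
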
Once the best-fitting model has been identified, the researcher will focus her
reporting efforts solely on that model. When reporting the results, it is recom-
mended to include the random effects structure, because this serves as a check
TABLE 8.2 Backward model selection: The maximal model and a more parsimonious competitor model

Outcome variable: delta total time. Fixed effect: Condition (X1).

          S-Int   S-Slope   S-Corr   I-Int   I-Slope   I-Corr   p     AIC     R2m     R2c
Model 1   x       x         x        x                          .25   15021   0.011   0.12
Model 2   x                          x                                15019   0.011   0.10

Note: S-Int = by-subject random intercept; S-Slope = by-subject random slope for Condition (X1);
S-Corr = by-subject random intercept-slope correlation for Condition (X1); I-Int = by-item random
intercept; I-Slope = by-item random slope for Condition; I-Corr = by-item random intercept-slope
correlation for Condition (X1). In linear regression, R2 represents the amount of variance in the
outcome variable that is explained by the predictors. In linear mixed-effects models, the marginal
R2 (R2m) denotes the amount of variance explained by fixed effects. The conditional R2 (R2c)
indicates the amount of variance explained by both fixed and random effects.
(Source: Based on Godfroid and Uggen, 2013).

on model fit6 and can contain useful information about individual differences in
participants and items (Cunnings, 2012; Linck & Cunnings, 2015; Matuschek et
al., 2017). Table 8.3 presents the full results for Model 2. We can see that the by-
subject random intercept accounted for about three times as much variance as the
by-item intercept. This is a common finding, in that participants usually represent
the largest source of random variance in a study.
Now, at last, the time has come to take a first look at the results for the fixed
effect of Condition. Recall that these are the coefficients that will provide the
answer to the research question (whether beginning-level learners noticed the
unfamiliar, irregular verbs). Coefficient b1, for a → ä verbs, and coefficient b2, for
e → i(e) verbs, both signal the change in reading time relative to the regular verbs
(intercept coefficient b0). The noticing hypothesis (Schmidt, 1990) would predict
an increase in reading time (and, hence, positive regression coefficients) for unfa-
miliar verb types that are in the process of being acquired (Godfroid, Boers, &
Housen, 2013). As seen in Table 8.3, participants paid significantly more attention
to a → ä irregular verbs, but not e → i(e) irregular verbs. We also found, in a differ-
ent analysis, that amount of attention to the verb forms during reading correlated
with gains on a verb production post-test, confirming the beneficial effects of
increased visual attention during reading for L2 learning.
Finally, the results of this analysis need to be confirmed by means of a sensitiv-
ity analysis, designed to test the possible influence of outliers (see Section 8.2.3). I
subjected Model 2 to model criticism, whereby I saved and plotted the standard-
ized residuals, as shown in Figure 8.10. I removed the observations (k = 23) with
large residuals from the analysis (see Table 8.1) and then reran the model. Table 8.4
presents the final results, as would be reported for publication. The previous con-
clusion, that learners noticed changes in a → ä verbs, is maintained. The pattern
in the e → i(e) verbs is now more consistent with this finding, although it still did
not reach significance. Thus, I found evidence for noticing, though not across
both verb types, probably because there was not enough power after the inclusion

TABLE 8.3 The best-fitting linear mixed-effects model

Delta Total Time ~ Condition + (1 | Verb) + (1 | Subject)

Fixed effects B SE t p
Intercept (regular) −102.79 62.63 −1.64 .11
a → ä irregular 210.11 90.33 2.33 .030
e → i irregular 115.34 90.11 1.28 .21
Random effects Variance SD
(1|subject) 48930 221
(1|verb) 16026 127
Residual 634768 797
R2 marginal/R2 conditional 0.011/0.10
AIC 15019
(Source: Based on Godfroid and Uggen, 2013).

TABLE 8.4 Final model after model criticism. Observations with large residuals (|z| > 2.5)
were removed from the analysis

Delta Total Time ~ Condition + (1 | verb) + (1 | subject)

Fixed effects B SE t p
Intercept (regular) −90.81 64.08 −1.42 .16
a → ä irregular 176.34 80.75 2.18 .04
e → i irregular 135.90 80.60 1.69 .11
Random effects Variance SD
(1|subject) 78550 280
(1|verb) 12471 112
Residual 503912 710
R2 marginal / R2 conditional 0.010 / 0.16
AIC 14425
(Source: Based on Godfroid and Uggen, 2013).

of both subject and item random effects (compare with Godfroid and Uggen,
2013). If any researchers would like to replicate this study, they should consider
increasing the number of items in the two irregular verb conditions. For now, the
results demonstrate how increased attention and noticing (Schmidt, 1990) can
manifest themselves in eye-movement records and, more generally, illustrate the poten-
tial of eye-tracking methodology to study L2 learning processes (Godfroid, 2019).
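The following is a minimal sketch of this model criticism procedure, assuming the data contain no missing values so that the residuals align row by row with the model frame:

    # Save standardized residuals and flag observations with |z| > 2.5
    z <- as.vector(scale(residuals(m2)))
    keep <- abs(z) <= 2.5

    # Rerun the best-fitting model on the trimmed data
    m2_final <- update(m2, data = model.frame(m2)[keep, ])
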

8.4.5 Reporting the Results


The following text is a possible model for how to report LMM results, using the
preceding statistical analysis as an example:

To analyze the data, linear mixed-effects models were fit using the lmer()
function in the lme4 package (Version 1.1–17) in R (Version 3.4.2).
Restricted maximum likelihood (REML) was used as the estimation
method for model fitting. The outcome variable was Delta Total Reading
Time, which was a baseline-corrected measure of total reading time on the
critical verb form. Verb type was the main independent variable of inter-
est. It was represented as a three-level categorical variable, Condition, with
the regular verbs specified as the reference category using treatment cod-
ing. Degrees of freedom for the t test were calculated using Satterthwaite’s
method. The random-effects structure was determined by model compari-
sons, using a backward model selection approach. I began with the maxi-
mal random effects structure and progressively trimmed any random effects
that did not contribute significantly to model fit (Matuschek et al., 2017).
Changes in model fit were assessed through log-likelihood ratio tests (LRT)
with the α level set at .20 (Matuschek et al., 2017) and through a com-
parison of absolute AIC values. The final random-effects structure included
random intercepts by participant and by item.
Results for the best-fitting model are presented in Table 8.4. Detailed
model comparisons are reported in the Appendix [i.e., Table 8.2]. Results
showed that Condition exerted a significant influence on reading times for
irregular verbs with an a → ä vowel change (b = 176.34, SE = 80.75, t = 2.18,
p = .04), indicating that participants spent on average 176 ms more on verbs
with this vowel change than on matched regular verbs. For verbs with
an e → i(e) change, Condition was not significant (b = 135.90, SE = 80.60,
t = 1.69, p = .11), suggesting L2 learners’ increase in reading times on this
particular verb type was not statistically reliable.

8.5 Analyzing Time-Course Data


Time-course analysis is most common in visual world research. It differs from
other statistical analyses (e.g., in reading research) in that the analysis includes
Time as a potential independent variable. Recall that visual world studies are
characterized by the joint use of spoken and written language (for a review, see
Chapter 4). The spoken language is inherently temporal because speech naturally
unfolds over time. A typical question in visual world research, therefore, is how fast
participants look at a target image on the screen after they hear a particular cue
in the spoken input (see Sections 4.2.2 and 6.3.2.2). From an analysis perspective,
this question requires capturing the dynamic aspects of processing, so researchers
can determine statistically when the spoken language begins to influence process-
ing. As seen in Figure 8.18, there are different approaches for doing so. Here, I
discuss how to analyze separate time windows (with the analysis of one large time
window as its simplest case) and growth curve analysis.

FIGURE 8.18 Treatment of Time in different statistical procedures.



8.5.1 Analyzing Separate Time Windows


One way to handle time course data is to divide the entire time window into
larger segments of, say, 200 or 250 ms each and average all the fixation data for a
given time window. Each time segment can then be analyzed using familiar sta-
tistical techniques such as ANOVA or LMM. For instance, in a study on pronoun
resolution, Cunnings, Fotiadou, and Tsimpli (2017) divided the 1200 ms follow-
ing the onset of the subject pronoun into six 200-ms time bins. They repeated
the same analysis (a LMM) for each of the bins. By comparing the results for the
different time bins, the authors concluded that L1 Greek–L2 English participants,
like L1 English participants, can use gender information to identify the correct
pronoun antecedent in the sentence (also see Section 4.2.3). Dividing the speech
stream into larger segments, as demonstrated in this study, will let the research-
ers analyze time-course data using familiar statistical techniques, such as t tests,
ANOVA, or LMM. In fact, it is a relatively common solution in L2 visual world
research (for other examples, see Dijkgraaf, Hartsuiker, & Duyck, 2017; Flecken,
Carroll, Weimar, & Von Stutterheim, 2015; Ito, Corley, & Pickering, 2018; Kim,
Montrul, & Yoon, 2015; Kohlstedt & Mani, 2018). The downside, however, is that
separate bin-by-bin analyses treat Time—an inherently continuous variable—as
categorical. This will create “the illusion that continuous processes are discrete”
(Mirman, 2014, p. 1). Additional issues are an overreliance on p values to make
dichotomous, yes-or-no judgments and difficulties in determining appropriate
bin sizes (see Mirman, 2014, Chapter 1). To capture the temporal dynamics of
speech processing more fully, the researchers need to include Time as an inde-
pendent variable in their analyses and conduct what is known as a growth curve
analysis.
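Before turning to growth curve analysis, here is a sketch of the bin-by-bin approach just described, using hypothetical column names (Time in ms from the pronoun onset, an elog outcome, and Group and Condition as predictors) and with lmerTest loaded as before:

    # Divide 0-1200 ms into six 200-ms bins and fit the same LMM in each bin
    fix$Bin <- cut(fix$Time, breaks = seq(0, 1200, by = 200), labels = FALSE)
    models <- lapply(split(fix, fix$Bin), function(d)
      lmer(elog ~ Group * Condition + (1 | Subject) + (1 | Item), data = d))
    lapply(models, summary)  # one set of results per time bin
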

8.5.2 Growth Curve Analysis


8.5.2.1 Data Preprocessing
A growth curve analysis is a mixed-effects regression model that describes the
overall response curve that emerges from participants’ eye fixations over time
(Barr, 2008; Mirman, Dixon, & Magnuson, 2008). Thus, many of the points raised
for LMMs (e.g., how to select a random-effects structure) also apply to growth
curve analysis (see Section 8.4), but, different from LMMs, the outcome variable
in growth curve analysis (as applied to eye-tracking data) is binary.
At their core, visual-world eye-tracking data are binary data. They are a long
series of 0s and 1s, each representing whether (1) or not (0) an eye fixation landed
in a given region on the screen at a particular time point. This binary informa-
tion is gleaned from the raw eye-tracking data, which represent eye gaze posi-
tion in screen pixel coordinates (see Table 8.5). Data preprocessing, then, means
converting the x, y coordinates (as recorded by the eye tracker) into 1s and 0s.
Furthermore, during data preprocessing multiple data samples from the eye tracker
TABLE 8.5 Raw eye-tracking data for three trials (trial excerpts) from one participant.
Columns 6 and 7 indicate gaze position in x, y screen coordinates

SubjectID TrialID TimeStamp Item Condition GazePositionX GazePositionY


107 1 105935 pour Content 0.492875 0.343951
107 1 105938.2 pour Content 0.459653 0.342029
107 1 105941.6 pour Content 0.490078 0.337698
...
107 1 105974.9 pour Content 0.605775 0.737006
107 1 105978.2 pour Content 0.608426 0.735055
107 1 105981.5 pour Content 0.608053 0.740365
...
107 2 110011.5 spill Content 0.5131428 0.347103
107 2 110014.9 spill Content 0.5025352 0.355047
107 2 110018.2 spill Content 0.5090657 0.320309
...
107 2 110051.5 spill Content 0.238827 0.7535595
107 2 110054.8 spill Content 0.2471954 0.7482244
107 2 110058.2 spill Content 0.2366214 0.7397441
...
107 6 115091.5 fill Container 0.6409222 0.7320131
107 6 115094.8 fill Container 0.6468923 0.7682004
107 6 115098.2 fill Container 0.6469185 0.7597103
...
107 6 115131.5 fill Container 0.6400046 0.7334433
107 6 115134.8 fill Container 0.6470405 0.7482242
107 6 115138.2 fill Container 0.6348939 0.7437489
(Source: Chepyshko, 2018).

FIGURE 8.19 Calculating fixation proportions based on raw eye-tracking data.



are collapsed into larger time bins (see Figure 8.19), enabling researchers to cal-
culate aggregate measures. The goal of data preprocessing, then, is to condense
information both spatially and temporally so the researcher can detect changes in
eye gaze patterns over time.
Figure 8.19 is a simplified representation of the binning process. In DataViewer,
the data analysis software from SR Research (see Section 8.1.1), the time course
binning report will do most of the data preprocessing for you. As a researcher,
you only need to select a bin size (e.g., 20 ms, 40 ms, or 50 ms) and one or
more eye-tracking measures (e.g., fixation count) for which you would like data
aggregation to occur. In Tobii Pro Studio, Tobii’s comprehensive software pro-
gram (see Section 8.1.1), researchers will typically export the raw eye gaze data
(timestamp data and gaze tracking data) first and then do the data preprocessing
using another program such as R. The eyetrackingR package (Dink & Ferguson,
2015) provides step-by-step instructions for converting raw eye-tracking data
into a format suitable for data analysis. Here, I illustrate the three major steps in
data preprocessing, using the raw eye-tracking data from a visual world study by
Chepyshko (2018).
Table 8.5 contains the eye-tracking data in a raw, unprocessed form. Because
Chepyshko recorded data with a 300 Hz eye tracker, every row represents one
snapshot of the eye taken at a 3.33 ms interval (for details on sampling speed, see
Section 9.1.3). The time stamp indicates when, on the eye tracker’s internal
clock, the measurement was made. These, then, are the raw eye-tracking data.
They will still require some data wrangling (either manual or automatic) before
they can be graphed or used for statistical analysis.
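As an illustration of the spatial step, the following sketch codes each raw sample from Table 8.5 as a look (1) or non-look (0) at a target interest area; the boundary coordinates are invented for the example and use the same normalized screen coordinates as the GazePosition columns:

    # Hypothetical target interest area (normalized screen coordinates)
    target <- list(x = c(0.55, 0.75), y = c(0.65, 0.85))

    # 1 if the sample falls inside the target area, 0 otherwise
    raw$LookAtTarget <- as.integer(
      raw$GazePositionX >= target$x[1] & raw$GazePositionX <= target$x[2] &
      raw$GazePositionY >= target$y[1] & raw$GazePositionY <= target$y[2]
    )
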
First, some small amount of binning will be necessary (compare with Section
8.5.1) to downsample the large amount of raw (unprocessed) eye-tracking data
into more manageable units. Binning is useful because

the [eye tracker’s] sampling frequency might be much faster than behavioral
changes (for example, an eye-tracker might record eye position every 2 ms,
but planning and executing an eye movement typically takes about 200 ms),
which can produce many identical observations and lead to false positive
results.
(Mirman, 2014, pp. 18–19)

Table 8.6 shows the outcome of the binning process. The researcher collapsed the
data into 50 ms bins, with 15 data samples making up one time bin (Chepyshko,
2018). The columns Looks at Target and Looks at Distractor represent the total
number of fixations in the target area and distractor area, respectively, as tallied by
the software. Different from the separate bin-by-bin analyses (see Section 8.5.1),
researchers interested in doing a growth curve analysis will retain the tempo-
ral information for their time-course data, even after binning. To do so, the bin

TABLE 8.6 Binned eye-tracking data for three trials (trial excerpts) from one participant.
Every time bin represents the aggregate data from 15 raw data samples

SubjectID   TrialID   TimeBin   Item    Condition   Looks at Target   Looks at Distractor
107         1         1         pour    Content     0                 5
107         2         3         spill   Content     10                0
107         6         8         fill    Container   15                0
Note: Looks at Target and Looks at Distractor do not add up to 15 when the participant also looked
elsewhere on the screen in the time bin.

(Source: Chepyshko, 2018).

number (Time Bin column) will be entered as a continuous independent variable
in the statistical analysis.
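In R, the temporal step can be sketched as follows; TrialOnset is a hypothetical column holding each trial's start time on the eye tracker's clock:

    # Collapse 3.33-ms samples into 50-ms bins (15 samples per bin at 300 Hz)
    raw$TimeBin <- floor((raw$TimeStamp - raw$TrialOnset) / 50) + 1

    # Count looks at the target per participant, trial, and bin
    binned <- aggregate(cbind(SumFix = LookAtTarget) ~ SubjectID + TrialID +
                        TimeBin + Item + Condition, data = raw, FUN = sum)
    binned$N <- 15  # samples per bin (assumes no track loss)
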
Once the data have been binned, researchers can distill aggregate measures
they will use for data visualization and analysis purposes (see Table 8.7). Fixation
proportion is the total number of looks divided by the total number of data
samples (i.e., looks and non-looks combined) within a time bin: fixation
proportion = SumFix / N (see Table 8.7). Fixation proportion is a number between 0 and 1,
akin to a percentage, that can be used for graphing (see Section 7.2.1.1); however,
it is better not to use fixation proportion as a dependent variable for analysis
because doing so will violate a number of statistical assumptions (see Section
8.5.2.3). Instead, researchers calculate the odds of fixation as a second, aggregate
measure. Proportion and odds are related, but distinct, ways of describing the like-
lihood of an event. In visual world research, the odds represent the total number
of looks divided by the total number of non-looks at an image: odds = SumFix / (N − SumFix)
in Table 8.7. For instance, a .50 fixation proportion corresponds to odds of
.50/.50 = 1, meaning there is an equal chance of looks and non-looks at the image.
Odds larger than 1 indicate a greater likelihood of looks (e.g., odds of 1.5 = 60%
probability) and odds smaller than 1 indicate a greater likelihood of non-looks
(e.g., odds of 0.5 = 33.33% probability). Odds can be used for data visualization but,
above all, they are the basic building block of logistic and quasi-logistic regression,
to which we will turn in Section 8.5.2.3.
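Both aggregate measures are one-line computations on the binned data from the previous sketch (note that the odds are Inf when SumFix equals N, an issue we return to in Section 8.5.2.3):

    binned$FixProp <- binned$SumFix / binned$N                    # between 0 and 1
    binned$odds    <- binned$SumFix / (binned$N - binned$SumFix)  # 0 to Inf
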

8.5.2.2 Data Visualization
An appealing feature of visual world research is the detailed visual representations
of how participants’ eye fixation behavior unfolds over time. These graphs are
called growth curves; they are data-rich visuals that conform to best practices in
TABLE 8.7 Binned eye-tracking data with dependent variables used for plotting and analysis: fixation proportion (FixProp)
and empirical logit (elog)

SubjectID TrialID TimeBin Item Condition Interest Area SumFix N FixProp elog wts
107 1 1 pour Content Target 0 15 0.00 −3.43 2.06
107 1 1 pour Content Distractor 5 15 0.33 −0.65 0.28
...
107 2 3 spill Content Target 10 15 0.67 0.65 0.28
107 2 3 spill Content Distractor 0 15 0.00 −3.43 2.06
...
107 6 8 fill Container Target 15 15 1.00 3.43 2.06
107 6 8 fill Container Distractor 0 15 0.00 −3.43 2.06
Note: wts = weights, used in combination with elog in quasi-logistic regression.

(Source: Chepyshko, 2018).



data visualization (Larson-Hall, 2017). I really like them. Growth curves lie at the
basis of growth curve analysis. Specifically, the aim of a growth curve analysis
is to describe the shape of one or more growth curves.
Growth curves can be generated for the aggregate measures described in
Section 8.5.2.1—proportions and odds—and for measures derived from odds—
log odds (logit) and empirical logit (elog) (more information following in
Section 8.5.2.3). Growth curves depict participant looks at different images on the
screen, averaged across all participants and all items. As a researcher, you need to
decide for which images or entities on the screen you want to plot the data and for
what time period. These are non-trivial matters, as choosing what to plot and for
what period will have a large influence on your subsequent data analysis.
Consider the following example, from Chepyshko (2018). Chepyshko wanted
to test whether L1 and L2 English speakers of differing proficiency levels can
use verb semantics to predict the upcoming argument of a locative verb. His
displays contained an agent (e.g., cook), a content (e.g., coffee) and a container
(e.g., t-shirt): see Figure 8.20. Of interest was whether listeners would anticipate
different objects following a content verb such as spill (i.e., the content object cof-
fee) and a container verb such as stain (i.e., the container object t-shirt). To answer
this research question, the author could draw a number of potential comparisons,
all of which would provide related but slightly different information (also see
Sections 6.1.3.2 and 6.3.1.1):

(i) compare looks at Content versus looks at Container [looks(Content) / looks(Container)],
(ii) compare looks at Target versus looks at Distractor [looks(Target) / looks(Distractor)],
(iii) compare looks at Target versus looks at everything else [looks(Target) / looks(Other)].

Each comparison has a different odds formula (recall that odds = looks / non-looks),
because the numerator and the denominator are not the same. The odds will
be your dependent variable. Each of your predictors will indicate how the odds
change as a result of your independent variables. Therefore, you want to define the
odds (i.e., pick your numerator and denominator) in a way that makes most sense
for your design and research questions. For instance, in comparison (i), looks at
Content will be the focus regardless of whether participants hear a content verb
or a container verb. In that case, the researcher expects high odds in the Content
Verb condition (where the content image is the target) but lower odds in the
Container Verb condition (where the container image is the target). In compari-
sons (ii) and (iii), the image of interest will change along with the verb, so the
analyst is always modeling looks at Target. Therefore, the odds are always expected

FIGURE 8.20 Sample display from a verb argument prediction study. Each display
consisted of an agent, a content, and a container, and was paired with an
audio description including either a content verb (“The cook spilled the coffee
on the t-shirt.”) or a container verb (“The cook stained the t-shirt with coffee.”).
Note: boxes represent interest areas and were not seen by the participants.
(Source: Reproduced from Chepyshko, 2018).

to be larger than 1 (favoring the Target). In this case, the independent variable Verb
Type indicates if looks to the Target differ between the two verb types.
For the denominator, Barr (2008) recommended using all non-target looks
as the reference, in line with option (iii) outlined previously. In this approach,
looks directed at empty portions of the screen will be included in the analysis
as well (also see Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, 2013). By
including outside looks in the analysis, researchers can rule out potential con-
founding effects of condition on looking behavior. For example, participants
may look at empty space more in cognitively or attentionally demanding condi-
tions due to increased cognitive load. This information will be lost, and may be
confounding results, if only looks at target and distractor images are considered.
Therefore, whichever region is of interest is best compared against everything else
on the screen, for instance looks(Target) / looks(Other) or looks(Competitor) / looks(Other).
When graphing, however, researchers will often omit these outside looks from
their data visualizations (i.e., they will not plot looks at Other). This could be
misleading because what is shown in the graph may not be exactly what was ana-
lyzed statistically, putting the onus on readers to figure out exactly what the odds
formula looked like. To align data visualization and statistical analysis, research-
ers could consider plotting these outside looks as an additional line in their
graphs, especially when they plan to include these data later, as Barr (2008) and
I recommend. In all cases, the researcher should clearly state which regions on
the screen were included for analysis so readers are able to retrace this step and
interpret the statistical results correctly.
With these considerations in mind, Figure 8.21 shows two sets of growth
curves for Chepyshko’s (2018) data. The data are for the 0–850 ms time win-
dow, which corresponded to the verb segment (The cook spilled/stained … ).7 The
graphs show looks at the Target and the Distractor, which are of primary inter-
est. Proportions do not add up to 1 because participants were also looking else-
where on the screen. Because this was an early prediction window, and the verb
followed the agent, fixation proportions start off low (many outside looks), but
they increase over time. Importantly, we can detect a widening gap between the
Target and Distractor fixations in the Content Verbs (top panel) 600–850 ms post
verb onset. If confirmed statistically, these diverging lines could indicate anticipa-
tory processing of the verb argument in the Content Verb condition. Specifically,
native English speakers may be able to anticipate, and look at, the argument of
content verbs (e.g., spill the coffee) but not container verbs (e.g., stain the t-shirt) as
soon as they hear the verb form.
Visual inspection of the data, as exemplified here, is an important first step
in the analysis. By looking at graphical data representations carefully, you can
spot trends that suggest anticipation or competition. These patterns can later be
confirmed or disconfirmed statistically, using a growth curve analysis (see Section
8.5.2.5). You could inspect proportion data, as shown here, or use the log odds or
empirical logit data you plan to use for analysis.8 What matters most is that you
look at your data first because this can really help you understand what is going
on in your study. Likewise, when reading other people’s work, it is a good habit to
lay the results of the statistical analysis and the graphs side by side and check if you
can see the statistical results mirrored in the visual data representations.
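Such growth curves can be drawn with ggplot2; the sketch below assumes a hypothetical data frame avg that holds fixation proportions averaged over participants and items:

    library(ggplot2)

    # One line per interest area (Target, Distractor), one panel per condition
    ggplot(avg, aes(x = TimeBin, y = FixProp,
                    linetype = InterestArea, shape = InterestArea)) +
      geom_line() +
      geom_point() +
      facet_wrap(~ Condition, ncol = 1)
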

FIGURE 8.21 Time course graph for the L1 English group. Lines show the proportion
of fixations to the Target (circles) and Distractor (triangles) during
auditory presentation of Content (top panel) and Container (bottom
panel) locative verbs.
(Source: Chepyshko, 2018).

8.5.2.3 Logistic or Quasi-Logistic Regression


The previous section highlighted the importance of data visualization in the anal-
ysis of visual-world eye-tracking data. Fixation proportion is often used for
graphing purposes (see Figure 8.21), perhaps because most people think about
event likelihoods in this way, in terms of proportions (proportion = looks / (looks + non-looks)).
It is also common—though not recommended—to analyze proportion data sta-
tistically, by performing a series of ANOVAs on separate time bins (see Section
8.5.1). This practice, however, can lead to spurious (invalid) results, as proportion
data violate both conceptual and statistical assumptions of ANOVA (Jaeger, 2008).
First, ANOVAs were developed to deal with unbounded continuous variables.
Proportions, on the other hand, are bounded between 0 and 1: they cannot be
smaller than 0 or larger than 1. What happens when performing an ANOVA on
proportion data is that confidence intervals may extend beyond 0 or 1, which
makes them difficult to interpret (Jaeger, 2008). In statistical terms, oversized con-
fidence intervals cause ANOVAs to “attribute probability mass [e.g., probability
that Y is the true population mean] to events that can never occur, thereby likely
underestimating the probability mass over events that actually can occur” (Jaeger,
2008, p. 435, original emphasis, my addition). In short, incorrect confidence inter-
vals may lead to spurious results (Jaeger, 2008).
The second concern is that binary data follow a binomial distribution (a prob-
ability density distribution that is different from the normal distribution). Variance
in the binomial distribution is not equal: there is a larger variance in the middle
of the distribution (p = .50) than toward the endpoints. This violates ANOVA's
assumption of equality of variance and is a second reason ANOVA should not be
used for analyzing proportion data (Jaeger, 2008). In sum, fixation proportions
may be used for data visualization purposes, but they should not be treated as a
dependent variable for analysis.

FIGURE 8.22 Proportion, odds, and log odds.

Odds (looks / non-looks) are another measure of likelihood (see Sections 8.5.2.1 and
8.5.2.2) that is more advantageous for statistical analysis. Odds lend themselves to
known analytical solutions, such as the LM (i.e., regression), because odds can be
transformed into unbounded data. To do so, a two-step link function is applied.
First, the software calculates the odds based on the binary fixation data. By defini-
tion, odds range from 0 to ∞. They have no upper bound. Second, the software
takes the natural logarithm of the odds, called the log odds or logit. This will
remove the lower bound: log-transformed data range from -∞ to ∞. Together,
these two steps turn your original binary variable into a potential outcome vari-
able for a linear regression! Therefore, conceptually, logistic regression is nothing
more than linear regression in log-odds space.
Knowing a few basic facts about the relationship between log odds, odds, and
proportions can help you interpret the regression coefficients in logistic regres-
sion (see Section 8.5.2.5 for a worked example). A .50 proportion corresponds to
( )
an odds of ..50 ( )
.50
50 = 1 and a log odds of ln .50 = ln(1) = 0. For higher proportions,
the odds are larger than 1 and the log odds are larger than 0 (i.e., positive), whereas
for lower proportions the odds are smaller than 1 and the log odds are smaller
than 0 (i.e., negative): see Figure 8.22 and Textbox 8.4.9 It is thus possible to move
between the log odds, the odds, and the proportion scale, which comes in handy
when reporting the results.

TEXTBOX 8.4. PROPORTIONS, ODDS, AND LOG ODDS


Equal likelihood, even odds

p(Y) = .50 ⇔ odds(Y) = 1 ⇔ log odds(Y) = 0

More likely, odds in favor (increase)

p(Y) > .50 ⇔ odds(Y) > 1 ⇔ log odds(Y) > 0

Less likely, odds against (decrease)

p(Y) < .50 ⇔ odds(Y) < 1 ⇔ log odds(Y) < 0

The logit link function preserves the original directionality of the effect. This
means that a positive (negative) regression coefficient on the log-odds scale
represents an increase (decrease) in fixation proportions. However, as seen
in Figure 8.22, the relationship between log odds, odds, and proportions is
nonlinear. The regression coefficient bi indicates the change in log odds of
the dependent variable Y if predictor Xi increases by one unit (Field, 2018).
To report the corresponding difference on the odds or the proportion scale,
a backtransformation will be necessary (see Liberman, 2005, for details).
The results of logistic regression are thus interpreted similarly as in linear
regression, but in relation to a logit-transformed dependent variable.
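In R, moving between the scales is handled by qlogis() (proportion to log odds) and plogis() (log odds back to proportion):

    qlogis(.80)    # 1.386: the log odds for a .80 proportion (see Note 9)
    plogis(1.386)  # .80: back-transforming a regression estimate
    exp(1.386)     # 4: the corresponding (four-to-one) odds
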

Although the logit successfully removes the boundaries in proportion data,
it cannot handle perfect scores of 0 or 1. This is a mathematical problem.10 To
mitigate this problem, researchers can aggregate fixation proportions from mul-
tiple trials so they are less likely to have aggregate scores of 0 or 1. For instance,
in his initial logistic-regression theory paper, Barr (2008) conducted separate
by-subject and by-item analyses, meaning the data were aggregated over items
in one analysis and aggregated over participants in the other analysis (see Section
8.4.1). Even after data aggregation, however, some observations with a perfect
proportion may remain. In a true logistic regression, these observations will be
removed from the analysis, which will inevitably reduce the amount of statistical
information in the data.
To avoid this issue entirely, researchers could perform a quasi-logistic regres-
sion as a close alternative to logistic regression (see Table 8.8). Quasi-logistic
regression is a linear regression performed on fixation data that have undergone
an empirical logit or elog transformation (Barr, 2008; Jaeger, 2008; Mirman,
2008). The elog is defined as follows (also see Table 8.7):

elog = ln[(looks + 0.5) / (non-looks + 0.5)]   or   elog = ln[(SumFix + 0.5) / (N − SumFix + 0.5)]

Unlike with logistic regression, where the software takes care of the logit
transformation, researchers need to do the elog transformation themselves. To
do so, one can simply add a 0.5 adjustment factor to both the numerator and the
denominator of the odds formula. Then take the natural logarithm as one would
for the logit. This will yield the elog value that is used for analysis. A handful of
L2 and bilingualism researchers have successfully applied this approach to eye-
tracking data: see Mitsugi and MacWhinney (2016) for an example growth curve
analysis and Cunnings et al. (2017), Dijkgraaf et al. (2017), and Ito et al. (2018) for
sample elog analyses on separate time bins.
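In R, the transformation takes two lines on the binned data from Section 8.5.2.1. The weights formula follows Barr (2008) and reproduces the wts column in Table 8.7:

    # Empirical logit and weights (wts) for quasi-logistic regression
    binned$elog <- log((binned$SumFix + 0.5) / (binned$N - binned$SumFix + 0.5))
    binned$wts  <- 1 / (binned$SumFix + 0.5) + 1 / (binned$N - binned$SumFix + 0.5)
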
Once you have your elog variable, you can use general linear mixed-effects
modeling to analyze the data. These are the same models that were described
previously in Section 8.4. In contrast, when using true odds for logistic regression,
researchers should opt for a generalized linear mixed-effects model with a logistic

TABLE 8.8 Comparison of logistic and quasi-logistic regression

                     Logistic regression                   Quasi-logistic regression
Dependent variable   logit(looks) =                        elog(looks) =
                     ln[looks / non-looks]                 ln[(looks + 0.5) / (non-looks + 0.5)]
Link function        Yes, the software applies a logit     No, the researcher prepares the dependent
                     link function                         variable and then a regular regression is used
0s and 1s            Removed                               Included
Analysis             GLMM                                  LMM
Computing time       Long when there are random            Faster
                     effects in the model

(binomial) link function; that is, a logistic mixed-effects model. Doing so will
cause the software to apply the logit link function (so you do not need to calcu-
late the log odds yourself), but it will also increase processing time if the model
contains random variables (see Mirman, 2014, for technical details). One practical
consideration, therefore, might be that LMMs based on elog take relatively less
time to compute. As you will see in the example that follows (see Section 8.5.2.5),
the conclusion should be very similar, if not the same, regardless of which out-
come variable (logit transformed or elog transformed) researchers select. Table 8.8
summarizes the main differences between the two approaches.
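As a sketch, both analyses can be run on the binned data with the column names used earlier (lme4 treats weights as inverse-variance weights, hence 1/wts for the elog model; Barr, 2008):

    # Quasi-logistic: weighted LMM on the elog outcome
    m_elog <- lmer(elog ~ TimeBin + (1 + TimeBin | SubjectID) + (1 + TimeBin | Item),
                   data = binned, weights = 1 / wts)

    # True logistic: GLMM on the aggregated looks/non-looks counts
    m_logit <- glmer(cbind(SumFix, N - SumFix) ~ TimeBin +
                     (1 + TimeBin | SubjectID) + (1 + TimeBin | Item),
                     data = binned, family = binomial)
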

8.5.2.4 Choosing Time Terms


Growth curve models are designed to capture longitudinal data. Therefore, a
defining feature of growth curve analysis is that the models include “Time” as
a predictor. How does performance change over time or, in eye-tracking terms,
how do participants’ gaze patterns change as they hear more and more of the lin-
guistic input unfolding? By entering Time as an independent variable in a growth
curve analysis, the researcher will be able to address this question and other ques-
tions that depend on it.
As it happens, changes over time are often nonlinear. They are not represented
well with a straight line. In these cases, having only Time as a predictor in the
model may not be sufficient because this assumes a linear relationship. A simple
way to think about this is in terms of the growth curves (the graphs) that I intro-
duced in Section 8.5.2.2. A statistical model should capture the overall trends, or
functional form, of the graphs (Mirman, 2014). This is illustrated for a part of
Chepyshko’s (2018) data in Figure 8.23.11
Having a linear Time–outcome relationship means we can draw a straight
line through the data points and this will capture the data well (i.e., most of

FIGURE 8.23 Visual inspection of behavioral data. (a) A straight line does not fit
the data well, but (b) a curved line, or (c) a curvilinear line does. This
suggests a need for higher-order polynomials (Time2 or Time3) in the
statistical analysis.
(Source: Chepyshko, 2018).

the observations will fall close to the line, in a random pattern). In the case of
Chepyshko’s data, as in many visual world studies, a straight line actually misses
a fair number of data points. There are systematic deviations in the data pat-
tern, which come from an upward shift in observations (especially for the Target)
toward the end of the time segment. If you see this type of curvature in the data,
a curvy line with one or more bends may be better suited to account for your
data. Indeed, the lines in the mid and bottom panel capture the shape of the data
much better. One option to implement this pattern statistically is by introduc-
ing higher-order Time terms (e.g., quadratic Time or cubic Time) in the analysis.
These terms are sometimes referred to as higher-order polynomials, which is
basically a fancy term for a predictor that has been raised to a power higher than 1.
The basic idea is that the polynomial order of your predictor (e.g., Time1,
Time2, Time3) should equal the number of bends in the line + 1. Recognizing
what kind of curve you have may require some practice at first, but as you gain
experience looking at graphs, you will find it easier to spot the turns or inflec-
tion points in the data. Straight lines have no bend. Therefore, they can be mod-
eled with a linear Time predictor: Time1 = Time. When the line has one bend
(one point of inflection), a quadratic term can be considered: Time2. For lines
that change direction twice (i.e., S-shaped curves), a cubic term could be appro-
priate: Time3. In theory, one could keep adding higher-order terms to the model
until the fitted line and the observed line perfectly match. This would be overfit-
ting the data, however, and would limit the generalizability of findings.12 To avoid
overfitting and difficulty in interpretation, it is recommended that researchers
do not go beyond a third-order (Barr, 2008) or a fourth-order (Mirman, 2014)
Time term.
When entering more than one Time term into a statistical analysis, predic-
tors will be highly correlated: as linear Time increases (Time = 1, 2, 3, 4, …),
quadratic Time (Time2 = 1, 4, 9, 16, …) and cubic Time (Time3 = 1, 8, 27,
64, …) increase as well. Thus, one shortcoming of natural polynomials is that
they violate the non-collinearity assumption in regression analysis (Field, 2018;
Larson-Hall, 2016). When predictors are highly correlated, the overall model is
still valid but the results for individual predictors can no longer be interpreted
independently. To address this issue, researchers can use orthogonal (uncor-
related) polynomials instead. With orthogonal polynomials, the Time variable
has been rescaled and centered around the mean (Mirman, 2014) such that all the
polynomials are on the same scale and, more importantly, the resulting time values
are independent (see Figure 8.24). Researchers are then able to inspect the dif-
ferent components of a growth curve independently—the steepness of the slope
(regression coefficient for Time) and the sharpness of the curvature (regression
coefficient for Time2 and/or Time3). Thus, for many research designs, orthogonal
polynomials will be preferred (Mirman, 2014).
Orthogonalizing polynomials does affect the interpretation of the inter-
cept b0. In a visual-world eye-tracking experiment, the intercept will normally

FIGURE 8.24 Natural and orthogonal polynomials.



correspond to the likelihood of looks at the target image (or any other chosen
image) at the outset of the time window, or Time 0. The intercept value could
be informative for identifying anticipatory baseline effects (Barr, 2008; Barr,
Gann, & Pierce, 2011); that is, visual biases in participants’ looking behavior before
they have heard the critical part of the input (for more information, see Sections
6.3.1.1, 6.3.1.2, and 6.3.2.2). Such effects can then be distinguished statistically
from rate effects (the coefficients for the different Time predictors), which
reflect the influence of the linguistic signal proper (Barr, 2008; Barr et al., 2011).
In models with orthogonal polynomials, on the other hand, the intercept signifies
differences between the conditions’ overall means (due to the centering) rather
than the differences between the experimental conditions at Time 0. Therefore,
when it is important to test for differences at Time 0 (e.g., to rule out preexisting
differences between groups or conditions), natural polynomials can be employed;
otherwise, orthogonal polynomials may prove a more informative and more fine-
grained approach.
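Orthogonal polynomials can be created with base R's poly(), as sketched below on the assumption that TimeBin runs from 1 to the number of bins:

    # poly() returns orthogonal polynomials by default
    ot <- poly(1:max(binned$TimeBin), degree = 2)
    binned[, c("ot1", "ot2")] <- ot[binned$TimeBin, ]  # linear and quadratic terms
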

8.5.2.5 Worked Example
In this section, we have unpacked growth curve analysis into its different compo-
nents. To recapitulate, growth curve analysis of visual-world eye-tracking data is a
mixed-model (Section 8.4), logistic or quasi-logistic regression (Section 8.5.2.3)
that can accommodate nonlinear growth over time (Section 8.5.2.4). Now, the
time has come to combine these different building blocks and apply them to the
analysis of a real-world example.We will re-analyze fixation data from Chepyshko
(2018) for the 0–850 ms time window (see Section 8.5.2.2).
Recall that Chepyshko was interested in whether L1 and L2 English speak-
ers can use the semantics of locative verbs (e.g., to fill, to pour, to spill, to stain) to
predict the verb argument. The present analysis focuses on the L1 speakers’ data
for two verb types: Content-oriented Verbs, which take a Content as a direct
object (e.g., to spill the coffee [Content] on the t-shirt [Container]) and Container-oriented Verbs,
which go with a Container as the direct object (e.g., to stain the t-shirt [Container] with
coffee [Content]). Thus, when “spill” or “stain” are presented together with images of a
coffee and a t-shirt (see Figure 8.20), each verb has a clear, distinct Target. The
goal of the present analysis is to test whether native English speakers are able to
anticipate that Target early on during processing, as reflected in their looks to the
Target image during the verb segment.
For the present analysis, I started off with a base model that included only
the linear and quadratic Time terms (orthogonalized to remove collinearity, see
Section 8.5.2.4). I opted for a U-shaped (quadratic) time curve based on the
results of the visual inspection shown in Figure 8.23. Both the quadratic and the
cubic curves captured the data well; therefore, I went with the simpler solution.
Note that the base model, and the statistical significance of the Time terms in it,
indicates whether, averaged across the two verb types, participants started looking
at the Target more over time. With three images on the screen (see Figure 8.20),
a .33 proportion of on-target fixations reflects chance performance. Furthermore,
because all the sentences followed the same structure (The Agent will Verb the
Object 1 Preposition Object 2), more looks could be directed to the Agent (e.g.,
the cook) in the early parts of the sentence.
Next, I added fixed effects for Verb Type in three steps. In Model 0, Verb Type
was added as a main effect. In Model 1, Verb Type was allowed to interact with
Time linear. And lastly, in Model 2, Verb Type could also interact with Time
quadratic (see Tables 8.9 and 8.10). This forward, stepwise approach is similar to
Mirman’s (2014) sample analysis with nonlinear Time terms. It combines a given
set of Time terms that are selected a priori based on visual inspection with model
comparison.
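The forward, stepwise procedure can be sketched as follows (hypothetical object and column names; ot1 and ot2 are orthogonal Time terms as in Section 8.5.2.4, and Item identifies the verb):

    # Base model: orthogonal linear and quadratic Time terms only
    base <- glmer(cbind(SumFix, N - SumFix) ~ ot1 + ot2 +
                  (1 + ot1 | SubjectID) + (1 + ot1 | Item),
                  data = binned, family = binomial)

    # Add Verb Type effects one step at a time
    m0 <- update(base, . ~ . + VerbType)      # main effect
    m1 <- update(m0,   . ~ . + ot1:VerbType)  # interaction with linear Time
    m2 <- update(m1,   . ~ . + ot2:VerbType)  # interaction with quadratic Time

    anova(base, m0, m1, m2)  # sequential likelihood ratio tests
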
Tables 8.9 and 8.10 summarize the findings from the model comparison for
the logistic and quasi-logistic regression. Results for the two types of analysis
converged: in both cases, adding interaction terms to the model (i.e., allowing
fixations to develop differently over time for the two types of verb) signifi-
cantly improved model fit. Whereas AIC values were similar for the base model,
Model 0 and Model 1, they went down substantially for the Model 2 logistic
regression. (Recall that smaller AIC values are better.) Furthermore, the LRT
returned a significant p value for Model 2 in both the logistic and the quasi-
logistic regression analysis. Thus, the most complex model (Model 2) was also
the best one.
In light of the similarities between the two analyses, we will focus on the results
for the logistic regression for the remainder of this section (but bear in mind that
the empirical logit regression produced very similar results). Table 8.11 presents
the detailed results for Model 2. The significant interaction terms indicated that
listeners responded differently to Content and Container verbs. To understand the
nature of this interaction, I ran two follow-up analyses, in which I analyzed gaze
patterns for each verb type separately. Results showed that listeners were faster to
orient to Content objects while hearing Content verbs than when they looked at
Container objects while hearing Container verbs.13
Figure 8.25 plots the predicted values for the two types of verb, transformed
back onto the proportion scale, for ease of interpretation. We see that in this early
time window, only Content verbs exerted an influence on participants’ gaze pat-
terns. Chepyshko (2018) attributed the processing differences for Content and
Container verbs to inherent differences in the verbs’ conceptual representations.
Specifically, he posited that the perceptual and motor correlates of actions such
as spill and stain align more closely with the verb argument structure of Content
verbs in that they assign a central role to the content being spilled or causing
a stain. This, in turn, may facilitate processing of content verbs more so than
container verbs (Chepyshko, 2018). Although the present analysis focused on L1
listeners’ data only, the L2 English speakers in Chepyshko’s study showed a similar
viewing pattern.
TABLE 8.9 Forward model selection in logistic regression: A base model and three competitor models

Outcome variable: log odds of looks to Target

             Time   Time2   VerbType   VerbType × Time   VerbType × Time2   p       AIC      R2m    R2c
Base model   x      x                                                               113933   0.03   0.43
Model 0      x      x       x                                               .28     113934   0.03   0.44
Model 1      x      x       x          x                                    .05     113932   0.03   0.43
Model 2      x      x       x          x                 x                  <.001   113912   0.03   0.43

Note: Outcome variable = ln[looks(Target) / looks(Other)]; Time2 = quadratic Time term.
(Source: Based on Chepyshko, 2018).

TABLE 8.10 Forward model selection in quasi-logistic regression: A base model and three competitor models

Outcome variable: elog of looks to Target

             Time   Time2   VerbType   VerbType × Time   VerbType × Time2   p     AIC     R2m    R2c
Base model   x      x                                                             41372   0.03   0.25
Model 0      x      x       x                                               .37   41374   0.03   0.25
Model 1      x      x       x          x                                    .14   41373   0.03   0.24
Model 2      x      x       x          x                 x                  .04   41371   0.03   0.24

Note: Outcome variable = ln[(looks(Target) + 0.5) / (looks(Other) + 0.5)]; Time2 = quadratic Time term.
(Source: Based on Chepyshko, 2018).


TABLE 8.11 The final model (Model 2) in the logistic regression analysis

Logistic with quadratic and linear Time terms

Fixed effects B SE z p
Intercept –1.36 0.21 –6.58 <.001
Time (linear) 2.00 0.50 4.01 <.001
Time (quadratic) 0.06 0.05 1.32 .19
VerbType –0.14 0.16 –0.84 .40
Time (linear) × VerbType –1.30 0.48 –2.70 .007
Time (quadratic) × VerbType –0.42 0.09 –4.74 <.001
Random effects Variance SD
(1|subject) 1.79 1.34
(Time|subject) 9.39 3.06
(1|verb) 0.08 0.28
(Time|verb) 0.66 0.82
Residual 12.87 3.59
R2 marginal/R2 conditional 0.03/0.43
AIC 113912
(Source: Based on Chepyshko, 2018).

FIGURE 8.25 Fitted (predicted) values for looks to target in the early time window.
Note: the horizontal dotted line represents chance-level looks at the
Target image. Target looks rose above chance from about 625 ms post
verb onset, but only for the Content Verbs.
(Source: Based on Chepyshko, 2018).

8.5.2.6 Reporting the Results


The following text is a possible model for how to report the results from a growth
curve analysis, using the preceding statistical analysis as an example:

To examine gaze patterns, I conducted a logistic growth curve analysis of
listeners’ eye fixations over the duration of the verb. The time window
was defined from the onset to the offset of the verb and spanned 850 ms.
Fixation trajectories were modeled with orthogonal, quadratic Time terms,
justified by visual inspection of the data. The model also included a fixed
effect of Verb Type (Content vs. Container, simple-effects coded) and its
interactions with all Time terms. The Verb Type effects were added in a
forward, stepwise approach. Model comparison, using the log-likelihood
ratio test (LRT) with α at .05, was used to determine the best-fitting model.
To determine the significance of individual predictors, I used the normal
approximation (*|z| > 1.96). The maximal random-effects structure that
converged for the present analysis included by-participant and by-verb ran-
dom intercepts and random slopes for the linear Time term. All analyses
were performed in R (Version 3.4.2) using the glmer() function in the
lme4 package (Version 1.1–17). Models were fit by maximum likelihood
(the Laplace approximation), the standard estimation method for
generalized linear mixed-effects models.
Results for the best-fitting model are presented in Table 8.11. Detailed
model comparisons are reported in the Appendix [i.e., Table 8.9]. The
final model included significant Verb Type by Time interactions, for both
linear Time (b = −1.30, SE = 0.48, z = −2.70, p = .007) and quadratic Time
(b = −0.42, SE = 0.09, z = −4.74, p < .001). These interactions indicated that
listeners responded differently to Content Verbs and Container Verbs. I fol-
lowed up on the interactions by running growth curve models for each
verb type separately. Growth in on-target eye fixations was stronger and
consistently positive for the Content Verbs: Time linear (b = 3.09, SE = 0.69,
z = 4.47, p < .001) and Time quadratic (b = 0.17, SE = 0.07, z = 2.32,
p = .02). For the Container Verbs, the growth captured in the linear Time
was not as strong (b = 1.60, SE = 0.78, z = 2.04, p = .04) and the quadratic
Time term revealed a negative growth (b = −0.45, SE = 0.06, z = −7.5, p <
.001), suggesting on-target fixations leveled off over time. Thus, Content
Verbs elicited a faster reaction toward the target than Container Verbs. As
shown in Figure 8.25, participants predicted the correct verbal argument
for Content Verbs with above-chance accuracy from about 625 ms onwards.
No evidence for prediction was found for Container Verbs in the current
time window.

8.6 Conclusion: Which Analysis Should I Use?


This chapter has provided an overview of the major statistical options that are
available to eye-tracking researchers. A statistical analysis can only be as good as
the data that go into it. Therefore, before considering any statistical procedures
at all, researchers need to ascertain the quality of their data. In Section 8.1, I
detailed the different steps in data cleaning for doing so: a trial-by-trial inspection
of individual participant records (see Section 8.1.2) and correcting recordings
for drift, if necessary, desirable, and possible, given the study design (see Section
8.1.3). Closely related to the issue of data cleaning is the treatment of outliers (see
Section 8.2). Eye-movement records may contain biological outliers (fixations
that are too short or too long to be reflective of cognitive processing) and statisti-
cal outliers (fixations that are unrelated to the phenomenon under study). Both
require some sort of action from the researcher. In Section 8.2.1, I confirmed
that the proposed cut-offs for deleting fixations from the L1 literature—50 to
100 ms and 800 ms, respectively—may also apply to L2 speakers and bilinguals.
A four-stage outlier treatment procedure was then presented in Section 8.2.3,
which highlighted the possibility of identifying statistical outliers after data analy-
sis. Regardless of the chosen route, it is important that data be in the appropri-
ate distribution (e.g., a normal distribution for continuous variables) before one
proceeds with outlier detection. This is an area that is ripe for improvement, as
the current review revealed that only one quarter of eye-tracking researchers in SLA
and bilingualism have reported performing a data transformation or confirming
(by checking the normality of their data) that a transformation was not necessary
(see Section 8.2.2).
When it comes to choosing a statistical analysis, the nature of the dependent
variable is key. Figure 8.26 summarizes the different paths researchers may take to
statistical analysis, depending on whether they have binary-categorical, count, or
continuous data. For the largest category of eye-movement measures, eye-fixation
durations, ANOVA and linear regression are appropriate statistical techniques,
provided the outcome measure is normally distributed. As previously noted, this
will generally require doing a logarithmic transformation first. Unfortunately,
ANOVA has been overused with bounded continuous variables such as fixation
proportion (0–1) or regression probability (%), at the risk of producing spuri-
ous (invalid) results (see Section 8.5.2.3). To avoid these conceptual and statisti-
cal issues, researchers can calculate the empirical logit for their data (see Section
8.5.2.3), which will be an unbounded, continuous variable suitable for ANOVA
or regression. Another solution is to deaggregate proportions or probabilities into
item-level, binary data, whereby each observation is represented as “0” or “1” on
a separate row in the spreadsheet. Binary data lend themselves to a logistic regres-
sion analysis—a linear regression in log-odds space (see Section 8.5.2.3). A similar
solution is available for count measures (Cameron & Trivedi, 1998; Hilbe, 2007).
FIGURE 8.26 A menu of common statistical options in eye-tracking research.

In recent years, eye-tracking researchers have also brought statistical innovations
to their fields, as increasing numbers of scholars have adopted (generalized)
linear mixed-effects models (see Figure 8.13). Mixed-effects models are powerful
and flexible analytic techniques: for every type of regression model (e.g., a linear
model or a generalized linear model), there is a mixed-effects regression counter-
part. These mixed-effects models have a place in eye-tracking research, as studies
commonly include multiple observations per participant and per item. One aim
of this chapter, then, was to advance the use of mixed-effects modeling in SLA
and bilingualism by introducing key concepts and a worked example (see Sections
8.4 and 8.5).
Although it pays to invest in these newer techniques, at the end of the day, clar-
ity and completeness, rather than complexity of analysis should be the priority in
eye-tracking research. Inspecting descriptives, checking assumptions, and visual-
izing the data remain cornerstones of any statistical analysis, whether univariate
or multilevel logistic. As researchers, we want to communicate our results with
our readership in a clear and accessible manner and for that, data-rich graphics
coupled with a full reporting of the findings are essential.

Notes
1 Track loss can be calculated at other levels as well: for instance, track loss at the trial
level or the item level. To determine the amount of track loss, researchers calculate the
percentage of raw eye data samples with missing position information; that is, the per-
centage of rows in the spreadsheet with missing values for the x, y screen coordinates.
2 Transforming a variable will not affect the significance of results (beyond increasing
power or reducing Type I error), but it will have implications for the interpretation of
findings. Specifically, any results are to be interpreted in relation to the transformed
variable, for instance log duration or log latency.To interpret key results on the original
scale, researchers can backtransform their estimates, using an exponential function in
the case of the log transformation.
3 In repeated-measures ANOVA, the within-subject variable (e.g.,Time [Pretest, Posttest,
Delayed Posttest]) is partitioned into a fixed effect and a random effect.The fixed effect
is what is normally reported in research articles; however, the random effect is also
estimated as a part of the statistical output (Field, 2018). Specifically, the random effects
component indicates how the repeated measurements affected individual participants
or items differently (random slope) and how the participants or items differed at Time 1
(random intercept).
4 Godfroid and Uggen were early adopters of LMMs in L2 research, but like many
researchers at the time, they analyzed their data with by-subject random intercepts
only. Here, I will show what the results look like when a more complex random effects
structure is used.
5 When performing an LRT in R using the anova() function, the default is to refit models
with the maximum likelihood algorithm. Since the present demonstration focuses on
how to compare models with different random effects structures, I disabled the refit
functionality in the code. Interested readers could refer to Cunnings and Finlayson
(2015) for a more detailed discussion.
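For reference, such a call might look as follows; the model names are hypothetical:

# Compare two lme4 models that differ only in their random effects structure,
# keeping the REML fits rather than refitting with maximum likelihood.
anova(m_intercepts_only, m_with_slopes, refit = FALSE)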
6 One thing to look out for in the random effects structure is a perfect intercept-slope
correlation (r = 1 or r = -1). Perfect correlations indicate that there were too many
parameters in the model; in other words, the model was overparameterized and the
random effects structure needs to be simplified (Bates, Kliegl, Vasishth, & Baayen, 2015;
Matuschek et al., 2017).
7 The exact length of the verb window differed from trial to trial, but in all cases the
window was defined so it included only the verb. For additional examples and discus-
sion of how to set time windows, see Section 6.3.2.2.
8 Because proportions and odds are non-linearly related, some differences in patterns
will occur for proportions smaller than .30 or larger than .70.
9 For instance, a .80 fixation likelihood corresponds to an odds of .80/.20 = 4/1 = 4
(often read as “four-to-one odds”) and a log odds of ln(.80/.20) = ln(4) = 1.386. A .20
fixation likelihood equals a ¼ (one-to-four) odds and a -1.386 log odds.
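In R, the functions qlogis() and plogis() perform these conversions:

qlogis(0.80)   # log odds: 1.386
qlogis(0.20)   # log odds: -1.386
plogis(1.386)  # back to a proportion: about .80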
10 Both extremes of the probability scale have undefined logits: for p = 0 the logit is ln(0), which diverges to negative infinity, and for p = 1 it is ln(1/0), which diverges to positive infinity.
11 Note that because we are moving into the realm of inferential statistics, I used the elog data, rather than proportion data, in the graph, in preparation for the subsequent
empirical logit analysis.
12 To test the appropriateness of the chosen solution, researchers can engage in model
selection (see Sections 8.4.3 and 8.5.2.5). In a model-selection approach, a model with
a higher-order term will be retained only if the model yields a significantly better fit
than a subset model that does not include the same term.
13 This finding was supported by the larger and consistently positive Time effects for Content Verbs, which signalled a stronger growth in on-target fixations: Time linear (b =
3.09, SE = 0.69, z = 4.47, p < .001) and Time quadratic (b = 0.17, SE = 0.07, z =
2.32, p = .02). For Container Verbs, the growth captured in the linear Time term was not as
strong (b = 1.60, SE = 0.78, z = 2.04, p = .04) and the quadratic Time term revealed a
negative growth (b = -0.45, SE = 0.06, z = -7.5, p < .001), suggesting on-target fixa-
tions levelled off over time.
9
SETTING UP AN EYE-TRACKING LAB

This chapter provides you with information to get started on your own eye-
tracking research. It focuses on the central piece of equipment, the eye tracker (see
Section 9.1), and the physical and social space of the eye-tracking lab (see Section
9.2). To jumpstart the research process, I also present ten ideas for research, based
on actual studies with and without eye tracking (see Section 9.3.1). A list of tips
and tricks summarizes the key points from previous chapters and concludes the
book together with some additional, hands-on advice (see Section 9.3.2).

9.1 Choosing an Eye Tracker


9.1.1 Types of Eye Trackers and Their Precursors
Over the past 140 years, people have developed and experimented with a variety
of methods to record the movements of the eye. The first evidence that reading
consists of a series of fixations and saccades was obtained by techniques that ena-
bled researchers to hear, rather than see, the eye move (Wade, 2007). For instance,
Hering (1879) used rubber tubes to listen to the sound of the eye muscles,
whereas Lamare (1892) converted eye movements into drumbeats by connecting
the eyelid to a drum via a small tube. Attempts at visualizing eye movements soon
followed, and included the use of eye caps and mirrors (Wade, 2007). A break-
through, however, came with Raymond Dodge (Dodge, 1903, 1904; Dodge &
Cline, 1901) who decided to photograph the reflection of a light from the surface
of the eye (see Wade & Tatler, 2005, for an illustrated historical review). Figure 9.1
depicts the photographic device that Dodge designed for recording eye move-
ments, while Figure 9.2a is a record of one participant’s eye movements on two
trials. A similar representation is still used today by the research team at Potsdam
University (see Figure 9.2b).
FIGURE 9.1 
The Dodge photochronograph. Corneal reflection was recorded on
a slowly falling photographic plate positioned behind a 1.5 m long
enlarging camera.
(Source: Reprinted from Diefendorf, A. R., & Dodge, R., 1908. An experimental study of the ocular
reactions of the insane from photographic records, Brain, 31(3), 451–489, with permission from
Oxford University Press).

It is interesting that several of these early approaches foreshadowed current eye-movement recording techniques. For instance, the idea of mounting an external
object directly onto the eye evolved into the contact lens method (Duchowski,
2007; Eggert, 2007; Young & Sheena, 1975) and Dodge’s and subsequent research-
ers’ photographic techniques led the way for video-based eye tracking. A third
approach, known as electrooculography, was first adopted in the interbellum (e.g.,
Jacobson, 1930; Meyers, 1929; Schott, 1922). Most research discussed in this book
was conducted with video-based eye trackers; hence, this is the technique on
which we will focus the most. The other recording methods are also still in use
today, but have more specialized applications.
Scleral contact lenses are used primarily in clinical research and studies
about the physiology of the eye. These lenses, also referred to as search
coil systems, provide very precise measurements of the eye by measuring electro-
magnetic induction in two or three search coils embedded in a lens (see Figure
9.3). The lenses, however, are invasive. Often the participant’s eye needs to be
anesthetized before inserting them. Thus, the search coil system seems better
suited for use with small samples of trained participants.
Electrooculography is often used in sleep research because it does not require
the participant to have her eyes open during recording. This is because electrooc-
ulography measures changes in electric potential at the skin level that result from
FIGURE 9.2 (a) (left) Drawing based on one healthy participant’s photographic record.
Each trial began at the bottom of the record. Vertical lines represent eye
fixations and horizontal lines represent eye movements. One dash equals
10 ms. (b) (right) Eye-movement trace for the sentence “Sometimes
victims do not tell the complete truth in court”. The trial begins at the
top of the record (Time = 0 ms). Vertical stretches of the dark black line
are fixations and horizontal stretches are saccades.
(Source: (a) Reprinted from Diefendorf, A. R., & Dodge, R., (1908). An experimental study of the
ocular reactions of the insane from photographic records, Brain, 31(3), 451–489, with permission from
Oxford University Press. (b) Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R., 2005. SWIFT:
A dynamical model of saccade generation during reading. Psychological Review, 112(4), 777–813, APA,
reprinted with permission).

FIGURE 9.3 A scleral contact lens. Torsion coil inserted on the eye with the thin wire
exiting nasally.
(Source: Chronos Vision. Reprinted with permission).
eye rotations (Duchowski, 2007; Eggert, 2007; Young & Sheena, 1975). It uses a
set of electrodes placed around the eyes to do so (see Figure 9.4). Holmqvist et
al. (2011) noted that electrooculography is “a low-cost variety of eye tracking” (p.
10), but it is not as accurate and precise as other eye-tracking systems (Wade &
Tatler, 2005).
Finally, video-based eye trackers have a wide range of applications and under-
lie most, if not all, of the language-processing research. Video-based eye tracking
has also become more affordable since it first became available in the mid-1970s.
The principle behind video-based eye trackers is to detect one or more landmarks
on the eye (e.g., pupil, limbus, iris, light reflection in the cornea; see Figure 9.5)

FIGURE 9.4 
Electrooculography. Four input channels around the eye record
information about horizontal and vertical eye movements. Reprinted
with permission.
(Source: Metrovision).

FIGURE 9.5 Schematic representation of the eye.


FIGURE 9.6 Relative positions of the pupil and the corneal reflection for different
points of regard.
(Source: Reprinted from Duchowski, A. T., 2007. Eye tracking methodology: Theory and practice. London:
Springer, with permission of Springer Nature).

and infer the point of gaze using geometrical principles. The most commonly used
method combines pupil and corneal-reflection tracking, which is similar to
how Dodge’s photochronograph worked. While the pupil is easily recognizable
on film as the dark area in the center of the eye, the corneal reflection is caused
by a (near) infrared light beamed from the eye tracker. The light will reflect from
the front of an individual’s cornea, producing the so-called corneal reflection or
glint (see Figure 9.5), also known as the first Purkinje image (see Textbox 9.1).
Because the cornea has a higher curvature than the eyeball, the relative positions
of the corneal reflection and the pupil center will change with eye position (see
Figure 9.6). Point of gaze estimation, then, is based on the vector between the corneal reflection and the pupil center, together with other geometrical calculations.

TEXTBOX 9.1. PURKINJE IMAGES


Purkinje images are the reflections of objects from the curvatures of the eyes.
There are four images that are visible: the first Purkinje image (P1) is a reflec-
tion from the front surface of the cornea and is often used to determine the
accuracy of eye fixation; the second Purkinje image (P2) is reflected from the
back surface of the cornea; the third Purkinje image (P3) is a reflection from
the front surface of the lens; and the fourth Purkinje image (P4) is a reflection
from the back surface of the lens.
This explains how pupil and corneal-reflection eye trackers work as a family. We
now consider the different hardware set-ups available for these eye trackers, which
are important for understanding the properties of the system (Holmqvist et al., 2011).

9.1.2 Video-Based Eye Trackers


In general, we can distinguish between static and mobile eye trackers. Static eye
trackers are appropriate for laboratory research, broadly construed here as any
study that can be conducted inside a research lab. If it is important that data be
collected in the real world, for example the language classroom or the playground,
then a mobile eye-tracking solution is necessary (more information on mobile
eye trackers is provided in the next section). Static and mobile eye trackers can
be mounted in different ways and are sometimes described accordingly. Remote
eye trackers have the eye camera and infrared light source placed at some distance
from the participant: either inside the display monitor (see Figure 9.7) or on the
table in front of the participant (desk-mounted) (see Figures 9.8 and 9.9).
In contrast, head-mounted eye trackers are worn in close proximity to the
head, because they are attached to a headband or a cap, which further increases the
flexibility of their use (see Figure 9.10). Eye-tracking glasses are a special type
of head-mounted eye tracker (see Figures 9.11 and 9.12). Although most remote
and head-mounted systems are mobile, there are a few more static examples as
well. For instance, eye trackers that are used in conjunction with two PCs tend
to be harder to transport. Finally, tower-mounted eye trackers are by definition
static (see Figures 9.13a and 9.13b). They offer the most precise and accurate data

FIGURE 9.7 A remote eye tracker with the camera inside the display monitor.
(Source: Tobii Pro TX300).
FIGURE 9.8 
A remote eye tracker with the camera on the table in front of the
participant.
(Source: SR research Ltd. Eyelink 1000).

FIGURE 9.9 
A remote eye tracker with the camera on the table in front of the
participant.
(Source: Applied Science Laboratories EYE-TRAC 7).
FIGURE 9.10 A head-mounted eye tracker.
(Source: SR research Ltd. Eyelink II. Photography supplied courtesy of The Center for Comparative
Psycholinguistics at University of Alberta).

FIGURE 9.11 Eye-tracking glasses.


(Source: Tobii Pro Glasses 2. Photography supplied by the Laboratory of Language & Cognition,
Pontificia Universidad Católica de Valparaíso, Chile).
FIGURE 9.12 A head-mounted mobile eye tracker.


(Source: De Beugher, Brône, and Goedemé, 2014. Reprinted with permission of SCITEPRESS).

FIGURE 9.13 (a) A remote eye tracker mounted in a tower above the participant’s head.
This is the same eye tracker as in Figure 9.8, but mounted differently. (b)
A hi-speed tower-mount eye tracker.
(Source: (a) SR research Ltd. Eyelink 1000. (b) Picture reproduced with kind permission from
SensoMotoric Instruments GmbH).

among the pupil and corneal-reflection eye trackers because they gently restrict
head movement (Holmqvist et al., 2011). In sum, there appears to be a tradeoff
between flexibility of use and freedom of movement, on the one hand, and data
accuracy and precision, on the other.
Some head-mounted eye trackers and some head-free, remote eye track-
ers include head tracking (i.e., tracking of head movements) as a standard or
optional feature of the equipment. Head tracking makes it possible to compensate
for small head movements when computing eye gaze position so researchers can
still obtain high-quality data without paying the cost of restricting participants’
head movements. Head trackers use either optical reflectors or magnetic sensors
to measure the position of the head in space. The optical reflector is an infrared
reflector (marker) that is attached on the participant’s forehead to measure precise
head motions (see Figure 9.14).
A magnetic head-tracker (see Figure 9.15) is composed of a magnetic field generator and a head sensor, usually mounted on the cap. The two combined yield absolute locations and movements of the head (Stephane, 2011). When this head position data is added to the gaze position data extracted from an eye tracker, the generated head-eye gaze vector will allow automated data analysis that accounts for head movements (Holmqvist et al., 2011). However, despite such options for rigorous data acquisition, many users prefer risking low data quality over placing sensors or markers on participants, a trend that is evident from the low market share of head-tracking equipment.
With such a variety of eye-tracker models, what type of eye tracker should you
choose? This will depend largely on your research needs, your participant popula-
tion, and your budget. Here we focus on research needs and participants, but some

FIGURE 9.14 A remote head-tracker with a target sticker.

FIGURE 9.15 A magnetic head-tracker.


tips for start-up packages and grant applications can be found in Section 9.2.1 and
in Sanz, Morales-Front, Zalbidea, and Zárate-Sández (2016). In choosing an eye
tracker, it is important to think ahead to how you are planning to use the machine:
will you conduct research on reading and writing, task-based language teaching
(TBLT), interaction and feedback, the bilingual lexicon, language assessment, com-
puter-assisted language learning (CALL), or translation processes (see Chapters 3
and 4)? Will you have small interest areas, at the word level or below, or will your
regions of analysis be larger (see Section 6.1)? Depending on your answers, the eye
tracker you need will have to be more or less sophisticated and flexible in what it
can do. Of course, both sophistication and flexibility are desirable features in an eye
tracker and manufacturers are working hard to develop machines that combine the
two, yet most present-day machines compromise on one of these two dimensions.
Mobile eye trackers are typically head-free, which makes them more suitable for
more natural tasks and research with children and clinical populations, but they
tend to be slower and less accurate and precise. As a result, we will often use them
to address more coarse-grained research questions. Static eye trackers (either head-
free or with head stabilization) provide a higher degree of accuracy and precision in
data collection and are often faster, which allows them to address more fine-grained
research questions and questions about word and sublexical processing, yet they
offer less ecological validity. Figure 9.16 represents some of the questions about
eye-tracker use that can guide researchers toward either a mobile or a static model.

FIGURE 9.16 Flow-chart for deciding on an eye-tracking solution.


In this respect, one thing to bear in mind is that by choosing an eye tracker
that suits one’s needs, researchers can prevent substantial data loss and have greater
confidence in their findings. Using an eye tracker with head stabilization even
in contexts where eye-movement data are often recorded head-free (e.g., visual
world paradigm) will not have negative consequences for your research, but there
are contexts where head-free eye-movement recordings are inappropriate and
will produce data that are unusable. In SLA and applied linguistics, research that
has typically been conducted using head-free equipment includes, but is not lim-
ited to, studies with children and clinical populations; visual world studies; testing
research; research on TBLT; reading, writing, and translation research with larger
text units; CALL studies; and research on interaction and feedback (see Chapters
3 and 4, for a synthetic review). Conversely, a static eye tracker with head stabili-
zation is appropriate, and may even be necessary, for any type of research that has
interest areas at the word or morpheme level, which typically includes reading,
writing, and translation research and sentence-processing studies, but also research
in other areas (testing, TBLT, CALL, feedback), depending on the question under
investigation.

9.1.3 How Does an Eye Tracker Work?


Speed, Accuracy, and Precision
Thus far, we have focused on general types of eye-tracker models and how your
research needs might demand a more or less flexible eye-tracking solution. In
addition to questions of use, eye trackers come with different technical specifica-
tions that will also influence what you can do with your equipment.
An eye tracker (or, more specifically, a video-based eye tracker) is essentially
a fast video camera that takes a large number of snapshots of the eye per second.
The sampling speed of the eye tracker is expressed in Hertz (Hz). Often it is the
number you will find in the name of the eye-tracker model. For instance, a Tobii
TX300 samples eye movements at 300 Hz (300 times per second), the EyeLink
1000 Plus records both eyes at 1000 Hz (1000 times per second), and the iView
2K has a 2000 Hz sampling speed, meaning it takes two pictures of the eyes every
millisecond! The video camera analogy underlines that eye-movement recordings
are not continuous; however, the recordings are made at such high speeds that
they appear to be continuous, much like the illusion of a motion picture is created
by presenting a large collection of static frames in rapid succession.
A consequence of the temporal sampling of eye gaze location is that fixa-
tions and saccades are not inherent to eye-movement recordings. The output of
an eye tracker consists of large collections of raw data samples, as seen in Figure
9.17a, which are nothing more than little blobs on the screen (x, y coordi-
nates). We need an additional computer algorithm to tell us which of these blobs
cluster together in a fixation and which represent a saccade (see Figure 9.17b).
There are two types of event detection algorithms that do this: velocity-based
FIGURE 9.17 
Eye-movement data before and after event detection. Figure 9.17a
shows the raw data samples that were collected with a 1,000 Hz eye
tracker. Figure 9.17b shows the data output for the same trial after the
computer algorithm identified fixations (circles) and saccades (lines).
(Source: Data from Godfroid et al., 2015).

algorithms and dispersion-based algorithms (Holmqvist et al., 2011). Velocity-based algorithms calculate the velocity of the eye between successive data
samples (measured in °/s, see Section 2.2) and use this information to iden-
tify either saccades or fixations. Accordingly, velocity-based algorithms can be
categorized into (velocity-based) saccade detection algorithms and (velocity-
based) fixation detection algorithms. Saccade detection algorithms determine that
a saccade occurs when the eye moves faster than a certain velocity, which is
typically somewhere in the 30–100°/s range, depending on algorithm settings
(Holmqvist et al., 2011). As the name suggests, fixation detection algorithms
focus on fixations, which are operationalized as all successive data samples for
which the eye remains below a certain velocity threshold (10–50°/s; Holmqvist
et al., 2011). By default, these two types of algorithms will assign all other eye
behavior toward the other category. This means that non-saccades (i.e., samples
that are not identified as saccades by a saccade detection algorithm) will be classi-
fied as fixations and non-fixations (i.e., samples that are not identified as fixations
by a fixation detection algorithm) will be classified as saccades.1 Holmqvist and
colleagues explain that velocity-based algorithms yield the most accurate results.
SMI Vision, SR Research, and Tobii all use velocity-based detection algorithms
with their eye trackers. However, for a velocity-based approach to work well, a
minimum sampling speed of 200–250 Hz is desirable (Holmqvist et al., 2011).
This is why 250 Hz is generally regarded as the lower threshold for high-
speed eye tracking. Slower eye trackers typically use a different type of algorithm,
known as a dispersion-based algorithm.
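The velocity-based logic just described can be sketched in a few lines of R before we turn to its dispersion-based counterpart. This is a bare-bones illustration with assumed thresholds, units, and column names, not any manufacturer's implementation:

# Classify inter-sample movement as saccadic when velocity exceeds a threshold.
# Assumes a data frame of raw samples with columns t (ms) and x, y (degrees).
detect_saccade_samples <- function(samples, vel_threshold = 30) {
  dt  <- diff(samples$t) / 1000                      # seconds between samples
  amp <- sqrt(diff(samples$x)^2 + diff(samples$y)^2) # degrees travelled
  velocity <- amp / dt                               # velocity in degrees/s
  velocity > vel_threshold  # TRUE = saccade sample; the rest counts as fixation
}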
Dispersion-based algorithms are used for fixation detection only. This type
of algorithm is commonly found in low-speed eye trackers, such as Tobii TX2-30,
although there are exceptions, such as the Tobii TX60, which combines low-
speed (60 Hz) eye tracking with velocity-based fixation detection. Dispersion-
based algorithms calculate the spatial proximity of successive data samples and
ignore any velocity or acceleration-based information. For a period of the eye-
movement record to be identified as a fixation, a number of data samples must
be close to one another (e.g., within a 0.5–2.0° radius; Holmqvist et al., 2011)
for a predetermined time (e.g., 50–250 ms; Holmqvist et al., 2011). As with
velocity-based fixation detection algorithms, saccades are not measured directly, but their presence is inferred from the absence of a fixation. Because eye trackers
that use fixation detection algorithms do not actually measure saccades, they are
not suited for studying saccade properties.
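The logic of dispersion-based detection can be illustrated with a similar sketch: grow a window of successive samples for as long as their spread stays below the dispersion threshold, and label the window a fixation if it lasts long enough. The dispersion measure, thresholds, and data layout below are assumptions for demonstration only.

# Simplified dispersion-based fixation detection; columns t (ms), x, y (degrees).
detect_fixations <- function(samples, max_disp = 1.0, min_dur = 100) {
  fixations <- list()
  n <- nrow(samples)
  i <- 1
  while (i <= n) {
    j <- i
    while (j < n) {
      win <- samples[i:(j + 1), ]
      disp <- (max(win$x) - min(win$x)) + (max(win$y) - min(win$y))
      if (disp > max_disp) break  # next sample spreads the window too far
      j <- j + 1
    }
    if (samples$t[j] - samples$t[i] >= min_dur) {
      # Samples i..j stayed close together long enough: record a fixation.
      fixations[[length(fixations) + 1]] <-
        data.frame(start = samples$t[i], end = samples$t[j],
                   x = mean(samples$x[i:j]), y = mean(samples$y[i:j]))
      i <- j + 1  # continue after the fixation
    } else {
      i <- i + 1  # window too brief: slide one sample forward
    }
  }
  do.call(rbind, fixations)  # periods in between count as saccades
}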
The previous discussion highlighted one reason eye-tracking speed matters: it
will constrain the type of algorithm your system uses and therefore the accuracy
of your data. The eye tracker you purchase or rent will come with a specific event
detection algorithm that often cannot be changed. However, as a researcher you
can go into the data analysis software and change the default velocity or disper-
sion settings; in other words, you can modify the threshold values for what counts
as a fixation or a saccade. This does not address the previously mentioned limita-
tions of low-speed eye trackers; however, by evaluating your software settings, you
can see for yourself whether the data output after processing, namely the fixations and saccades you will analyze, captures the raw data samples well.
Roughly speaking, larger blobs of raw samples should be identified as fixations
whereas more dispersed samples probably constitute a saccade (see Figure 9.17).
Any discrepancies between the data pre and post processing are known as algo-
rithmic error. Large and systematic discrepancies indicate that your algorithm
and/or your algorithm settings are not optimally aligned with your research
design and your eye-tracking hardware properties. Researchers who want to fiddle with algorithm settings must proceed with caution, as doing so will change the data output and may alter the results. Always compare the new data output against the raw recordings and consult an experienced eye-tracking researcher
or your manufacturer if necessary.
Because fixations and saccades are not observed directly, there are several poten-
tial sources of measurement error in the data. Thus far, we have discussed algo-
rithmic error and how it can be spotted. Next, we consider temporal sampling
error, which is error that results from the eye being sampled at discrete time points.
Because eye-movement recordings are made of a rapid succession of still images,
saccades or fixations will often begin and end in between two recording samples.
This means that the onsets and offsets of fixations and saccades (i.e., their beginnings
and ends) usually do not coincide with the exact timing of the eye tracker. When
the two are out of sync, there will be a small difference between when the saccade
or fixation occurred and when the eye tracker recorded that it occurred. This is
the temporal sampling error. Temporal sampling error will be larger for slower eye
trackers because more time passes in between snapshots and thus, fixations and sac-
cades can remain undetected for longer (see Figure 9.18). Therefore, eye-tracking
speed influences the temporal sampling error for individual fixations and saccades
(lower speed means more error), although the errors will average out when large
amounts of data are collected (Andersson, Nyström, & Holmqvist, 2010).
Figure 9.18 represents the relationship between sampling speed and temporal
error graphically. Let’s assume three eye trackers are sampling a participant’s view-
ing behavior at 50 Hz, 250 Hz, and 500 Hz, respectively. The pegs in Figure 9.18
represent individual snapshots of eye location; that is, all of the times the eye was
filmed. The pegs of a 500 Hz eye tracker follow each other closely, at 2 ms inter-
vals, whereas the pegs of a 50 Hz eye tracker are more dispersed, at 20 ms inter-
vals. The 250 Hz eye tracker is somewhere in between, with pegs spaced at 4 ms
intervals. The three eye trackers are recording the eye behavior of a participant, who
at some point looks at an object. The eye fixation that the participant makes has
a beginning (onset), an end (offset), and a duration, which we would like the eye
trackers to measure as accurately as possible. However, as indicated by the squares
in Figure 9.18, there is some measurement error. The squares represent the differ-
ence between the fixation as recorded by the eye tracker and the fixation that the
participant actually made.This difference is known as the temporal sampling error.
Measurement of fixation onset is delayed until the next sample (peg), by 1 ms, 3 ms,
and 15 ms for the 500 Hz, the 250 Hz, and the 50 Hz eye tracker, respectively.
Similarly, the eye trackers do not register the fixation has ended until the next
sample, which is recorded 1 ms after the true end of fixation for the 500 Hz and
250 Hz eye trackers and 9 ms later for the 50 Hz eye tracker. Therefore, all the eye
trackers overestimate both the beginning and end of the participant fixation, but
the errors are much larger for the 50 Hz eye tracker.
Because fixations have a beginning and an end, both of which are measured by
the eye tracker, fixation duration is a so-called two-point measure (Andersson
et al., 2010). Put differently, fixation durations are bound by two points in time
and both points are filmed by the eye tracker (see Section 7.2.1.2). For another
class of measures, which are known as one-point measures (Andersson et al.,
2010), only one point is measured, usually the time of onset. One-point measures
are latency measures such as fixation latency or time until first visit (see Section
7.2.1.3), saccade latency (how long it takes to initiate a saccade), and time before
timeout. Perhaps somewhat counterintuitively, the average measurement error for
a two-point measure will be smaller than that for a one-point measure. This is
because in two-point measures, both the onset and the offset of the event are
overestimated and so the estimate of the time interval between them will be more

FIGURE 9.18 Temporal sampling frequencies of three eye trackers. Each peg represents
one data sample. Squares indicate the temporal sampling error.
(Source: Modified from Andersson et al., 2010).
accurate. In contrast, nothing compensates for the sampling error associated with
the measurement of a one-point measure. As can be seen in Figure 9.18, latencies
that are measured relative to the beginning of the trial (e.g., time until first visit,
saccade latency) will be overestimated (onset of measurement is delayed until the
next snapshot of the eye), whereas any measures that reference the end of a trial
(e.g., time until timeout) will be underestimated (Andersson et al., 2010).
Using the central limit theorem, Andersson et al. (2010) demonstrated that
the measurement error for two-point durational measures will average to 0 and
the error for one-point latency measures will average to half an eye-tracker sam-
ple (e.g., 10 ms for a 50 Hz eye tracker) given sufficient data. This means that if
you have a large data set, your durational measures will be error-free on average and your
latency measures can be calculated accurately by subtracting or adding half an eye-
tracker sample. Of course, these guidelines only apply if you have a large data set. If
that is not the case, the sampling error is random and may obscure any true effects
in the data, especially if the effects are small. One solution is to use a faster eye
tracker. As was previously discussed, a faster eye tracker will have a smaller window
of error, and therefore fewer data will be needed to bring the error to its central
tendency (i.e., 0 for durational measures and half a sample for latency measures).
Doubling the sampling speed will bring about a fourfold reduction in data points
needed to maintain the same sampling error (Andersson et al., 2010). For example,
if 100 data points (e.g., 10 trials × 10 participants in the same experimental condi-
tion) are necessary to maintain a < 1ms temporal sampling error with a 250 Hz
eye tracker, only 25 data points will be necessary with a 500 Hz eye tracker, but
as many as 1,600 observations will be necessary with a 60 Hz eye tracker. As a
general rule, the temporal noise in the data set must be well below the magnitude
of the effect you hope to find. Therefore, small effects (e.g., a 15 ms difference
between experimental conditions) require fast eye trackers or very large amounts
of data. For large effects (> 80 ms) eye-tracking speed is not as important.
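A small simulation in R illustrates these regularities for a 50 Hz eye tracker; the number of fixations and their true duration are made-up values for demonstration.

set.seed(1)
n <- 10000
interval <- 20                                    # 50 Hz = one sample per 20 ms
onset  <- runif(n, 0, 1000)                       # true fixation onsets (ms)
offset <- onset + 193                             # true 193 ms fixations
rec_on  <- ceiling(onset  / interval) * interval  # first sample after the onset
rec_off <- ceiling(offset / interval) * interval  # first sample after the offset

mean(rec_on - onset)            # one-point latency error: ~10 ms (half a sample)
mean((rec_off - rec_on) - 193)  # two-point duration error: ~0 ms on average

Rerunning the sketch with interval set to 2 (i.e., a 500 Hz eye tracker) shrinks the average latency error to about 1 ms, in line with the half-a-sample rule.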

TEXTBOX 9.2. WHY EYE-TRACKING SPEED MATTERS


• Eye-tracking speed influences recording accuracy because it constrains
the type of algorithm the eye tracker uses (i.e., velocity-based or
dispersion-based).
1. Velocity-based algorithms
• often used in high-speed eye trackers (i.e., minimum sampling
speed of 250 Hz)
• can detect both saccades and fixations
2. Dispersion-based algorithms
• used in lower-speed eye trackers
• can only determine fixations 
•  Eye-tracking speed influences temporal sampling error and thereby the accuracy of the data.
• Temporal sampling error is the discrepancy between when the
beginning and end of a saccade or fixation were recorded and
when the saccade or fixation actually occurred.
• The use of a faster eye tracker or collection of additional data can
average out the error.

To sum up, Textbox 9.2 lists the main reasons that speed matters. Eye-tracking
speed is important—it is the property that eye-tracking manufacturers advertise
the most about their equipment (Andersson et al., 2010; Holmqvist et al., 2011;
Wang, Mulvey, Pelz, & Holmqvist, 2017) and it influences the temporal accuracy
and precision of measurement. Nevertheless, other properties that are perhaps less
well advertised also matter a great deal for data quality. These properties include
spatial precision and accuracy and will be discussed next.
Accuracy and precision are two major indices of data quality.2 When research-
ers measure a participant’s point of gaze with an eye tracker, they hope (and often
assume) that the measurement is accurate and precise, so they, as a researcher, can
be confident in the data and the results. Although the terms accuracy and precision
are often used interchangeably in everyday life, within the field of eye tracking and
other areas of measurement the two mean something different. Accuracy (or off-
set) refers to the difference between a participant’s true eye gaze position and the
eye gaze position as measured by the eye tracker. It follows from this definition
that in order to assess measurement accuracy, we must know where a participant
is really looking in addition to where the eye tracker registers the participant’s eye
gaze. One way to do this is to ask your participant; however, in practice, researchers
and manufacturers usually instruct participants (or robotic stand-ins in the form of
artificial eyes) to look at very simple visual targets on the screen such as calibration
points (i.e., dots in different corners of the screen). Precision, which is better known
as reliability in our field, refers to how consistently the same (steady) eye fixation
can be measured in one and the same position. Precision does not take into account
how far the observed data samples are from the true fixation location; in other words,
precision and accuracy are independent of each other. As shown in Figure 9.19b, it is
possible to have an instrument that is precise but inaccurate. By the same token, we
can have accurate but imprecise measures (see Figure 9.19a). This is the case when
the center of a fixation aligns closely with the visual target, but the individual data
samples are dispersed around the fixation center. The ideal scenario is to obtain data that are both accurate and precise (see Figure 9.19c), because this will lead to the correct identification of fixations and saccades and will safeguard the internal validity of the study. Accurate and precise eye-movement data, then, are
foundational to any data analyses that researchers perform in their study.
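Both indices can be computed from raw samples recorded while a participant steadily fixates a known target. The sketch below uses common operationalizations (mean offset from the target for accuracy; root-mean-square of sample-to-sample distances for precision) and made-up gaze coordinates in degrees of visual angle.

# Gaze samples recorded during steady fixation of a target at (0, 0).
x <- c(0.31, 0.28, 0.35, 0.30, 0.33)
y <- c(-0.12, -0.10, -0.15, -0.11, -0.13)

# Accuracy (offset): mean distance between the samples and the true target.
accuracy <- mean(sqrt(x^2 + y^2))

# Precision: RMS of the distances between successive samples.
precision_rms <- sqrt(mean(diff(x)^2 + diff(y)^2))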
FIGURE 9.19 Precision and accuracy of an eye-fixation measurement. Dots indicate the gaze location as sampled by the eye tracker and the crosshair represents the true gaze point.

Although both accuracy and precision are important, they impact a research
study differently. The precision of measurement affects event detection; that is,
the parsing of the eye-movement recording—a huge collection of raw data sam-
ples—into a sequence of fixations and saccades (e.g., Figure 9.17). As discussed
previously in this section, an eye tracker records a very large number of individual
data points (x, y coordinates), which serve as input for an event detection algo-
rithm. If the data points are scattered because of low precision, any calculations
of the algorithm will be off and the resulting fixations and saccades may not be
accurately defined either. For example, a dispersion-based algorithm may identify
a single eye fixation as two shorter fixations because some data samples fell outside
a certain radius. A lack of precision, therefore, can be detrimental to the valid-
ity of your findings because it can change the dependent variables (fixations and
saccades) in the analysis. Conversely, the accuracy of an eye tracker is a concern
when interest areas are small, such as in reading research (Nyström, Andersson,
Holmqvist, & Van De Weijer, 2013). An example is work on different scripts,
where researchers have begun to chart the eyes’ landing position (optimal view-
ing position and preferred viewing location, see Section 2.4) in non-alphabetic
languages (e.g., for Chinese: Li, Liu, & Rayner, 2011; Tsai & McConkie, 2003; Yan,
Kliegl, Richter, Nuthmann, & Shu, 2010; Yang & McConkie, 1999; Zang, Liang,
Bai, Yan, & Liversedge, 2013). Such analyses require that words be sliced into
smaller units for analysis, such as the first and second half of a Chinese character or
the individual characters in a Japanese word. To ensure valid conclusions, accuracy
of measurement is crucial, especially along the horizontal dimension, because any
offsets will qualitatively alter researchers’ conclusions about where the eyes land
when reading text. Interestingly, eye-tracker accuracy also became a focus in the
debate on parafoveal-on-foveal effects, which centers on whether the properties
of the word to the right of fixation (word n + 1) can influence processing of the
currently fixated word (see Section 2.6). Because an offset of a few letter spaces
can make the difference between a fixation on word n and a fixation on word n
+ 1, Kliegl, Nuthmann, and Engbert (2006) included only those observations in
their analyses for which the binocular eye-movement recording assigned fixations
from both eyes to the same word. Accordingly, the authors removed 23% of cases
where the two eyes landed on different words (i.e., cases of binocular disparity).
In so doing, they hoped to preempt the criticism that “any erroneous assignment
of fixations to neighboring words due to limits of spatial resolution of the eye
tracker … could generate parafoveal-on-foveal effects” (p. 18). Because parafo-
veal-on-foveal effects are important to understand how attention is allocated dur-
ing reading (see Section 2.6), this example shows how seemingly technical details
such as the eye tracker’s spatial accuracy can have far-reaching empirical and
theoretical consequences (also see Rayner, Pollatsek, Drieghe, Slattery, & Reichle,
2007; Rayner, Warren, Juhasz, & Liversedge, 2004).
In text studies such as the two preceding examples, accuracy tends to be a
greater concern along the horizontal axis. This is because the text line or lines on
the screen make it easier to spot and correct systematic vertical offsets in the data
(see Section 8.1), but similar reference points for the horizontal dimension are
missing. In other areas of research, external referents with which to align the eye
gaze data may be less clear and so using larger interest areas may be a safer option
if accuracy is a concern. In either case, it is a good idea to check beforehand
whether your eye-tracking software enables you to make manual adjustments
to the recording after the data have been collected. Always bear in mind that by
cleaning the data, you are essentially changing the original recording and so great
care is needed (see Section 8.1, for further details on data cleaning).
A myriad of factors has the potential to influence data quality. Among them,
it is possible to distinguish factors that are specific to the eye-tracking hardware
and software, factors that relate to individual participant characteristics, and envi-
ronmental factors (Holmqvist et al., 2011; Nyström et al., 2013; Wang et al., 2017).
Holmqvist et al. (2011) tested the precision of 20 different eye trackers from
several manufacturers. They reported large effects of the eye camera on preci-
sion: overall camera quality and camera resolution (i.e., number of pixels used
to capture the eye image), as well as the position of the eye in the camera image
(something the researcher can adjust). Similar findings emerged from Nyström
et al.’s (2013) study, which focused specifically on the accuracy and precision of
one high-end, tower-mount eye tracker, the SMI HiSpeed 500 Hz. Nyström
and colleagues concluded that “data quality is directly related to the quality of
the eye image and to how robustly features can be extracted from it” (p. 285).
Any factors that interfere with the eye image, such as glasses, contact lenses, pupil
diameter, eye color, and downward pointing eyelashes can be detrimental to the
accuracy and/or precision of a recording (Holmqvist et al., 2011; Nyström et al.,
2013). More generally, it is important to keep in mind that the estimates provided
in eye tracker manuals and on manufacturers’ websites represent the upper end
of what accuracy and precision can be expected with a given device. These fig-
ures are obtained under optimal recording conditions, often using artificial eyes
rather than human participants and very short and simple visual tasks (Wang et al.,
2017). In an effort to promote independent comparisons of different eye tracker
models, the Eye Movements’ Researchers Association (EMRA) and the academic
network COGAIN launched the Eye Data Quality Standardization Project in
2012 (COGAIN, n.d.; Eye Movements Researchers’ Association, 2012). The goal
of this large-scale project is to obtain independent evidence for the accuracy and
precision of a large number of commercially available eye trackers. In a recent
comparison of 12 eye trackers that was conducted as a part of this initiative, Wang
et al. (2017) found that the Dual-Purkinje imaging eye tracker, the EyeLink 1000,
and the SMI HS240 yielded the most precise data with human eyes. As more
research findings are published (Holmqvist & Zemblys, 2016; Holmqvist et al.,
2015), researchers will be able to make informed decisions on what eye trackers
to purchase based on independent and unbiased quality metrics.

9.2 The Eye-Tracking Lab


9.2.1 Practical Considerations
Aspiring eye-tracking researchers may be pleased to know it is not necessary to
invest in a new eye tracker to run their first studies. Some manufacturers rent out
equipment for short periods of time at a much lower cost than a new eye tracker.
This alternative may be more palatable to your department chair, dean, or funding
agency, who may be reluctant to make a large investment without demonstrated
prior eye-tracking experience. Thus, renting equipment can provide researchers
with the desired experience and skillset prior to setting up their own lab. An
alternative to renting is to volunteer one’s time at an existing eye-tracking lab.
In this case, the researcher will most likely join ongoing projects, rather than
start his or her own, but the experience will produce hands-on knowledge and a
good deal of practical knowhow. Again, such initial experience will strengthen the
researcher’s grant application or put them in a stronger position when negotiating
a start-up package for a new job. It can also help them decide how well the eye-
tracker they worked with meets their research needs.
Once an informed choice has been made, the next step is to locate a manu-
facturer or provider. A good strategy is to request updated quotes directly from
the manufacturing company or provider as prices tend to vary according to geo-
graphical location and may differ over time. In some countries, major manufac-
turers may work with third-party providers or have no representatives. This will
likely have implications for pricing and delivery of the equipment to your lab.
Also, technical support is an important consideration when comparing eye track-
ers from different companies. Most providers will offer installation and support
services, but the details of those services tend to vary in terms of their quality,
lifespan, and cost. For example, some companies offer free installation and on-site
training, whereas others will charge for training. Yet other companies offer free
training at their facilities, in which case the researcher only needs to cover his or
her own travel expenses. It is important that training sessions be tailored to the
specific kind of research for which the equipment is being purchased; that is, they
should be at the right level of specificity and should include hands-on experience
with programming and data collection. Consider contacting the training team in
advance to request personalized sessions (i.e., based on research topics, specific
research designs, kind of data, etc.).
As with any major purchase, it is recommended to check the duration of the
product warranty, which will vary greatly between eye trackers. Although computer
hardware will likely require updates once every four or five years, the life of the eye
tracker could (but will not necessarily) exceed this timespan. Aspiring researchers
are advised to ask existing users about any repairs or technical expenses they have
made for their eye trackers. It is worth noting that customer support and responsive-
ness beyond initial training sessions vary across companies. Some third-party provid-
ers will terminate their involvement after installation and/or training sessions except
for warranty services. Other companies offer lifelong technical support, which is
either included or contingent upon having a current software license. If a special
software license is required, the licensing fee, which may be several thousand dollars,
needs to be added to the total sales price as a recurrent cost. In terms of the level
and depth of support, there is again considerable variation. Support can range from
helpline operators answering general questions about the equipment to staff solv-
ing the specific problems that you as a researcher encountered in your experiment.
In short, it is preferable to inquire about the eye tracker’s life expectancy and the
company’s support line before making a purchase. Satisfied customers are the best
publicity; hence, new eye-tracking researchers are encouraged to get in touch with
existing users to learn about their experiences.
Another consideration when buying an eye tracker is transportability.
Transportability is an attractive feature of eye trackers, because it opens up many
possibilities for data collection. By and large, video-based eye trackers seem to fall
into one of three categories, depending on how the eye tracker is set up. Many
eye trackers come with two computers—a participant PC and a host PC—which
make transportation more cumbersome. Laptop PCs are easier to transport than
desktop PCs, but even so, mounting the two computers, and potentially a third
stand-alone unit with the eye-tracking camera, will take time. Therefore, it mostly
makes sense to transport a two-computer set-up if one is setting up a temporary
lab in a new location, rather than using the eye tracker for a one-time data collec-
tion. Eye-tracking equipment of this kind is transported in special hard shell cases,
which are available for purchase from the manufacturer. When traveling, check whether the warranty covers (foreign) travel or whether special insurance is needed.
In contrast, eye-tracking cameras that attach to a laptop or are mounted on
a tripod can be transported with minimal effort, including in a purse or hand
luggage. This third type of eye tracker is the most transportable, but these systems
are often also the slowest and least accurate and precise equipment (Holmqvist
et al., 2011). Thus, when considering what eye tracker to purchase, the reader
needs to contemplate both the type of research to be conducted as well as the
level of detail required from the analyses.The smaller the area to be analyzed (e.g.,
an interlocutor’s face vs. their mouth or eyes; a text paragraph vs. a single word;
an image or an object in the image), the more precise and accurate the equip-
ment needs to be. By the same token, some research designs (e.g., studies that
use gaze-contingent display changes [see Section 2.3] and studies that focus on
saccadic properties) require fast eye trackers for saccades to be detected and the
experiment to work. On the other hand, users of mobile eye tracking may trade
in some precision and accuracy for greater flexibility. Mobile eye trackers have the
advantage that data can be collected outside the university, for example in schools
or hospitals, in people’s homes, and even in communities at home or abroad that are not located close to a university campus or research facility. In many cases,
researchers will be able to reach large groups of participants who are currently
underrepresented in the research literature and who are often more heterogene-
ous than university students.
As a recent example of in-situ research, Indrarathne and Kormos (2017, 2018)
collected data from EFL students in Sri Lanka using a 60 Hz portable eye tracker
(see Figure 9.20) which the first author carried with her on the plane (Indrarathne,
personal communication, May 19, 2016). Lew-Williams (in preparation) is col-
lecting eye-movement data in people’s homes in Chicago, using a laptop and a
regular video camera mounted on a small tripod (see Figure 9.21). The data are
hand coded afterwards, frame by frame, using custom software (also see Godfroid
& Spino, 2015). McDonough, Crowther, Kielstra, and Trofimovich (2015) tracked
interlocutors’ eye gaze as they were completing communicative tasks, using a
combination of four eye-tracking cameras (two for each speaker, one for each eye)
and two webcams that served as scene cameras (see Figure 9.22).

FIGURE 9.20 Collecting data outside the lab: Sri Lanka.


(Source: Image supplied by Dr. Bimali Indrarathne, University of York)
FIGURE 9.21 In-home eye-tracking project in Chicago, IL.


(Source: Image supplied by Dr. Casey Lew-Williams, Princeton University).

FIGURE 9.22 Camera set-up to study eye gaze during oral interaction.


(Source: Image supplied by Dr. Dustin Crowther, University of Hawai’i).

On-site eye-tracking research may present its own set of challenges, but with
careful planning and familiarization with the research context, it can also be very
fruitful. First, a quiet, distraction-free space with control over temperature and
lighting is desirable (also see Section 9.2.2) and may require negotiation with local
contacts. From a data quality standpoint, it is preferable to collect all the data in
the same space, rather than move the eye tracker around, because this will ensure
similar recording conditions for each participant. Even so, there may be cases
where researchers need to settle for less than ideal solutions. A second considera-
tion is that on-site technical support may be unavailable or limited to general
IT questions and that the internet may not be as reliable. Thus, researchers must
have a plan for when they hit a technical roadblock and be prepared to handle
unexpected events. As a case in point, researchers may experience power cuts or
electricity surges in rural or isolated areas, which necessitate the use of a power
generator and a surge protector. By familiarizing themselves with the context of
their projects, researchers can prepare for success and contribute data from more
diverse populations to the field of SLA.

9.2.2 Spatial and Technical Requirements for a Lab


Finding a place to set up a lab in an existing facility could be a bit of a challenge
depending on the situation at your current institution. Big storage rooms or even
basements could offer a good alternative if no regular research or office space is
available. Such rooms are usually sound and light insulated, and remodeling them is cheaper than building a new lab. If a large space is available, a rela-
tively affordable solution for creating lab-sized units is to install modular walk-in
chambers. Walk-in chambers are sound-attenuated and can be installed basically
anywhere. They divide a large space into smaller units so that researchers can set
up their equipment in one of the smaller spaces. Because this system utilizes pan-
elized rooms, these chambers can also easily be expanded or rearranged according
to the research needs of different projects.
Generally speaking, an eye-tracking lab will require a physical space that can
comfortably accommodate the eye-tracking equipment, the researcher(s), and the
participant. There are only a handful of technical requirements that apply to all
eye-tracking labs; these will be discussed in the following paragraphs. For further,
model-specific details, readers should consult their manufacturer’s eye-tracker
manual. As a rule of thumb, the lab should be spacious enough to comfortably
accommodate two tables, one for the display PC and the eye tracker, and the
other for the host PC (see Figure 9.23a). A two-PC configuration is common to
most manufactured (but not homemade) eye trackers: compare Figures 9.21 and
9.24. The host PC holds the eye-tracker software to run the experiment and, in
some set-ups, record the data; it is where the researcher controls the flow of the
experiment. The participant sits in front of the display PC and completes the
experiment there. Thus, the display PC is used to present the experimental mate-
rials to the participant. Although participants in most language processing studies
view stimuli on a computer screen (i.e., the display PC), this is not a requirement.
Figure 9.22 shows an example of eye tracking in face-to-face interaction research.
In this case, a video camera is placed above the participant’s head to capture the
visual scene (also see Section 4.2.4).
With a two-PC configuration, researchers may find it convenient to place
both PCs close to each other so they have easy access to the participant during
FIGURE 9.23 (a) (left) A two-PC configuration with Host PC and Display PC. (b)
(right) Two-room eye-tracking lab.
(Source: Modified after EyeLink II specifications).

FIGURE 9.24 Display PC and host PC separated by a panel. The participant is sitting


on the right and the researcher is sitting on the left.

camera set-up. An angular set-up, as seen in Figure 9.23a, is probably ideal for this
purpose. However, many times the host PC and display PC will be set up against
the same wall (i.e., aligned), as seen in Figure 9.24. In our experience, this can be
distracting for participants, because they may be tempted to look at the host PC
and thus disengage with their own task. As a simple solution to this problem, we
have placed a panel in between the two PCs in one of our labs (see Figure 9.24).
Another possibility is to set up the two PCs in two adjacent rooms, preferably
with a one-way mirror window in between, so the researcher can monitor the
experiment at all times (see Figure 9.23b).
Desk-mounted eye trackers should be placed on a sturdy table to avoid any
kind of vibrations. Rather than placing response devices such as button boxes,
keyboards, or mice on the table, participants should hold them in their hand or on their lap, to avoid vibrations in the eye-movement record. No other electronic devices such as cell phones, fans, or radios should be placed or operated near the eye tracker; in other words, it is best for the table to be completely clear. The table
should be deep enough to allow for the recommended distance from the person’s
eyes to the eye tracker. In our two labs, that distance is 20–22 inches (50–55 cm)
for one eye tracker and 27 inches (65 cm) for the other. Check your manual
for the distances that apply to your eye tracker. The thickness of the tabletop
is another consideration if your eye-tracking system comes with head support.
Users should confirm that their table is thick enough, but not too thick, to mount
a chin- and/or forehead rest. Another crucial piece of furniture is a height-adjust-
able, stationary chair (no casters). A stationary chair will help participants remain
in position and within range of the eye-tracking cameras as they are engaging in
their task. This is especially important when working with patients or children,
with whom researchers typically do not use a chinrest or other form of head sta-
bilization. Height-adjustable, stationary chairs are not as easy to find as one might
expect and usually need to be custom made. Make sure the chair is comfortable,
because participants may be in the same position for an extended period of time.
The ideal eye-tracking lab will not have windows so the researcher can control
the lighting and viewing conditions. Incoming daylight can reflect on the com-
puter screen and cause changes in a participant’s pupil size, both of which can be
detrimental to data quality. If a windowless lab is not feasible, a few windows are
fine as long as one avoids direct sunlight near the eye tracker. This will mean using
shades and not placing the eye tracker in front of a window. Make sure to lower
the shades in your lab consistently, for each data collection, so similar lighting
conditions apply to all the sessions. The idea is to create the same lighting condi-
tions for all participants; therefore, proper illumination inside the lab is important.
Holmqvist et al. (2011) recommend using fluorescent lighting, such as neon lights,
because they produce less infrared light and do not vibrate as much as incandes-
cent light bulbs. Halogen lamps are the least recommended. A light wall paint
color will further help to make the most of the ambient light in the room. Finally,
any existing lighting timers should be deactivated, so as to prevent the light from
going out automatically during longer data collection sessions.
To produce quality data, participants need to be able to perform the experi-
ment in a quiet space free from environmental distractions. Sound-attenuated
rooms are ideal for this purpose. Another strategy is to put up Quiet Please signs
on the lab door and in the hallway to reduce noise levels and prevent people from
entering the lab when the experiment is in progress.

9.2.3 Managing an Eye-Tracking Lab


Eye-tracking laboratories at universities and research centers (as opposed to com-
mercial labs) are usually under the direction of one or two faculty members who
oversee the eye-tracking research in their group, program, or department. The
lab director oversees the daily activities in the lab and is (co-)responsible for the
equipment, which is either their own or the property of the university. They also
oversee the budget. An important task of the lab director is to seek funding for
new equipment or for the repair and maintenance of existing equipment. Some funding
is necessary because computer hardware for the eye tracker (e.g., host PC or lap-
top, computer monitor) will have the same life cycle as that of any other desktop
or laptop computer. Depending on the eye-tracker model, certain parts might also
need to be replaced after a few years (e.g., lenses, bulbs) and software licenses may
need to be renewed annually.
Lab directors are also in charge of representing the eye-tracking lab among
the academic community and the public at large. In today’s interconnected world,
a website can help the lab members reach a worldwide audience and create an
online presence. To let web users share in the day-to-day successes of the lab, the
lab director can share lab events and publications on social media such as Twitter
or Facebook. Attending conferences, organizing colloquia, and giving workshops,
either within the university or externally, are other ways to make the lab known and
increase the visibility of the work and its members. At Michigan State University,
some researchers recently created an MSU Eye Tracking Research Group, the goal
of which is to bring together researchers from across campus who all conduct eye-
movement research, and to create a framework for networking and collaboration.
Chances are other researchers at your university do eye-tracking research as well
and you could reach out to them to form similar partnerships. Another thing you
could do is to organize in-house training sessions to attract researchers and get them
interested in working in the lab, if that is one of the lab’s goals. As a lab director, you
will seek out opportunities and liaise with other people who can help you build the
lab, not only as a physical space, but as a social and dynamic research environment.
Additionally, a lab is usually managed by a technician (hired as a regular
employee) or a graduate student (for whom the assignment is a part of his or
her training or job appointment). These two types of lab managers tend to
have different profiles; it is a good idea to consider the respective advantages and
disadvantages of each. Technicians provide lab members with access to expert
technical knowledge and help in programming and running an experiment. They
are in charge of scheduling data collection sessions and collecting large quanti-
ties of data, and can solve last-minute problems with the equipment or partici-
pants. Depending on their expertise, technicians can also assist in preprocessing
and organizing the data (e.g., extracting the data from the eye-tracking software
and doing preliminary quality checks, organizing the data in the spreadsheets)
and data analysis itself. It is also the lab manager’s responsibility to maintain the
eye-tracking equipment and ensure computers are up to date. When technical
problems do arise, they usually handle the communication with manufacturers,
because they understand the technical issues. Lab managers are often paid from
soft money, which entails a certain level of job insecurity. Because funding is
often limited, lab managers tend to make less money than they could in industry
positions. Therefore, retention can be a problem. What makes the job attractive,
then, is the opportunity to participate actively in research without carrying the
responsibilities that come with having an academic career.
In many programs, including our own, a graduate student takes on the role
of lab manager. While students need to go through a learning curve to master
the different aspects of programming and data collection, they tend to be highly
motivated and regard the opportunity as a gateway to a career as an eye-tracking
researcher. As a lab manager, students gain hands-on experience with all stages
of the research process—from planning and designing a study to publishing the
results—which looks great on their CV and prepares them for a future career in
the academy. Student lab managers are also a valuable resource for their fellow
students, because they can train others to conduct their own experiments or
participate in existing projects. This becomes especially important as the lab manager prepares to graduate and leave the program, as passing on their accumulated
knowledge and expertise is essential to maintaining a healthy eye-tracking lab.
The lab manager position ultimately depends on the funding available at the
institution and the size of the lab. For small labs, a full-time technician may be hard to justify, because there may be little work during periods when no eye-tracking experiments are in progress. In comparison,
a full-time lab manager position may offer more stability in the long term, assum-
ing there is research funding or financial support from within the university.

9.3 Getting Started
9.3.1 Ideas for Research
9.3.1.1 Research Idea 1: Entry-Level: Create a
Sentence-Processing Experiment

Goal: To run a simple sentence-processing experiment with comprehension questions

Sample study: Lim, J. H., & Christianson, K. (2015). Second language sen-
sitivity to agreement errors: Evidence from eye movements during compre-
hension and translation. Applied Psycholinguistics, 36(6), 1283–1315.

Many sentence processing researchers display comprehension questions after every trial or subset of trials to check participants' comprehension of the
stimuli (see Figure 9.25). Researchers’ primary interest is typically not in how
accurately participants respond to the comprehension questions; rather, these
questions are included to ensure that participants stay on task and read the
sentences for comprehension (see Section 5.4).

FIGURE 9.25 An experimental item (top-left) followed by a comprehension question (bottom-right). (Source: Lim and Christianson, 2015).

The researcher may also decide
to use the comprehension questions as a criterion for participant inclusion in the
data analysis. For example, Lim and Christianson (2015) excluded four partici-
pants who answered less than 85% of all comprehension questions correctly from
their data analysis.
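This kind of accuracy screening can be scripted in a few lines of R. The following is a minimal sketch with invented data; the data frame and column names are hypothetical and not taken from Lim and Christianson's materials.

# Hypothetical long-format data: one row per comprehension question
trials <- data.frame(
  subject = rep(c("s01", "s02", "s03"), each = 20),
  correct = rbinom(60, 1, prob = rep(c(.95, .70, .92), each = 20))
)
acc <- tapply(trials$correct, trials$subject, mean)  # accuracy per participant
keep <- names(acc)[acc >= .85]                       # apply the 85% criterion
trials_clean <- subset(trials, subject %in% keep)    # retained participants only
round(acc, 2)                                        # inspect per-participant accuracy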
For a sample experiment with the EyeLink 1000, refer to “TextLine with
Comprehension Questions” in the Experiment Builder usage discussion forum
(https://www.srsupport.com/forums/forumdisplay.php?f=7).

9.3.1.2 Research Idea 2: Entry-Level: Create a Text Reading Study

Goal: To examine eye-movement patterns in reading longer texts

Sample study: Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words:
Gauging the role of attention in incidental L2 vocabulary acquisition by means
of eye-tracking. Studies in Second Language Acquisition, 35(3), 483–517.
Researchers who seek to study eye movements under more naturalistic conditions
may choose to present readers with longer stretches of connected text. Such text
could range from paragraphs (Balling, 2013; Bolger & Zapata, 2011; Godfroid,
Boers, & Housen, 2013) to short stories (Pellicer-Sánchez, 2016) to multiple book
chapters (Elgort, Brysbaert, Stevens, & Van Assche, 2018; Godfroid et al., 2018),
or even a whole novel (Cop, Drieghe, & Duyck, 2015; Cop, Keuleers, Drieghe, &
Duyck, 2015). An important consideration when designing a text reading study is
how to lay out the text on the screen; that is, what font size and interline spacing to use (see Figure 9.26, Section 6.2.1, and Figure 6.14).
Text-reading experiments are useful to compare reading patterns for different
groups, for example bilinguals and monolinguals, non-native and native speakers,
or L2 learners of different proficiency levels. This type of research design also lends
itself to studying whether readers can pick up new vocabulary incidentally from
reading longer texts. When the focus is on global reading patterns, sentence-level
analyses can be informative. Sentence-level measures differ from word-based meas-
ures in that they are computed based on all the words in the sentence. Thus, a
sentence-level analysis may involve sentence reading times and fixation counts for
the whole sentence, as well as aggregate eye-movement measures such as average
fixation duration, average saccade length, probability of regression, and probability
of word skipping (Cop, Keuleers, et al., 2015). When the focus is more local, includ-
ing an analysis of individual target words, a word-level analysis makes more sense.

FIGURE 9.26 One paragraph and comprehension question. (Source: Godfroid et al., 2013).
In this case, the researcher extracts eye-movement measures for predefined target
words only and ignores the other words in the sentence. The “big four” dependent
variables in word-level analyses are first fixation duration, gaze duration, regression
path duration, and total time (see Section 7.2.1.2), which can be supplemented
with other measures such as fixation count (see Section 7.2.1.1) and regression-in
(see Section 7.2.2). Researchers typically include both early and late eye-movement
measures to capture the time course of word processing (see Section 7.2.1.2.2).
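To illustrate what a word-level analysis looks like in practice, the R sketch below computes means for the "big four" measures from a hypothetical fixation report; the column names are invented for illustration and will differ across eye-tracking software packages.

# Invented fixation report: one row per target word per participant
fix_report <- data.frame(
  subject = rep(c("s01", "s02"), each = 3),
  target  = rep(c("word1", "word2", "word3"), times = 2),
  ffd     = c(210, 245, 198, 260, 230, 205),  # first fixation duration (ms)
  gd      = c(290, 310, 250, 340, 300, 280),  # gaze duration (ms)
  rpd     = c(350, 420, 260, 400, 380, 300),  # regression path duration (ms)
  tt      = c(480, 600, 390, 560, 520, 430)   # total time (ms)
)
# Mean early and late measures per target word:
aggregate(cbind(ffd, gd, rpd, tt) ~ target, data = fix_report, FUN = mean)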
For a sample template to run a natural reading experiment with the EyeLink
1000, refer to the “TextPage” template supplied by Experiment Builder (v.1.10)
under File -> Examples (https://www.sr-research.com/experiment-builder/).

9.3.1.3 Research Idea 3: Entry-Level: Study Script Effects in Reading

Goal: To conduct a sentence processing study that compares reading of different scripts

Sample study: Feng, G., Miller, K., Shu, H., & Zhang, H. (2009).
Orthography and the development of reading processes: An eye-movement
study of Chinese and English. Child Development, 80(3), 720–735.

One of the big questions in reading research is how perceptual (lower-level) and
cognitive (higher-level) factors interact during the reading process (see Sections
2.4 and 2.5). To examine lower-level factors in reading, researchers can compare
reading of languages with different scripts. Given two groups of fluent readers,
the role of higher-level comprehension processes will be minimized (both groups
will be able to read without comprehension difficulty); however, the visual input
will be different, making lower-level, orthographic factors the likely source of
any differences in reading patterns. To examine higher-level factors, research-
ers can compare groups with different reading skills reading in the same lan-
guage. This includes comparing children at different grade levels with adults (e.g.,
Blythe, Liversedge, Joseph, White, & Rayner, 2009; Häikiö, Bertram, Hyönä, &
Niemi, 2009; Rayner, 1986), comparing less and more proficient L2 readers (an
understudied research area), and comparing L2 adult with L1 adult readers (Cop,
Drieghe, & Duyck, 2015). The rationale behind these approaches is that child L1
readers and most L2 readers are less efficient at higher-level processing (e.g., word
recognition, sentence parsing) than fluent L1 readers, but if they are all reading the
same input in the same script, lower-level factors will be controlled for. As a result,
any differences between these groups’ eye-movement records are commonly
taken to reflect higher-level effects. Although most reading research focuses on only one dimension of the reading process (either lower- or higher-level processing),
here I present an ingenious study that looked at both factors combined, namely
Feng, Miller, Shu, and Zhang (2009).
Feng and colleagues examined the eye movements of English and Chinese chil-
dren and adults reading a mix of culture-specific and culturally unbiased texts in
their L1. Analyses of eye movements (fixations and saccades) revealed both higher-
level developmental and lower-level orthographic influences on reading. As for
developmental effects, it was found that adults processed text faster, with shorter
fixations, fewer refixations, and longer left-to-right eye movements, than chil-
dren. Consistent with their hypotheses, the authors also found orthographic effects
were more evident in younger than in older readers. Interestingly, English children
showed stronger orthographical effects than their age-matched Chinese peers, as
reflected primarily in saccade-related measures (English third- and fifth-graders
made shorter forward saccades than their Chinese peers). The authors concluded
that “larger developmental and orthographic differences are observed in saccade-
related measures than in fixation duration” (p. 732). What this means is that where
the eyes move, as seen in saccades, is more susceptible to external influences than
when the eyes move, as reflected in fixation durations (also see Sections 2.4 and 2.5).
Because children’s reading skills are not fully automatic yet, their reading behavior
offers a clearer window into the complex interplay of reading-related variables than
adult data (Feng et al., 2009).
Feng et al.’s study exemplifies the value of cross-linguistic reading research.
When comparing reading patterns across different languages, a few points deserve
special attention. First is the matching of reading materials.To level the playing field
between participant groups, it is necessary to make the reading materials for the dif-
ferent languages as similar as possible. Feng and his colleagues did this by using the
Chinese and English versions of two texts from a previous cross-cultural reading
study (Stevenson et al., 1990). These texts were assumed to be free from cultural bias; to ensure a familiar reading experience, they were supplemented with culture-specific
stories as well (see what follows). If parallel texts for different languages are unavail-
able, researchers may need to create their own translations. In that case, variables
that influence fixation duration need to be controlled between the two versions
of the text. These include average word frequency and, for languages that share the
same script, average word length (see Section 2.5). Cop, Drieghe, & Duyck (2015),
in a comparison of the English and Dutch versions of an Agatha Christie novel, also
matched sentences on “information density” (p. 9); that is, content word frequency
and the number of words, content words, and characters per sentence.
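As a concrete illustration, matching checks of this kind are easy to run in R. The sketch below compares average word length in two invented text versions; the same logic extends to (log-transformed) corpus frequencies.

# Invented word lists standing in for two text versions
version_a <- c("the", "brilliant", "scientist", "examined", "the", "data")
version_b <- c("a", "famous", "researcher", "inspected", "those", "results")
t.test(nchar(version_a), nchar(version_b))  # compare mean word length
# A non-significant difference suggests the versions are acceptably matched.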
As stated previously, Feng et al. (2009) supplemented the two common texts from Stevenson et al.'s (1990) study with three culture-specific stories from US
and Chinese reading series, respectively. Third- and fifth-graders read different
stories (so a total of five stories each) and adults read all eight stories. When work-
ing with children, it is important that the topic and level of linguistic complexity
of the readings be age appropriate. To ensure this was the case, Feng et al. asked
teachers of each age group to evaluate the appropriateness of the materials for
their learners. Finally, when comparing different text versions, the display of text
also matters. Feng and his colleagues adjusted the font sizes in both languages
(24 × 24 pixel Song font in Chinese; 7.3 pixel average letter width in English)
to match the number of text lines in both languages. As a result, an average six-
letter word in English corresponded to 1.5 Chinese characters (Feng et al., 2009).
Linking the two writing systems in this way enabled the researchers to compare
eye-movement measures across both languages. For example, saccade length was
measured in pixels, not letters or characters, and for Chinese and English adults
this yielded closely overlapping distributions in saccade length. This supported the
authors’ decision to opt for a pixel-based measure. In short, working with different
languages requires a careful selection and visual presentation of materials, as well as
a healthy dose of cultural awareness. The reward is that it can improve our under-
standing of the universal and language-specific aspects of reading considerably.

9.3.1.4 Research Idea 4: Entry-Level: Create a Visual World Study

Goal: To run an action-based visual world experiment

Sample study: Morales, L., Paolieri, D., Dussias, P. E., Kroff, J. R. V., Gerfen,
C., & Bajo, M. T. (2016). The gender congruency effect during bilingual spo-
ken-word recognition. Bilingualism: Language and Cognition, 19(2), 294–310.

Visual world eye tracking is an eye-tracking paradigm used to study auditory language processing in conjunction with visual input—real objects, pictures of objects, or a visual scene (see Chapter 4). Visual world studies can be used to study a variety of questions; here we focus on the most common use of visual world eye tracking in L2 research, which is to test participants' grammatical knowl-
edge. The idea is that participants can look at an object or a picture in the visual
display before it has been named if they can make use of grammatical cues that
precede the target (e.g., a gender-marked article preceding a noun). Such predic-
tive behavior is known as an anticipatory effect (see Section 4.1). Anticipatory
effects are taken as evidence that listeners possess a certain type of grammati-
cal knowledge (e.g., grammatical gender, number, definiteness, animacy, thematic
role, or semantic categories) and can make rapid use of the knowledge during
real-time listening. Thus, researchers look for anticipatory effects in listeners’ eye
movement records to test whether L2 learners, bilinguals, and monolinguals have
rapidly accessible grammatical knowledge and whether the L2 learners’ and bilin-
guals’ knowledge shows any cross-linguistic influences or transfer effects from the
language to which they currently are not listening (see Section 4.2.2).
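A common way to quantify anticipatory effects is to compute the proportion of looks to the target in successive time bins, time-locked to the onset of the grammatical cue. The R sketch below uses simulated samples; all variable names are invented for illustration.

# Simulated sample report: one row per eye-tracking sample, time-locked
# to the onset of the gender-marked article (time 0)
set.seed(42)
samples <- data.frame(
  time      = rep(seq(0, 950, by = 50), times = 2),
  condition = rep(c("congruent", "incongruent"), each = 20),
  on_target = rbinom(40, 1, prob = rep(c(.70, .50), each = 20))
)
samples$bin <- floor(samples$time / 100) * 100  # collapse into 100-ms bins
# Proportion of target fixations per bin and condition:
with(samples, tapply(on_target, list(bin, condition), mean))
# An earlier rise in one condition, before the noun itself is heard,
# would indicate anticipation.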
Morales et al. (2016) tested the role of L1 grammar representations dur-
ing L2 processing with Italian-Spanish bilinguals. Italian and Spanish are two
typologically related languages; among their many similarities, both have a two-
gender system. Even so, translation equivalents (i.e., word pairs with the same
meaning in Italian and Spanish) do not always have the same grammatical gender:
compare “the cheese”, el(MASC) queso in Spanish and il(MASC) formaggio in Italian,
with “the monkey”, el(MASC) mono in Spanish but la(FEM) scimmia in Italian (see
Figure 9.27). Participants only listened to sentences in Spanish, the language they
acquired later in life. Italian was never spoken or mentioned. Furthermore, the
displays of critical trials always contained two pictures of which the referents
had the same grammatical gender in Spanish (i.e., two el nouns or two la nouns).
Therefore, Spanish grammatical gender was not an informative grammatical cue,
because it did not distinguish between the two pictures. However, in some trials,
the gender of the Spanish-Italian translation pair was incongruent, for instance
el(MASC) mono and la(FEM) scimmia, “the monkey” (see Figure 9.27, right panel).
Recall that participants never heard the Italian translations; even so, Morales and
colleagues found that the bilinguals automatically activated the Italian translations
and their genders during listening. Specifically, bilinguals looked less at the target
objects in gender-incongruent trials (where the Italian gender was causing com-
petition) than in gender-congruent trials (where gender cues in both languages
converged). Spanish monolinguals, who do not have any knowledge of the Italian
gender system, responded to both types of trials similarly.
Researchers who would like to replicate a visual world experiment could check
the IRIS database (https://www.iris-database.org) for pictures and audio stimuli
from published studies. At the time of writing this book, the materials from Trenkic,
Mirković, and Altmann (2014) and Andringa and Curcic (2015) could be down-
loaded directly from the database. These authors studied L2 learners’ knowledge

FIGURE 9.27 
Example sentence pairs. The sentence always had the form Encuentra +
definite article el(MASC) or la(FEM) + target noun,“Find the [target noun]”.The
participant’s task was to click on the picture corresponding to the noun.
(Source: Morales et al., 2016).
Setting up an Eye-Tracking Lab  345

of English definite articles and animacy, respectively. Another strategy is to request


the study materials from the author via email. Many authors would be happy to
share their materials if they still have them—and in fact, making materials publicly
available is a good way to ensure materials do not get lost. Getting in touch with an
author is also a way for beginning researchers to establish a professional relationship
and perhaps get valuable advice on their research projects. Finally, if you would like
to build your own experiment, SR Research has a sample template “VisualWorld_
Move” for use with the EyeLink 1000 on the Experiment Builder usage discussion
forum (https://www.srsupport.com/forums/forumdisplay.php?f=7).
When designing or replicating a study, it is important to think about the lan-
guage pairing. What are your participants’ L1 and L2? How is the grammatical
feature represented in each of their languages? The original visual world studies
looked at the processing of a L2 grammatical feature that does not exist in the
L1, for instance L1 Chinese speakers learning the article system in L2 English
(Trenkic et al., 2014) or L1 English speakers learning the classifier system in L2
Chinese (Lau & Grüter, 2015). In the exemplary study for this section, Morales
and colleagues (2016) focused on bilinguals whose two languages share the tar-
geted feature, yet with subtle (lexical) variations (also see Hopp & Lemmerth,
2018). A third possibility, which remains to be explored, is whether grammatical
distinctions that are irrelevant to the L2 still influence processing when they exist
in the L1 (see Cunnings, Fotiadou, & Tsimpli, 2017, for an example with overt/
null pronouns). In short, the first step in designing a visual world experiment is to
do a good old-fashioned contrastive analysis.

9.3.1.5 Research Idea 5: Intermediate: Replicate an L1 Reading Study with L2 Readers

Goal: To examine whether L1 and L2 readers have similar individual differences

Sample study: Hyönä, J., & Nurminen, A. M. (2006). Do adult readers know how they read? Evidence from eye movement patterns and verbal reports. British Journal of Psychology, 97(1), 31–50.

Although researchers often describe skilled L1 reading as if it were a uniform process (see Section 2.2), a body of work suggests there are important individual differences, even among educated adults reading in their L1 (e.g., Henderson & Luke, 2014; Hyönä, Lorch, & Kaakinen, 2002; Hyönä & Nurminen, 2006; Kuperman & Van Dyke, 2011; Taylor & Perfetti, 2016). It stands to reason, then,
that L2 readers would show similar, if not larger, individual differences dur-
ing reading, even when they come from a relatively homogenous group of
similar-proficiency readers who speak the same L1. Individual differences will be
more pronounced in mixed-proficiency, mixed-L1 groups. Individual differences
research can inform models of eye-movement control (Henderson & Luke, 2014)
and L1/L2 reading development. Researchers can study differences in word-, sen-
tence-, and text-level processing, thus adding to our understanding of micro- and
macro-level processes in reading and how they vary between individuals (Hyönä
& Nurminen, 2006). In L2 reading, changes within the same individual over time
(i.e., with increasing proficiency) would be another avenue to explore.
Individual differences research on eye movements in L2 reading is rare, but the
L1 reading literature has several good examples of what such a study could look
like. For example, Hyönä and Nurminen (2006) had 44 students at the University of Turku, Finland, read a text in L1 Finnish on endangered species. The students' task
was to summarize the text, akin to what English for Academic Purposes students
might be asked to do in their English courses. Hyönä and Nurminen performed a
cluster analysis of the eye-movement data to group their participants into clusters
of similarly reading individuals. Their cluster analysis revealed three reader groups,
who had also emerged in an earlier study (Hyönä et al., 2002). The three groups
were slow linear readers, fast linear readers, and topic structure processors. (Topic
structure processors were readers who looked back at the subtitles and topic sen-
tences of the different sections in the text.) Finally, the participants also rated their
own reading behavior in a post-task questionnaire, which enabled the researchers to
investigate to what extent readers were conscious of their own reading style.
Submitting participants’ eye-movement data to a cluster analysis is an inter-
esting application of this statistical technique that other individual differences
researchers could follow. As is the case with most advanced statistical techniques,
cluster analysis requires a fairly large sample size. The number of participants in
the study will determine what is feasible in the analysis. Larger samples (n > 100)
make it easier to identify small groups (i.e., clusters with few members; Hair,
Black, Babin, & Anderson, 2010). Conversely, smaller sample sizes, as in Hyönä
and Nurminen’s study, are better suited to identifying relatively large clusters.
Even when researchers do not cluster analyze their eye-tracking data, they may
still want to collect data from many people to ensure their tests have good statisti-
cal power (see Section 5.5). Recall that statistical power is the likelihood one can
detect an effect or association in the data if it exists. For example, to detect a small
to moderate correlation, which is typically what you find in individual differ-
ences research, a sample size of 85 is needed (Unsworth, personal communication,
August 9, 2016). Larger samples are also more likely to adequately represent the
wide range of individual differences that exist in the population. Therefore, study
findings based on a large sample are more likely to generalize to other studies and
research contexts. Finally, the choice of reading materials, in individual differences
research as in other reading studies, requires careful consideration. Some factors to
consider are text genre (e.g., novel, newspaper article, academic article, expository
text), content familiarity, text length, and purpose for reading (e.g., reading for
pleasure or to remember, as when asked to write a summary). How participants read will likely differ depending on these factors. Therefore, researchers will want
to align the text(s) and reading tasks with the participants in their study, so the
type of reading participants engage in is representative of what they would do in
their everyday lives.
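Returning to the power analysis mentioned above, the sample-size figure of 85 participants for a small to moderate correlation can be verified with any power calculator; the pwr package in R (one option among several) reproduces it:

# install.packages("pwr")  # if not yet installed
library(pwr)
pwr.r.test(r = .30, sig.level = .05, power = .80)
# n comes out at approximately 85 for a two-sided test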
To build a paragraph- or text-reading experiment in SR Experiment Builder,
researchers can utilize the template introduced in research idea #2.

9.3.1.6 Research Idea 6: Intermediate: Replicate an L1 Visual World Study with L2 Listeners or Bilinguals

Goal: To study whether L2 listeners and bilinguals can use verb tense to
predict what will come next in the sentence

Sample study: Altmann, G. T., & Kamide, Y. (2007). The real-time media-
tion of visual attention by language and world knowledge: Linking anticipa-
tory (and other) eye movements to linguistic processing. Journal of Memory
and Language, 57(4), 502–518.

When replicating a study, it is important to establish a good theoretical or methodological rationale for replication. A conceptual replication is a study that repeats
another study in a modified form to examine how generalizable the results are to
different contexts such as with a different population or under different experi-
mental conditions (Polio & Gass, 1997; Porte, 2012). An example from the visual
world paradigm is Trenkic et al. (2014), who replicated the second experiment from
Chambers, Tanenhaus, Eberhard, Filip, and Carlson (2002) with a population of L2
speakers. Trenkic and colleagues found that L1 Chinese speakers of English, whose
native language lacks articles, were able to utilize the English articles the and an dur-
ing listening to determine the referent or goal of an action more rapidly. Specifically,
processing was fastest when the was used in combination with a unique referent in
the display and when a referred to an object that could not be uniquely identified. In
so doing, the researchers were able to replicate and extend the findings of Chambers
et al.’s original study with English native speakers and show that linguistic infor-
mation (the definite/indefinite article in the spoken sentence) and non-linguistic
information (the number of objects in the visual display) interact during listening.
Another candidate for replication from the L1 visual world literature is Altmann
and Kamide (2007), who also studied the interactions between the visual display
and the unfolding language on listeners’ eye behavior (see Section 4.1). Altmann
and Kamide’s study focused on verb tense. They were interested to know whether
the use of the future (e.g., will drink) or the past (e.g., has drunk) constrained what
object in the display participants would look at (e.g., an empty wine glass or a full
glass of beer) given the affordances, or conceptual information, associated with those objects. Overall, Altmann and Kamide found that L1 English listeners do look
faster at the object that is compatible with the action denoted by the verb, although
the meaning of the present perfect is more indeterminate (i.e., the present perfect
does not imply that the action was completed unless the sentence explicitly states
so, The man has drunk all of the beer). The same recommendations apply to running
this study as to running an action-based visual world experiment (compare with
Morales et al., 2016, research idea #4). The primary difference is that Altmann and
Kamide used a look-and-listen task, rather than an action-based study (see Section
5.4). Look and listen means participants simply looked at a visual display while they
listened to spoken sentences, but did not click on any pictures (Morales et al., 2016;
Trenkic et al., 2014) or manipulate real objects (Chambers et al., 2002). Such a task
is arguably more ecologically valid than an experiment where participants need
to click on pictures (Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, 2013);
however, because there are individual differences in scene viewing, a look-and-listen
design (also called a passive listening or a free viewing experiment) tends to result
in more variable data quality (Altmann, 2011a; Dussias et al., 2013).
To start building your own visual world experiment in SR Experiment Builder,
you can use the same template as for research idea #4: https://www.sr-support.
com/showthread.php?342-Visual-World-Paradigm&p=23280#post23280

9.3.1.7 Research Idea 7: Intermediate: Conduct an Interaction Study

Goal: To examine the relations between language learning and behavioral characteristics during interaction (e.g., eye gaze) based on an existing study

Sample study: McDonough, K., Crowther, D., Kielstra, P., & Trofimovich, P.
(2015). Exploring the potential relationship between eye gaze and English L2
speakers’ responses to recasts. Second Language Research, 31(4), 563–575.

During conversation, we naturally examine non-linguistic features such as gesture and facial expression. Recently, a group of scholars used eye trackers to investigate the role of eye gaze during interactive activities and whether eye gaze can sup-
the role of eye gaze during interactive activities and whether the eye gaze can sup-
port aspects of the L2 learning process—notably, responses to recast (McDonough
et al., 2015) and grammar learning (McDonough,Trofimovich, Dao, & Dion, 2017).
In McDonough et al. (2015), learners were engaged in four communicative activi-
ties (world record trivia, a story-telling task, an interview task, and a truth or lie
task), during which they received feedback from their L1 interlocutor. McDonough
and colleagues examined whether both speakers’ eye gaze and corrective feedback
features (e.g., prosody, intonation) are associated with target-like responses to recast.
In other words, if one speaker looks at the other speaker’s face during a feedback
episode or if both speakers look at each other, will this increase the chances that
the L2 learner makes a correct reformulation? The researchers found this to be the
case. Both L2 speaker eye gaze and mutual eye gaze with the interlocutor predicted
target-like responses to corrective feedback (see Section 4.2.4).
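Analytically, a question of this kind lends itself to logistic regression. The R sketch below is illustrative only, with invented data and variable names; it is not a reconstruction of McDonough et al.'s actual analysis, which also modeled corrective feedback features.

set.seed(7)
feedback <- data.frame(
  targetlike   = rbinom(80, 1, .50),  # 1 = target-like response to the recast
  learner_gaze = rbinom(80, 1, .60),  # 1 = learner looked at interlocutor's face
  mutual_gaze  = rbinom(80, 1, .30)   # 1 = both speakers looked at each other
)
m <- glm(targetlike ~ learner_gaze + mutual_gaze,
         data = feedback, family = binomial)
summary(m)  # positive, significant coefficients would mirror the reported finding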
When examining eye-movement behavior during interaction, it is important
to maintain a natural and more ecologically valid setting where participants can
engage in activities without extensive control of movements. To do so, future
researchers should consider the following elements. First is the setup of the eye
trackers and scene trackers. One of the biggest differences between interaction
studies and reading- or listening-based studies is the visual display; that is, the
interlocutors will be looking at each other in interaction studies whereas read-
ing and listening studies normally take place in front of a computer screen. Such
naturalistic set-ups require the use of an additional camera, which is known as
a scene camera, to determine where in the field of vision a given participant is
looking. (In reading and listening studies, the computer screen is defined as the
visual scene.) Researchers can place the scene camera behind the interlocutors
(McDonough et al., 2015) or next to the eye trackers (McDonough et al., 2017).
Special care is needed to maintain the location of both scene cameras—one for
each interlocutor—and the distance between interlocutors and the eye tracker.
This can be done by fixing the chairs to a designated location to prevent partici-
pants from moving and also fixing the location of the scene camera. In addition,
care should be given to the selection of communicative activities. Given that
researchers have less control over the movements of participants on the spot (e.g.,
looking down), activities can be selected that would require fewer head movements
(e.g., looking at a poster rather than reading from a paper).

9.3.1.8 Research Idea 8: Advanced: Examine L2 Listening as a Multimodal Process

Goal: To examine the role of visual support during listening on test takers’
performance in an integrated writing task

Sample study: Cubilo, J., & Winke, P. (2013). Redefining the L2 listen-
ing construct within an integrated writing task: Considering the impacts of
visual-cue interpretation and note-taking. Language Assessment Quarterly,
10(4), 371–397.

Contemporary theorists of L2 listening comprehension challenge the view that listening comprehension is solely dependent on listeners' ability to decode auditory input (Baltova, 1994; Rubin, 1995; Wagner, 2008). The idea is that when
people listen, they also rely on paralinguistic cues (e.g., gestures, facial expressions,
scenes) that can be encoded visually. As such, assessment researchers ask the ques-
tion of whether it is appropriate to include visuals (e.g., pictures or videos) in tests
that are designed to assess L2 listening ability.
Cubilo and Winke (2013) pursued this question in the context of an inte-
grated writing task; that is, “a task that requires test takers to read, listen, and
then write in response to what they have read and heard” (Educational Testing
Service, 2005). Cubilo and Winke varied the amount of visual support test takers
received for the listening portion of the test. In the audio/still-picture condition,
participants listened to a two- or three-minute lecture while they saw a still picture
on the screen, whereas in the video condition, they viewed and listened to a video
recording of a different lecture. The authors found no differences on overall writ-
ing scores. (Scores on one component of the rubric, language use, however, were
significantly higher for video-based lectures.) Test takers took more notes when
they listened to audio-based lectures than when they viewed video-based lectures,
and the majority of test takers indicated on an exit questionnaire that the video
lecture was more helpful for content comprehension.
Cubilo and Winke suggested a follow-up study with eye tracking to obtain
more robust and detailed information about where and when test takers look
during video-based listening. Eye tracking could be used to quantify what per-
centage of the time test takers were looking at the screen (vs. looking down to
take notes or listening to the lecture with their eyes closed). Eye-movement data
might also help researchers understand what elements in the video cued test tak-
ers to take notes. One interesting possibility is that the lecturer’s paralinguistic
behavior (e.g., their gestures and facial expressions) and visual aids (e.g., graphs or
figures) add emphasis to the propositional content of the video and might there-
fore trigger the viewers to take more notes.
When replicating this study with eye tracking, the selection of video-based lec-
tures is important. Ideally, the researchers may want a video of a lecturer who dis-
plays a range of paralinguistic behaviors and a combination of content-related and
context-related elements in the video (see Suvorov, 2015). The researchers would
need to take into consideration that gestures may also be viewed and processed in
the parafovea and not looked at directly (see Section 2.1). Therefore, although it is
safe to state that test takers processed a speaker’s gestures when they looked at the
gestures directly, the opposite is not true. Viewers might still acquire some gestural
information from the corner of their eyes. To estimate the amount of parafoveal
viewing, researchers could calculate the size of a participant’s visual field based
on his or her seating distance from the screen (see Section 2.1). It is worth bear-
ing in mind that the quality of visual information degrades quickly away from
center vision, thus making the parafoveal processing of gestures increasingly less
likely. Alternatively, the focus of the study could be on gestures and eye-movement
behavior. In that case, the gestures themselves could be controlled to investigate
whether they trigger eye movements and how this might influence comprehension.
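The size of a screen region in degrees of visual angle follows from the standard formula angle = 2 × arctan(size / (2 × distance)), which is straightforward to compute in R:

# Degrees of visual angle subtended by an on-screen region of a given size,
# viewed from a given distance (both in the same unit, e.g., cm)
visual_angle <- function(size, distance) {
  2 * atan(size / (2 * distance)) * 180 / pi  # convert radians to degrees
}
visual_angle(size = 5, distance = 60)  # a 5-cm gesture at 60 cm spans ~4.8 degrees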
Lecture videos can be found on YouTube; on the Technology, Entertainment, Design (TED) website; or on iTunes U, which contains intro-level university courses.

9.3.1.9 Research Idea 9: Advanced: Study Cognitive Processes during Intentional Vocabulary Learning

Goal: To replicate a study with eye tracking to measure learners' locus of attention and/or viewing patterns during learning

Sample study: Lee, C. H., & Kalyuga, S. (2011). Effectiveness of different pinyin presentation formats in learning Chinese characters: A cognitive load perspective. Language Learning, 61(4), 1099–1118.

Learning to read and write in Chinese comes with the acquisition of a new script, a
large and time-consuming endeavor, and one of the reasons Chinese ranks among
the most difficult (Category IV) foreign languages in the United States (Defense
Language Institute Foreign Language Center, n.d.). College students at American
universities are typically expected to learn 3,000 characters in a four-year Chinese
program (Shen, 2014), although actual learning rates are lower and would seem
to call for more realistic learning goals (Shen, 2014). The learning burden for
Chinese characters is heavy because of the need to memorize many new forms
and because the pronunciation of most characters cannot be derived from the
written form (Liu, Wang, & Perfetti, 2007; Wang, Perfetti, & Liu, 2003). Both fac-
tors distinguish logographic from alphabetic writing systems. As a result, learners
of Chinese must acquire three interlinked constituents of character knowledge—
their shape (orthography), sound (phonology), and meaning (semantics) (Perfetti,
Liu, & Tan, 2005). Adding to the learning burden is that each constituent may
require focused training to be internalized (Guan, Liu, Chan, Ye, & Perfetti, 2011).
Lee and Kalyuga (2011) focused on how best to organize the three sources of
word information in a vocabulary learning task: the character, the phonetic trans-
literation known as pinyin, and the English translation. The authors found that dif-
ferent presentation formats affected the learning outcomes. Australian high school
students (mostly heritage learners) who saw the character, pinyin, and translation
displayed from top to bottom learned more words than those who received the
horizontal format, with the character, pinyin, and translation arranged from left
to right (see Figure 9.28). Lee and Kalyuga attributed the higher learning rates in
the vertical format to a reduction in extraneous load, a type of cognitive load
that is detrimental to learning (see van Merriënboer & Sweller, 2005 for a review
of Cognitive Load Theory). Specifically, in the vertical format the two syllables in
pinyin were placed directly below the corresponding characters (see Figure 9.28,
b), which enabled a direct comparison of each character and their pronunciation.
FIGURE 9.28 Three presentation formats of Chinese words, their pronunciation, and meaning. (Source: Adapted from Lee & Kalyuga, 2011).

However, in the horizontal format, learners had to search and match the pinyin
syllables with the characters (see Figure 9.28, a), which the authors hypothesized
led to a split attention effect (Paas, Renkl, & Sweller, 2004).
Lee and Kalyuga (2011) measured the learners’ cognitive load on a nine-
point Likert scale. By adding an eye-tracking component to the study, research-
ers could obtain concurrent evidence of attention and verify the central claim
of the original article that the horizontal presentation format led to split atten-
tion. Saccadic transition patterns, which reflect eye movements between differ-
ent interest areas on the screen, would shed light on this issue (see Figure 9.29).
FIGURE 9.29 An example of a transition matrix for wèizhi, "location", with the Chinese characters, pinyin, and English translation arranged vertically. Numbers represent probabilities of where a learner's eyes will move next. (Source: Figure supplied by Xuehong (Stella) He, Michigan State University).

Furthermore, a new, adjacent format, where the characters are placed above the pinyin and to the left of the translation (see Figure 9.28c), might prove even
more efficient than the horizontal format (Lee & Kalyuga, 2011), because in this
case the pinyin will still be matched to the characters but now the translation
will also be close.
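A transition matrix like the one in Figure 9.29 can be derived from a sequence of fixated interest areas with a few lines of R; the sequence below is invented for illustration.

# Invented sequence of fixated interest areas for one learning trial
fix_seq <- c("character", "pinyin", "character", "translation", "pinyin",
             "character", "pinyin", "translation", "character")
from <- fix_seq[-length(fix_seq)]        # interest area the eyes leave
to   <- fix_seq[-1]                      # interest area the eyes land on
prop.table(table(from, to), margin = 1)  # row-wise transition probabilities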
To adapt the original study to an eye-tracking experiment, I recommend
removing the spoken component from the learning session, so as not to bias
participants to look at the pinyin. The fixation point prior to each learning
trial should be fixed (e.g., on the first character of the word) so researchers can
compare transition patterns between the different elements on the screen (i.e.,
characters, pinyin, and translation) for the three conditions. It is also a good idea
to control the distance between these elements. One way to do this is to create
image files for each trial, which one could do in Adobe Acrobat or Microsoft
PowerPoint. Next, the researcher could create interest area templates and apply
or copy the templates to the different word trials. Finally, to ensure robust and
reliable measurement of word learning, it is a good idea to expand the vocabulary
posttest with more test items. For eye tracking as for other types of experimental
research, test reliability is a very important consideration (see Section 5.5).

9.3.1.10 Research Idea 10: Advanced: Conduct a Synchronous Computer Mediated Communication (SCMC) Study with Eye Tracking

Goal: To study writing partners' lexical alignment during a collaborative writing task

Sample study: Michel, M., & Smith, B. (2019). Measuring lexical alignment
during L2 chat interaction: An eye-tracking study. In S. Gass, P. Spinner, &
J. Behney (Eds.), Salience in Second Language Acquisition (pp. 244–268).
New York: Routledge.

As technology evolves, the ways in which people communicate in various languages across the globe have diversified. Of several means of communication, writ-
ten SCMC is a popular, contemporary tool, not only to converse with others, but
also potentially to promote language learning. The effects of SCMC in language
learning can be linked to the affordances SCMC offers. One such affordance in
text-based SCMC is the physical availability of the text; that is, learners can review
previous messages in a written chat and reflect upon what to write. The perma-
nence of texts on the chat screen (as opposed to the fleeting nature of spoken
language) has led Smith (2005) to suggest a facilitative role for SCMC in notic-
ing form and meaning, arguing that the written record increases the saliency of
the input. Building on this premise, Michel and Smith (2019) recently investigated
chat partners’ tendency to reproduce each other’s phrases, a notion called lexical
alignment, and to what extent heightened, overt attention is associated with this
phenomenon.
Michel and Smith investigated L2 learners’ alignment during interactive writ-
ing tasks that involved composing an academic abstract. The authors employed
two eye trackers (one in the UK and one in the US) to measure their participants’
attention to multi-word expressions in the chat log (e.g., the last part, oral cmc and ftf
[oral computer mediated communication and face to face], L2 vocabulary learning)
that they later reproduced in their writing. The authors examined whether such
instances involved higher levels of overt attention (as measured by phrase-length-
corrected total time and fixation counts) or whether lexical alignment happened
more implicitly or whether the repeated occurrences were perhaps a coincidence
(i.e., not a true case of lexical alignment).
Six participants studying at either a British or an American university were
paired up to reconstruct an academic abstract based on bullet-pointed informa-
tion. The participants interacted via written SCMC over six sessions while eye
trackers in the two countries measured their eye movements on the computer
screen. (Each participant was sitting in front of a different computer screen, which
yielded two distinct sets of eye-movement data.) Using the programming language
R, Michel and Smith identified all instances where both interlocutors produced
the same three- to ten-word unit (e.g., of the study). If these instances received at
least one fixation, they defined them as interest areas for the eye-movement analy-
sis (58 cases in an 8,759-word text log). Next, they compared eye movements for
these possible sources of lexical alignment with baseline data, which were turns in the
conversation that did not show any lexical overlap and were also viewed at least
once (135 turns in total). The authors found that 16 of the 58 possible sources
of lexical alignment were fixated on more and for longer than the baseline texts.
They termed these identified sources for lexical alignment. The authors concluded
that although chat partners do align their lexical productions with each other (as
shown by the 16 cases of lexical overlap + heightened attention), the amount of
conscious, strategic alignment may be lower than what was previously estimated.
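The n-gram matching step can be approximated in a few lines of R. The sketch below finds word trigrams shared by two invented chat turns; it is a simplification of Michel and Smith's procedure, which searched three- to ten-word units across the full chat log.

ngrams <- function(text, n = 3) {       # all word n-grams in a turn
  words <- strsplit(tolower(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  sapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "))
}
turn_a <- "the last part of the study looks at oral cmc"
turn_b <- "i think the last part of the study is missing"
intersect(ngrams(turn_a), ngrams(turn_b))  # candidate sources of alignment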
Michel and Smith’s innovative use of eye-tracking technology in a chat environ-
ment can inform future researchers who seek to use eye tracking in dynamically
changing contexts such as a chat log or a video. In such environments, the most
challenging and time-consuming task is to draw interest areas around the moving
targets (also see Section 6.1.3.2).This is because the location of utterances in a chat
interface changes each time participants type in more text, which earned SCMC
the alternative definition of “Spontaneously Created and Moving Constantly”
(Michel & Smith, 2017, p. 461). Likewise, target objects in a video, including cap-
tions or subtitles, will move as the video plays and the story unfolds. Current prac-
tice is to manually adjust interest areas each time the target moves on the screen;
however, this is very time-consuming. For example, let’s imagine you identified
“of the study” to be a possible source of alignment between two interlocutors and
drew an interest area around this phrase. As soon as one of the chat partners entered
new text, the location of “of the study” would move up one line in the chat log. In
Michel and Smith’s study, there were about 30 turns per screen, divided roughly
equally between the two speakers (Michel, personal communication, November
10, 2016). As a result, there were about 15 changes in location before a turn dis-
appeared from the screen, hence about 14 manual moves of the interest area. To
do so, the researchers deactivated the original interest area around a target phrase
and drew a new interest area at the new location each time the target phrase
shifted. Coding one baseline conversation of approximately 30 turns in this way
was about two days’ worth of work (Michel, personal communication, November
10, 2016). In the future, eye-tracking research with moving interest areas will likely
become less labor intensive as new interest area detection programs such as EyeAnt
are developed (Anthony & Michel, 2016). Indeed, the eye-tracking manufacturer
SensoMotoric Instruments already has a function in their data analysis software
that enables the automatic tracking of moving targets in videos.

9.3.2 Tips for Beginners


Although there is a wealth of information available on eye tracking, it may not
be that accessible to new researchers in the field. Research published in journal
articles usually contains detailed descriptions of experimental design, data analysis,
and results, but it often skips over some of the more technical details related to
running an eye-tracking experiment. At the same time, manufacturer manuals do
offer extensive practical and technical advice, but the sheer amount of informa-
tion can be overwhelming. In this final section of the book, I summarize the tips
and tricks that were provided across the different chapters in this book and top
them off with some additional insights.

9.3.2.1 About the Equipment


•• Before you purchase an eye tracker, talk with current eye-tracker users
about how satisfied they are with their eye tracker and the customer support.
Is technical support included or does it require an additional software sup-
port contract? How responsive are the manufacturers? In some countries, it
is common to purchase an eye tracker through a third-party provider. Make
sure you know what access you will have to customer support, and in what
language, before making a decision.
•• Know the equipment well. The excitement of getting your new equip-
ment installed can quickly turn into frustration if you find yourself running
your first study and have trouble handling the eye tracker. Be sure you know
where the eye camera, the mirror (if your eye tracker has one) and the infra-
red light source(s) are. Learn how to keep them clean and in a good condi-
tion. Your manual and the company representative can be of great help.
•• Keep an open channel with the equipment manufacturer. They can answer any questions about the equipment and help with troubleshooting.
•• For remote (e.g., table-mounted) eye trackers, the distance from the participant
to the eye tracker is important. Find out what the optimal seating distance
is and mark it on the table. Buy a table that can comfortably fit the computer
monitor, the camera, and the extra length from the camera to the participant’s
eyes.
•• To make accurate recordings, it is essential that the camera be positioned at
the right angle relative to the eyes. Eye movements tend to be filmed from
below, at a sharp angle. When calibration does not succeed at first, changing
the angle of the video camera is a common remedial technique.
•• Practice makes perfect. Though perfection takes time, you will get to
know your equipment better the more you use it. It is a good idea to be
involved in the data collection process, at least at first, and depending on your
research context and personal preferences, you may always want to collect
your own data. Holmqvist et al. (2011) noted that “the researcher with the
highest incentive to get good data should take part in the recordings, so she
can influence the many choices made during eye camera set-up and calibra-
tion” (p. 110). Do not pass up on the opportunity to gain hands-on experi-
ence by participating in the data collection.
•• Training sessions are crucial. Do not skip training sessions or workshops,
because they cover some of the same information as in the manual, only in a
much more condensed form. Training can also provide you with tips and short-
cuts that are normally gained from experience or extensive practice. Whether
you are a lab assistant, a graduate student, or a faculty member, you will find
practical, hands-on training sessions to be a good investment of your time.
•• Consider buying an additional license for programming, recording, and data
analysis. Some companies offer a non-recording license for data programming
and analysis only, which may be more affordable than a full license. Having an
extra license will give you the flexibility to work away from the lab.
•• Have a protocol for your lab. The protocol is a detailed step-by-step guide to
using the eye-tracking lab that applies to all users. It includes information on
lighting, computers, and supplies as well as specific instructions on how to
handle the equipment. The general lab protocol is different from any study-
specific protocols that researchers may create for their own studies (see Section
9.2.3).

9.3.2.2 About Data Collection


9.3.2.2.1 Organizing the Data Collection and Logistics
•• Carry out a pilot study. Piloting your study is the best way to ensure your
experiment works the way you think and to anticipate any problems during
data collection. Run a pilot to check whether participants understand the
instructions, confirm that your study is giving you the data and information
you need, and rule out any technical glitches. Use the results of your pilot to
modify your experiment and repeat this procedure as many times as neces-
sary. For the best (and most realistic) results, test the experiment on partici-
pants whose profile is similar to that of the participants you will recruit later.
•• Be organized (part one). With so many little things to think about, par-
ticipant recruitment can be messy. Once flyers have been put up and emails
sent, be ready to handle a large number of short, practical emails. Be prompt,
courteous, and calm.
•• Be organized (part two). Arrive five to ten minutes early for data collection
and make sure you bring all materials (e.g., consent forms, questionnaires, and
participant compensation) to the data collection session. It is a good idea to
keep separate, labeled folders or files for each project in the lab. Individual
participant records can be kept in a portable hanging file box, which is espe-
cially practical if file storage in the lab is not available or if using a mobile lab.
•• Have a clear and detailed study protocol. A good protocol should specify,
step by step, how things are supposed to happen: from meeting the partici-
pant, to setting up the experiment, going over instructions and collecting the
data, to participant debriefing, data storage and back-up. Do not leave things
to chance or common sense, but be as specific as you can! The goal of having
a detailed study protocol (in addition to the general lab protocol, see Section
9.2.3) is to replicate the exact same conditions in every data collection ses-
sion and have more control.
•• Be realistic. Data collection can be time-consuming and the time nec-
essary for camera set-up and calibration varies between participants.
Schedule enough time so that no two participants are in the lab at
once. Overlapping participants may sound like a good idea to save time and
human resources (i.e., lab assistant hours), but it can quickly make the data
collection process unmanageable. By interacting with one participant at a
time, you set yourself up for collecting quality data from a participant who
is fully devoted to their task.
•• Be prepared for the unexpected. It is ideal to have a colleague or assistant
on stand-by during data collection so they can step in if things do not go as
planned. Having a back-up is particularly important when working with
clinical populations or children, because these populations may require extra
attention. With two researchers in the room, one person can be in charge
of the equipment while the other gives instructions and prepares the par-
ticipant. If you commonly collect data with a student or colleague, keep the
roles the same for the duration of the study: one person for camera set-up
and the other for interacting with the participant. As with single-researcher
data collection sessions, the idea is to replicate the same conditions across
all participants. If it is not possible to have two researchers collect the data
together, adding audio-recorded instructions to the experiment is another
way to increase internal consistency.
•• Ensure the best environment for your participants (part one). It is the right
thing to do. Happy participants also tend to be good participants and they
may tell their friends about your study, so that they too can become your
participants (Holmqvist et al., 2011). Snacks and refreshments are generally
appreciated, as is some small talk at the beginning of the session. Small talk
helps to build rapport and ensures participants clear their minds, so that your
study has their full and undivided attention. As in real life, not all participants
may feel like chatting. That is all right; as a researcher, you can adjust to your
participant’s preferences.
•• Ensure the best environment for your participants (part two). The optimal
environment is not only friendly and relaxed but also quiet. This is so your
participant can focus on their task. If the lab is located in a busy area on
campus or in the building, noise can be a problem. Putting up Quiet Please –
Data Collection in Progress signs in the hallway is a good and cheap alternative
to sound-proofing the lab. It will also prevent people from coming into the
lab while you are recording data. This may be especially important if data are
recorded outside office hours and a police officer walks into the lab to check
on things, as once happened in our lab!
•• Train yourself to examine your participant’s eyes when he or she comes
into the lab. Does the participant wear glasses, contact lenses, or eye makeup?
What is their eye shape? Is the eyelid far from the pupil or does it risk cov-
ering the pupil when the participant looks down? And what about their
eyelashes or eyebrows, do they risk being in the way? Do you notice any
problems of vergence? By quickly assessing your participant’s eyes when he
or she walks in, you can anticipate potential problems during camera set-up
and calibration and save time.
•• Much like your fridge at home, your lab requires a certain number of sup-
plies to be fully functional. Some of these supplies include water, tissues,
makeup removal wipes, an eyelash curler, light-colored tape, and (we think)
lab candy.
•• Use an online calendar for participant scheduling. An online calendar is
useful in managing a large-scale data collection. It is indispensable when
several projects take place in the lab concurrently. When using a calendar, you
can either give access to researchers only or you can set it up so prospective
participants can view the times that are currently available in the lab. This can
spare you the back-and-forth emails to schedule sessions.
•• Always make a back-up of your data, preferably more than one. Go the
extra mile to secure your data, because they are the fruit of your labor. When
multiple people use the lab concurrently, there is always a possibility of
human error and data back-ups become even more important. (A minimal
back-up sketch follows this list.)
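
To illustrate the back-up tip above, here is a minimal Python sketch. The
folder names and the .edf extension are placeholder assumptions about one
possible lab layout; an institutional or cloud back-up service can serve the
same purpose.

import shutil
from datetime import date
from pathlib import Path

SOURCE = Path("lab_data")                   # hypothetical folder with recordings
BACKUP = Path("backups") / date.today().isoformat()
BACKUP.mkdir(parents=True, exist_ok=True)   # one date-stamped folder per day

for recording in SOURCE.glob("*.edf"):      # e.g., EyeLink data files
    target = BACKUP / recording.name
    if not target.exists():                 # copy only files not yet backed up
        shutil.copy2(recording, target)     # copy2 also preserves timestamps

Running such a script at the end of every session, ideally onto a second
physical drive, guards against the human error mentioned above.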

9.3.2.2.2 Camera Set-Up and Calibration


•• Keep a data collection logbook. Make notes about data recording qual-
ity for each participant, so you can refer to them during data analysis. First
write down the participant code or subject ID and experimental condition.
Then, note any comments about the participant’s reading behavior, such as
very careful reading or skipping segments. Were there any technical problems
with camera set-up and calibration? Any dips in data quality such as track
loss or drift? You will find this information extremely valuable when moving
forward to the next stages of the research process.
If your experiment has an element of deception (e.g., the presence [unbe-
knownst to the participant] of ungrammatical forms or pseudowords, a recur-
ring syntactic structure, or linguistic cues to target pictures), it is important to
verify at the end of data collection that the participant did not realize what
the experiment was about. To do so, conduct a short interview and ask your
participants what they thought the experiment was about. This information
will help you screen out any individuals who may have adapted their reading
or listening behavior in unintended ways.
•• Do a test of eye dominance. Two common tests are the Porta test (Porta,
1593, as cited in Wade, 1998) and the Miles test (Miles, 1930). To perform the
Porta test of eye dominance, extend one arm out in front of you with your
thumb pointed up. Align your thumb so that it covers a distant object (see
Figure 9.30). Continue looking at the object with your right eye closed, and
then try looking at it with your left eye closed. You will notice that the
object seems to “jump” to the side with one eye, but stays in place with the
other. The eye that can stay open while keeping the object aligned with your
thumb is your “dominant eye”.

FIGURE 9.30 An example of the Porta test of eye dominance, where the thumb is
aligned with a stop sign. The person closes one eye at a time to see if the
sign “jumps” to the side or stays put.
Two thirds of the population are right-eye dominant (Eser, Durrie,
Schwendeman, & Stahl, 2008) but a sizable minority have a dominant
left eye (a small segment of the population have equally dominant eyes, a
quality highly sought after in sports such as archery where aim is crucial!).
Monocular recordings tend to be more accurate than binocular record-
ings, so if tracking of only one eye is an option on your machine, do that and
record the dominant eye only. If you change the eye being tracked (e.g., left
instead of right), you may need to adjust the camera position, so it captures a
good, central image of the eye. Monocular eye tracking is also a solution for
participants who have a lazy eye (amblyopia).
•• Camera set-up and calibration are key. If calibration fails, do not proceed
with data collection. Make some adjustments and recalibrate. Some partici-
pants are very eager to participate in your study and you may feel some pres-
sure to go ahead and collect data from them anyway, but the data will not
be usable. So take a deep breath, try making some of the adjustments listed
below and calibrate again.
•• Consider using the following tricks for improving calibration accuracy.
Many of these boil down to changing the angle at which the video cameras
are filming the eye. For starters, check your participant’s distance from the
screen. Make sure they are seated in front of the center of the screen and
adjust the seat height, if necessary, so the participant’s eyes align with the top
quarter of the screen. If you are recording head-free, ask participants to limit
their head movements. With head-stabilized systems, you can try adjusting
the height of the chinrest. Some chinrests also move forward and backward,
which adds more degrees of freedom to the set-up.
•• If calibration fails, it is possible that the eye tracker is mistaking something
else for the pupil. Common sources of confusion are eye makeup, down-
ward pointing eyelashes, and dark eyebrows. Some manufacturers provide
an image from the eye camera, including the areas that were detected as
the pupil and the corneal reflection. This will make it easier to diagnose the
problem. If you do not have access to the eye image from the camera, you can
still locate the problem by ruling out the following potential culprits:
•• Mascara or eyeliner? Remove with makeup remover wipes.
•• Long and/or downward pointing eyelashes? Keep an eyelash curler in
the lab.
•• Dark eyebrows? Define the search area for the eye so it excludes the
eyebrows (option available on some eye trackers only). Alternatively,
ask your participant if they would agree to cover their eyebrows with
white tape.
•• How to deal with glasses. Most companies will declare that glasses do not
represent a problem for calibration, but reality can be different. If you do
have trouble calibrating participants with glasses, check that the glasses did
not slide. Glasses should be up on the bridge of the nose and clean. Then ask
the participant to tilt their glasses slightly so that the infrared light from the
eye tracker hits the glasses at a different angle. This is to prevent glare (the
reflection of the light beam from the glasses, rather than the eye). Participants
who normally wear bifocal glasses should bring another pair to the lab, if they
have one, because bifocals cannot be calibrated. Glasses with dark frames are
also difficult, as the frames may look like pupils to the camera. Adjust the thresholds
for pupil detection manually, if your eye tracker has this function, or ask your
participant to wear their spare pair of glasses or contact lenses. Alternatively,
you may cover the frames with light tape, but only with the participant’s
permission of course.
•• Eyes differ in shape. Some shapes are easier to calibrate than others. If you
are recording eye movements from participants who have a single eyelid, as
is common in some parts of the world, calibration may be more challenging
because the eyelid may partially cover the pupil under certain angles, such as
when the participant looks down. The same can occur with participants who
have droopy eyelids (ptosis), which is found more often in older people. Your
best chance of getting a successful calibration is to change the angle at which
the camera is filming, for instance by moving the camera a bit closer so it
films from below.
•• To use or not to use a chinrest. Stabilizing the head during an eye-move-
ment recording will typically result in higher-quality data but comes at the
cost of reduced ecological validity. My view is that it is best to use a chinrest
if your study participants are comfortable with it. If your eye tracker does not
include a chinrest, you can buy one secondhand on the internet or have it
custom made.

9.3.2.3 About Data Analysis


•• Clean your data. In L1 reading research, it is common practice to delete
very long fixations (> 800 ms or > 1,000 ms) and either merge very short
fixations (< 50 ms, < 80 ms, or < 100 ms) with adjacent fixations or delete
them before running any analyses. It is important to note that these guide-
lines have not yet been tested with L2 learners or, for the most part, with
language tasks other than reading. In Section 8.2.1, I demonstrated a pos-
sible approach for L2 and bilingualism researchers to determine whether the
available cut-off values from the L1 reading literature also generalize to their
research contexts (a concrete sketch follows this list).
•• Do not adjust your data unless you are confident of what the participant was
looking at (see Section 8.1.3). I recommend that researchers set and report
an a priori quality standard for their data recordings (% track loss) and
automatically discard any participant files that do not meet the standard. All
participant files that pass muster should additionally be screened for quality
on a trial-by-trial basis (see Section 8.1.2).
•• Do not delay eyeballing your data. With data collection still fresh in your
mind and your logbook by your side, it will be easier to detect any anomalies in
the recording, such as offsets, excessive artifacts, or track loss (see Section 8.1.2).
When faced with a noisy recording, your only safe option is to exclude the data
from further analysis. Data adjustments (i.e., moving the eye gaze data or the
interest areas manually) only make sense when there is a clear external referent
(e.g., a line of text) with which the eye gaze data can be aligned (see Section
8.1.3). Even then, cleaning should be used sparingly and with care, because you
are essentially making changes to your original recording.
•• Defining interest areas. Whenever possible, interest areas should be defined
before data collection because this is more objective and saves work in the
long run (see Section 6.1). However, with some types of experiments (e.g.,
writing research, classroom-based or interaction research) you may not know
beforehand what the region of analysis will be. In either case, your interest
areas should be conceived as semantic or thematic units that relate directly to
your research questions. Do not be be tempted to change your interest areas a
posteriori because you believe it would be favorable to your results (Holmqvist
et al., 2011). When you draw your interest areas, make sure to include a
margin (e.g., whitespace around a picture or above and below a text line; use
double-spacing). This will ensure that eye-movement data with a slight offset
will still be assigned to the proper target region (a sketch follows this list).
•• Be aware that dynamic (moving) interest areas (such as an interlocu-
tor’s face in natural discourse, characters in movies, websites, written chat
applications, and anything where participants can scroll on the screen) are
costly to draw. To a large extent, dynamic interest areas are still coded
manually today, which is an extremely time-consuming and monotonous
process. This situation will likely improve as new interest area detection
programs such as EyeAnt are developed (Anthony & Michel, 2016). In the
meantime, it is good to weigh the costs and benefits of a study design with
dynamically changing interest areas (see Section 9.3.1, research idea #10). In
some cases, a simple fix such as disabling the scrolling function on a website
can save many tens of hours of manual coding work afterwards.
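
To make the first two data-analysis tips concrete, here is a minimal pandas
sketch. It assumes a hypothetical fixation report with one row per fixation
and subject and duration_ms columns; the 80 ms and 800 ms cut-offs and the
10% screening threshold are illustrative choices rather than fixed standards
(see Section 8.2.1), and very short fixations are simply deleted rather than
merged.

import pandas as pd

# Hypothetical export with columns: subject, trial, x, y, duration_ms
fix = pd.read_csv("fixation_report.csv")

# Step 1: remove very short and very long fixations. The cut-offs come
# from the L1 reading literature; re-check them for your own data.
clean = fix[(fix["duration_ms"] >= 80) & (fix["duration_ms"] <= 800)]

# Step 2: screen participants against an a priori quality standard. The
# share of removed fixations serves here as a rough stand-in for % track
# loss, which your tracker's own reports may provide directly.
removed = 1 - clean.groupby("subject").size() / fix.groupby("subject").size()
keep = removed[removed <= 0.10].index
clean = clean[clean["subject"].isin(keep)]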
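
And for the interest-area tip, the sketch below pads a rectangular interest
area with a margin and assigns a fixation to it. The plain tuples and pixel
values are my own simplification; commercial analysis packages have their own
interest-area formats.

def pad(area, margin=10):
    # Expand a rectangle (left, top, right, bottom, label) by a pixel
    # margin, so fixations with a slight offset still count as hits.
    left, top, right, bottom, label = area
    return (left - margin, top - margin, right + margin, bottom + margin, label)

def assign(x, y, areas):
    # Return the label of the first padded area containing the point (x, y).
    for left, top, right, bottom, label in areas:
        if left <= x <= right and top <= y <= bottom:
            return label
    return None

areas = [pad((100, 200, 180, 230, "target_word"))]
print(assign(135, 195, areas))  # 'target_word': caught by the 10-px margin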

Lastly, much of the experiential knowledge one needs for conducting eye-
tracking research comes from practice. Experienced researchers know that man-
aging the bits and bobs of eye-tracking research takes time, and it may indeed be
a never-ending process as researchers discover new horizons that require innova-
tive solutions. I expect that innovation will only increase as growing numbers of
L2 and bilingualism researchers incorporate eye tracking into their research and
the community of eye-tracking researchers in the language sciences expands. We
learn from methodology books like this one and from each other.

Notes
1 There are separate algorithms for detecting optic artifacts such as blinks and rarer eye
behavior such as smooth pursuit, which are not included in this discussion.
2 In the discussion that follows, we refer to the spatial accuracy and precision of a meas-
urement. This supplements considerations about temporal accuracy and precision that
relate to when an event is detected in time versus when it occurred (see Figure 8.18).
3 This idea was submitted by Xuehong (Stella) He, a PhD candidate in Second Language
Studies at Michigan State University.
4 Michel and Smith only studied lexical alignment with the other interlocutor’s text, not
one’s own text.
REFERENCES

* denotes a study that was included in the synthetic review of eye-tracking
research in SLA and bilingualism.
Aaronson, D., & Scarborough, H. S. (1977). Performance theories for sentence coding:
Some quantitative models. Journal of Verbal Learning and Verbal Behavior, 16(3), 277–303.
doi:10.1016/S0022-5371(77)80052-2
Alanen, R. (1995). Input enhancement and rule presentation in second language acquisition.
In R. W. Schmidt (Ed.), Attention and awareness in foreign language learning (pp. 259–302).
Honolulu, HI: University of Hawai‘i Press.
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time
course of spoken word recognition using eye movements: Evidence for continuous
mapping models. Journal of Memory and Language, 38(4), 419–439. doi:10.1006/
jmla.1997.2558
Alsadoon, R., & Heift, T. (2015). Textual input enhancement for vowel blindness: A study
with Arabic ESL learners. The Modern Language Journal, 99(1), 57–79. doi:10.1111/
modl.12188
Altmann, G. T. M. (2011a). Language can mediate eye movement control within
100 milliseconds, regardless of whether there is anything to move the eyes to. Acta
Psychologica, 137(2), 190–200. doi:10.1016/j.actpsy.2010.09.009
Altmann, G. T. M. (2011b). The mediation of eye movements by spoken language.
In S. P. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye
movements (pp. 979–1004). Oxford, UK: Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0054
Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting
the domain of subsequent reference. Cognition, 73(3), 247–264. doi:10.1016/
S0010-0277(99)00059-1
Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by
language and world knowledge: Linking anticipatory (and other) eye movements to
linguistic processing. Journal of Memory and Language, 57(4), 502–518. doi:10.1016/j.
jml.2006.12.004
Altmann, G. T. M., & Kamide, Y. (2009). Discourse-mediation of the mapping between
language and the visual world: Eye movements and mental representation. Cognition,
111(1), 55–71. doi:10.1016/j.cognition.2008.12.005
Altmann, G. T. M., & Mirković, J. (2009). Incrementality and prediction in human sentence
processing. Cognitive Science, 33(4), 583–609. doi:10.1111/j.1551-6709.2009.01022.x
Andersson, R., Nyström, M., & Holmqvist, K. (2010). Sampling frequency and eye-tracking
measures: How speed affects durations, latencies, and more. Journal of Eye Movement
Research, 3(3), 1–12. doi:10.16910/jemr.3.3.6
*Andringa, S., & Curcic, M. (2015). How explicit knowledge affects online L2 processing.
Studies in Second Language Acquisition, 37(2), 237–268. doi:10.1017/S0272263115000017
Andringa, S., & Curcic, M. (2016). A validation study: Is visual world eye-tracking suitable
for studying implicit learning? Paper to the Fifth Implicit Learning Seminar, Lancaster,
UK, June 2016.
Anthony, L., & Michel, M. (2016, March). Introducing EyeChat: A data collection tool for
eye-tracking computer mediated communication. Paper presented at the UCREL Research
Seminar, Lancaster University, UK.
Audacity Team. (2012). Audacity [Computer software]. Pittsburg, PA. Retrieved May 15, 2018
from https://audacityteam.org
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics. Cambridge,
UK: Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed
random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
doi:10.1016/j.jml.2007.12.005
Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of
Psychological Research, 3(2), 12–28.
Baccino, T. (2011). Eye movements and concurrent event-related potentials: Eye fixation-
related potential investigations in reading. In S. P. Liversedge, I. D. Gilchrist, & S.
Everling (Eds.), The Oxford handbook of eye movements (pp. 857–870). New York: Oxford
University Press. doi:10.1093/oxfordhb/9780199539789.013.0047
Baccino, T., & Manunta, Y. (2005). Eye-fixation-related potentials: Insight
into parafoveal processing. Journal of Psychophysiology, 19(3), 204–215.
doi:10.1027/0269-8803.19.3.204
Bachman, L. (2005). Statistical analysis for language assessment. Oxford, UK: Oxford University
Press.
Baddeley, A. D. (1986). Working memory. Oxford, UK: Oxford University Press.
Bahill, A. T., Clark, M. R., & Stark, L. (1975). The main sequence, a tool for
studying human eye movements. Mathematical Biosciences, 24(3–4), 191–204.
doi:10.1016/0025-5564(75)90075-9
Bai, X., Yan, G., Liversedge, S. P., Zang, C., & Rayner, K. (2008). Reading spaced and
unspaced Chinese text: Evidence from eye movements. Journal of Experimental Psychology:
Human Perception and Performance, 34(5), 1277–1287. doi:10.1037/0096-1523.34.5.1277
Ballard, L. (2017). The effects of primacy on rater cognition: An eye-tracking study (Doctoral
dissertation). Retrieved from ProQuest Dissertations and Theses Global. (10274418).
*Balling, L. W. (2013). Reading authentic texts: What counts as cognate? Bilingualism:
Language and Cognition, 16(3), 637–653. doi:10.1017/S1366728911000733.
Balota, D. A., Pollatsek, A., & Rayner, K. (1985). The interaction of contextual constraints
and parafoveal visual information in reading. Cognitive Psychology, 17(3), 364–390.
doi:10.1016/0010-0285(85)90013-1
Baltova, I. (1994). The impact of video on the comprehension skills of core French students.
Canadian Modern Language Review, 50(3), 507–531. doi:10.3138/cmlr.50.3.507
Bar, M. (2007). The proactive brain: Using analogies and associations to generate predictions.
Trends in Cognitive Sciences, 11(7), 280–289. doi:10.1016/j.tics.2007.05.005
Bar, M. (2009). The proactive brain: Memory for predictions. Philosophical Transactions of the
Royal Society B: Biological Sciences, 364(1521), 1235–1243. doi:10.1098/rstb.2008.0310
Barnes, G. R. (2011). Ocular pursuit movements. In S. P. Liversedge, I. Gilchrist, & S.
Everling (Eds.), The Oxford handbook of eye movements (pp. 115–132). Oxford University
Press. doi:10.1093/oxfordhb/9780199539789.013.0007
Barnett,V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). New York: Wiley.
Barr, D. J. (2008). Analyzing ‘visual world’ eyetracking data using multilevel logistic regression.
Journal of Memory and Language, 59(4), 457–474. doi:10.1016/j.jml.2007.09.002
Barr, D. J., Gann, T. M., & Pierce, R. S. (2011). Anticipatory baseline effects and information
integration in visual world studies. Acta Psychologica, 137(2), 201–207. doi:10.1016/j.
actpsy.2010.09.011
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for
confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3),
255–278. doi:10.1016/j.jml.2012.11.001
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models.
Retrieved from http://arxiv.org/abs/1506.04967
Bates, E., D’Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., … Tzeng, O.
(2003). Timed picture naming in seven languages. Psychonomic Bulletin & Review, 10(2),
344–380. doi:10.3758/BF03196494
*Bax, S. (2013). The cognitive processing of candidates during reading tests: Evidence from
eye-tracking. Language Testing, 30(4), 441–465. doi:10.1177/0265532212473244
Bell, B. A., Morgan, G. B., Schoeneberger, J. A., Loudermilk, B. L., Kromrey, J. D., & Ferron,
J. M. (2010). Dancing the sample size limbo with mixed models: How low can you
go? SAS Global Forum Proceedings. Retrieved from http://support.sas.com/resources/
papers/proceedings10/197-2010.pdf
Bertera, J. H., & Rayner, K. (2000). Eye movements and the span of the effective stimulus
in visual search. Perception and Psychophysics, 62(3), 576–585. doi:10.3758/BF03212109
Bialystok, E. (2015). Bilingualism and the development of executive function: The role of
attention. Child Development Perspectives, 9(2), 117–121. doi:10.1111/cdep.12116
Binda, P., Cicchini, G. M., Burr, D. C., & Morrone, M. C. (2009). Spatiotemporal distortions
of visual perception at the time of saccades. Journal of Neuroscience, 29(42), 13147–13157.
doi:10.1523/JNEUROSCI.3723-09.2009
*Bisson, M., Van Heuven, W. J. B., Conklin, K., & Tunney, R. J. (2014). Processing of native
and foreign language subtitles in films: An eye tracking study. Applied Psycholinguistics,
35(2), 399–418. doi:10.1017/S0142716412000434
Blumenfeld, H. K., & Marian, V. (2007). Constraints on parallel activation in bilingual
spoken language processing: Examining proficiency and lexical status using eye-tracking.
Language and Cognitive Processes, 22(5), 633–660. doi:10.1080/01690960601000746
Blumenfeld, H. K., & Marian, V. (2011). Bilingualism influences inhibitory control in auditory
comprehension. Cognition, 118(2), 245–257. doi:10.1016/j.cognition.2010.10.012
Blythe, H. I. (2014). Developmental changes in eye movements and visual information
encoding associated with learning to read. Current Directions in Psychological Science,
23(3), 201–207. doi:10.1177/0963721414530145
Blythe, H. I., & Joseph, H. S. S. L. (2011). Children’s eye movements during reading.
In S. P. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of
eye movements (pp. 643–662). Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0036
Blythe, H. I., Liversedge, S. P., Joseph, H. S., White, S. J., & Rayner, K. (2009). Visual
information capture during fixations in reading for children and adults. Vision Research,
49(12), 1583–1591. doi:10.1016/j.visres.2009.03.015
Boers, F., & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed second language
acquisition. London, UK: Palgrave Macmillan. doi:10.1057/9780230245006_1
Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic
sequences in a second language. Annual Review of Applied Linguistics, 32, 83–110.
doi:10.1017/S0267190512000050
Boersma, P., & Weenink, D. (2018). Praat: doing phonetics by computer (Version 6.0) [Computer
software]. Amsterdam, the Netherlands. Retrieved June 1, 2018 from http://www.
praat.org
Bojko, A. (2009). Informative or misleading? Heatmaps deconstructed. In J. A. Jacko
(Ed.), Human-computer interaction (pp. 30–39). Berlin: Springer. doi:10.1007/978-3-
642-02574-7_4
*Bolger, P., & Zapata, G. (2011). Semantic categories and context in L2 vocabulary learning.
Language Learning, 61(2), 614–646. doi:10.1111/j.1467-9922.2010.00624.x
Boston, M. F., Hale, J., Kliegl, R., Patil, U., & Vasishth, S. (2008). Parsing costs as predictors
of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye
Movement Research, 2(1), 1–12. doi:10.16910/jemr.2.1.1
Bowles, M. A. (2010). The think-aloud controversy in second language research. New York:
Routledge.
*Boxell, O., & Felser, C. (2017). Sensitivity to parasitic gaps inside subject islands in native
and non-native sentence processing. Bilingualism: Language and Cognition, 20(3), 494–
511. doi:10.1017/S1366728915000942
Braze, D. (2018). Researcher contributed eye tracking tools. Retrieved from https://
github.com/davebraze/FDBeye/wiki/Researcher-Contributed-Eye-Tracking-Tools
Brône, G., & Oben, B. (Eds.). (2018). Eye-tracking in interaction: Studies on the role of eye gaze
in dialogue (Vol. 10). New York: John Benjamins.
Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models:
A tutorial. Journal of Cognition, 1(1), 9. doi:10.5334/joc.10
Bultena, S., Dijkstra, T., & van Hell, J. G. (2014). Cognate effects in sentence context depend
on word class, L2 proficiency, and task. The Quarterly Journal of Experimental Psychology,
67(6), 1214–1241. doi:10.1080/17470218.2013.853090
Burnat, K. (2015). Are visual peripheries forever young? Neural Plasticity, 2015, 307929.
doi:10.1155/2015/307929.
Cameron, A. C., & Trivedi, P. K. (1998). Regression analysis of count data. Cambridge, UK:
Cambridge University Press.
Canseco-Gonzalez, E., Brehm, L., Brick, C. A., Brown-Schmidt, S., Fischer, K., &
Wagner, K. (2010). Carpet or Cárcel: The effect of age of acquisition and language
mode on bilingual lexical access. Language and Cognitive Processes, 25, 669–705.
doi:10.1080/01690960903474912
Carrol, G., & Conklin, K. (2015). Eye-tracking multi-word units: Some methodological
questions. Journal of Eye Movement Research, 7(5), 1–11. doi:10.16910/jemr.7.5.5
*Carrol, G., & Conklin, K. (2017). Cross language lexical priming extends to formulaic
units: Evidence from eye-tracking suggests that this idea “has legs.” Bilingualism: Language
and Cognition, 20(2), 299–317. doi:10.1017/S1366728915000103
*Carrol, G., Conklin, K., & Gyllstad, H. (2016). Found in translation: The influence of
the L1 on the reading of idioms in a L2. Studies in Second Language Acquisition, 38(3),
403–443. doi:10.1017/S0272263115000492
Chambers, C. G., Tanenhaus, M. K., Eberhard, K. M., Filip, H., & Carlson, G. N. (2002).
Circumscribing referential domains during real-time language comprehension. Journal
of Memory and Language, 47(1), 30–49. doi:10.1006/jmla.2001.2832
*Chamorro, G., Sorace, A., & Sturt, P. (2016). What is the source of L1 attrition? The effect
of recent L1 re-exposure on Spanish speakers under L1 attrition. Bilingualism: Language
and Cognition, 19(3), 520–532. doi:10.1017/S1366728915000152
Chen, H. C., & Tang, C. K. (1998). The effective visual field in reading Chinese. In C. K.
Leong & K. Tamaoka (Eds.), Cognitive processing of the Chinese and Japanese languages (pp.
91–100). Dordrecht, the Netherlands: Springer. doi:10.1007/978-94-015-9161-4_5
Chepyshko, R. (2018). Locative verbs in L2 learning: A modular processing perspective (Doctoral
dissertation). Retrieved from ProQuest Dissertations and Theses Global. (10827321).
Choi, J. E. S., Vaswani, P. A., & Shadmehr, R. (2014). Vigor of movements and the cost
of time in decision making. Journal of Neuroscience, 34(4), 1212–1223. doi:10.1523/
JNEUROSCI.2798-13.2014
*Choi, S. (2017). Processing and learning of enhanced English collocations: An eye
movement study. Language Teaching Research, 21(3), 403–426. doi:10.1177/136216881
6653271
Choi, S. Y., & Koh, S. (2009). The perceptual span during reading Korean sentences. Korean
Journal of Cognitive Science, 20(4), 573–601. doi:10.19066/cogsci.2009.20.4.008
Choi, W., Lowder, M. W., Ferreira, F., & Henderson, J. M. (2015). Individual differences
in the perceptual span during reading: Evidence from the moving window technique.
Attention, Perception, & Psychophysics, 77(7), 2463–2475.
Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H. (2019). Combined
deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies
in Second Language Acquisition, 41(3), 583–604. doi:10.1017/S027226311900007X
Cicchini, G. M., Binda, P., Burr, D. C., & Morrone, M. C. (2013). Transient spatiotopic
integration across saccadic eye movements mediates visual stability. Journal of
Neurophysiology, 109(4), 1117–1125. doi:10.1152/jn.00478.2012
*Cintrón-Valentín, M., & Ellis, N. C. (2015). Exploring the interface: Explicit focus-on-
form instruction and learned attentional biases in L2 Latin. Studies in Second Language
Acquisition, 37(2), 197–235. doi:10.1017/S0272263115000029
Cintrón-Valentín, M. C., & Ellis, N. C. (2016). Salience in second language acquisition:
Physical form, learner attention, and instructional focus. Frontiers in Psychology, 7, 1–21.
doi:10.3389/fpsyg.2016.01284
Clahsen, H. (2008). Behavioral methods for investigating morphological and syntactic
processing in children. In I. A. Sekerina, E. M. Fernández, & H. Clahsen (Eds.),
Developmental psycholinguistics: On-line methods in children’s language processing (pp. 1–27).
Amsterdam, the Netherlands/Philadelphia, PA: John Benjamins. Retrieved from http://
www.uni-potsdam.de/fileadmin/projects/prim/papers/methods07.pdf
*Clahsen, H., Balkhair, L., Schutter, J., & Cunnings, I. (2013). The time course of
morphological processing in a second language. Second Language Research, 29(1), 7–31.
doi:10.1177/0267658312464970
Clahsen, H., & Felser, C. (2006a). Continuity and shallow structures in language processing.
Applied Psycholinguistics, 27(1), 107–126. doi:10.1017/S0142716406060206
Clahsen, H., & Felser, C. (2006b). Grammatical processing in language learners. Applied
Psycholinguistics, 27(1), 3–42. doi:10.1017/S0142716406060024
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future
of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. doi:10.1017/
S0140525X12000477
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics
in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359.
doi:10.1016/S0022-5371(73)80014-3
Clifton, C. J., & Staub, A. (2011). Syntactic influences on eye movements during
reading. In S. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye
movements (pp. 895–909). Oxford, UK: Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0049
Clifton, C. J., Staub, A., & Rayner, K. (2007). Eye movements in reading words and
sentences. In R. P. G.Van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye
movements: A window on mind and brain (pp. 341–372). Oxford, UK: Elsevier.
COGAIN. (n.d.). The COGAIN Association. Retrieved from http://www.cogain.org/
home
Cohen, A. D. (2006). The coming of age of research on test-taking strategies. Language
Assessment Quarterly, 3(4), 307–331. doi:10.1080/15434300701333129
Cohen, A. L. (2013). Software for the automatic correction of recorded eye fixation locations
in reading experiments. Behavior Research Methods, 45(3), 679–683. doi:10.3758/
s13428-012-0280-3
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Conklin, K., & Pellicer-Sánchez, A. (2016). Using eye-tracking in applied linguistics
and second language research. Second Language Research, 32(3), 453–467.
doi:10.1177/0267658316637401
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-tracking. A guide for applied
linguistics research. Cambridge, UK: Cambridge University Press.
Cooper, R. (1974). The control of eye fixation by the meaning of spoken language: A
new methodology for the real-time investigation of speech perception, memory,
and language processing. Cognitive Psychology, 6(1), 84–107. https://psycnet.apa.org/
doi/10.1016/0010-0285(74)90005-X
*Cop, U., Dirix, N., Van Assche, E., Drieghe, D., & Duyck, W. (2017). Reading a book
in one or two languages? An eye movement study of cognate facilitation in L1
and L2 reading. Bilingualism: Language and Cognition, 20(4), 747–769. doi:10.1017/
S1366728916000213
Cop, U., Drieghe, D., & Duyck, W. (2015). Eye movement patterns in natural reading:
A comparison of monolingual and bilingual reading of a novel. PLOS ONE, 10(8),
e0134008. doi:10.1371/journal.pone.0134008
Cop, U., Keuleers, E., Drieghe, D., & Duyck, W. (2015). Frequency effects in monolingual
and bilingual natural reading. Psychonomic Bulletin & Review, 22(5), 1216–1234.
doi:10.3758/s13423-015-0819-2
Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to
visual locations: Identical, independent, or overlapping neural systems? Proceedings of the
National Academy of Sciences of the United States of America, 95(3), 831–838. doi:10.1073/
pnas.95.3.831
Corbetta, M., & Shulman, G. L. (1998). Human cortical mechanisms of visual attention
during orienting and search. Philosophical Transactions of the Royal Society B: Biological
Sciences, 353(1373), 1353–1362. doi:10.1098/rstb.1998.0289
Cubilo, J., & Winke, P. (2013). Redefining the L2 listening construct within an integrated
writing task: Considering the impacts of visual-cue interpretation and note-taking.
Language Assessment Quarterly, 10(4), 371–397. doi:10.1080/15434303.2013.824972
Cuetos, F., & Mitchell, D. C. (1988). Cross-linguistic differences in parsing: Restrictions
on the use of the Late Closure strategy in Spanish. Cognition, 30(1), 73–105.
doi:10.1016/0010-0277(88)90004-2
Cunnings, I. (2012). An overview of mixed-effects statistical models for second language
researchers. Second Language Research, 28(3), 369–382. doi:10.1177/0267658312443651
Cunnings, I., & Finlayson, I. (2015). Mixed effects modeling and longitudinal data analysis.
In L. Plonsky (Ed.), Advancing quantitative methods in second language research (pp. 159–
181). New York: Routledge.
*Cunnings, I., Fotiadou, G., & Tsimpli, I. (2017). Anaphora resolution and reanalysis
during L2 sentence processing. Studies in Second Language Acquisition, 39(4), 621–652.
doi:10.1017/S0272263116000292
Dahan, D., Magnuson, J. S., & Tanenhaus, M. K. (2001). Time course of frequency effects in
spoken-word recognition: Evidence from eye movements. Cognitive Psychology, 42(4),
317–367. doi:10.1006/cogp.2001.0750
Dahan, D., Swingley, D., Tanenhaus, M. K., & Magnuson, J. S. (2000). Linguistic gender and
spoken-word recognition in French. Journal of Memory and Language, 42(4), 465–480.
doi:10.1006/jmla.1999.2688
Dahan, D., & Tanenhaus, M. K. (2005). Looking at the rope when looking for the snake:
Conceptually mediated eye movements during spoken-word recognition. Psychonomic
Bulletin & Review, 12(3), 453–459. doi:10.3758/BF03193787
Dahan, D., Tanenhaus, M. K., & Pier Salverda, A. (2007). The influence of visual processing
on phonetically driven saccades in the “visual world” paradigm. In R. Van Gompel,
M. Fischer, W. Murry, & R. Hill (Eds.), Eye movements: A window on mind and brain (pp.
471–486). Oxford, UK: Elsevier. doi:10.1016/B978-008044980-7/50023-9
Dambacher, M., & Kliegl, R. (2007). Synchronizing timelines: Relations between fixation
durations and N400 amplitudes during sentence reading. Brain Research, 1155(1), 147–
162. doi:10.1016/j.brainres.2007.04.027
De Beugher, S., Brône, G., & Goedemé, T. (2014). Automatic analysis of in-the-wild
mobile eye-tracking experiments using object, face and person detection. International
Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal,
625–633.
De Bot, K., Paribakht, T. S., & Wesche, M. B. (1997). Toward a lexical processing model for
the study of second language vocabulary acquisition: Evidence from ESL reading. Studies
in Second Language Acquisition, 19(3), 309–329. doi:10.1017/S0272263197003021
Defense Language Institute Foreign Language Center. (n.d.). Languages taught at DLIFLC
and duration of courses. Retrieved from http://www.dliflc.edu/home/about/
languages-at-dliflc/
De León Rodríguez, D., Buetler, K. A., Eggenberger, N., Preisig, B. C., Schumacher, R.,
Laganaro, M., … Müri, R. M. (2016).The modulation of reading strategies by language
opacity in early bilinguals: An eye movement study. Bilingualism: Language and Cognition,
19(3), 567–577. doi:10.1017/S1366728915000310
DeLong, K. A., Urbach, T. P., & Kutas, M. (2005). Probabilistic word pre-activation during
language comprehension inferred from electrical brain activity. Nature Neuroscience, 8(8),
1117–1121. doi:10.1038/nn1504
Deutsch, A., & Bentin, S. (2001). Syntactic and semantic factors in processing gender
agreement in Hebrew: Evidence from ERPs and eye movements. Journal of Memory and
Language, 45(2), 200–224. doi:10.1006/jmla.2000.2768
Deutsch, A., & Rayner, K. (1999). Initial fixation location effects in reading Hebrew words.
Language and Cognitive Processes, 14(4), 393–421. doi:10.1080/016909699386284
Diefendorf, A. R., & Dodge, R. (1908). An experimental study of the ocular reactions
of the insane from photographic records. Brain, 31(3), 451–489. doi:10.1093/
brain/31.3.451
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in
Psychology, 5, 1–17. doi:10.3389/fpsyg.2014.00781
*Dijkgraaf, A., Hartsuiker, R. J., & Duyck, W. (2017). Predicting upcoming information
in native-language and non-native-language auditory word recognition. Bilingualism:
Language and Cognition, 20(5), 917–930. doi:10.1017/S1366728916000547
Dimigen, O., Kliegl, R., & Sommer, W. (2012). Trans-saccadic parafoveal preview benefits
in fluent reading: A study with fixation-related brain potentials. NeuroImage, 62(1), 381–
393. doi:10.1016/j.neuroimage.2012.04.006
Dimigen, O., Sommer, W., Hohlfeld, A., Jacobs, A. M., & Kliegl, R. (2011). Coregistration of
eye movements and EEG in natural reading: Analyses and review. Journal of Experimental
Psychology: General, 140(4), 552–572. doi:10.1037/a0023885
Dink, J. W., & Ferguson, B. (2015). eyetrackingR: An R library for eye-tracking data analysis.
Retrieved from www.eyetrackingr.com
Dodge, R. (1903). Five types of eye movement in the horizontal meridian plane of the field
of regard. American Journal of Physiology - Legacy Content, 8(4), 307–329. doi:10.1152/
ajplegacy.1903.8.4.307
Dodge, R. (1904). The participation of the eye movements in the visual perception of
motion. Psychological Review, 11(1), 1–14. doi:10.1037/h0071641
Dodge, R., & Cline, T. S. (1901). The angle velocity of eye movements. Psychological Review,
8, 145–157. doi:10.1037/h0076100
Dolgunsöz, E., & Sarıçoban, A. (2016). CEFR and eye movement characteristics during
EFL reading: The case of intermediate readers. Journal of Language and Linguistic Studies,
12(2), 238–252.
Drasdo, N., & Fowler, C. W. (1974). Non-linear projection of the retinal image in a wide-
angle schematic eye. The British Journal of Ophthalmology, 58(8), 709.
Drieghe, D. (2008). Foveal processing and word skipping during reading. Psychonomic
Bulletin & Review, 15(4), 856–860. doi:10.3758/PBR.15.4.856
Drieghe, D., Rayner, K., & Pollatsek, A. (2008). Mislocated fixations can account for
parafoveal-on-foveal effects in eye movements during reading. The Quarterly Journal of
Experimental Psychology, 61(8), 1239–1249. doi:10.1080/17470210701467953
Duchowski, A. T. (2002). A breadth-first survey of eye-tracking applications. Behavior
Research Methods, Instruments, & Computers, 34(4), 455–470. doi:10.3758/BF03195475
Duchowski, A. T. (2007). Eye tracking methodology: Theory and practice. London, UK: Springer.
Duñabeitia, J. A., Avilés, A., Afonso, O., Scheepers, C., & Carreiras, M. (2009). Qualitative
differences in the representation of abstract versus concrete words: Evidence
from the visual-world paradigm. Cognition, 110(2), 284–292. doi:10.1016/j.
cognition.2008.11.012
Dussias, P. E. (2010). Uses of eye-tracking data in second language sentence processing research.
Annual Review of Applied Linguistics, 30, 149–166. doi:10.1017/S026719051000005X
*Dussias, P. E., & Sagarra, N. (2007). The effect of exposure on syntactic parsing in Spanish
– English bilinguals. Bilingualism: Language and Cognition, 10(1), 101–116. doi:10.1017/
S1366728906002847
*Dussias, P. E., Valdés Kroff, J. R., Guzzardo Tamargo, R. E., & Gerfen, C. (2013). When
gender and looking go hand in hand. Studies in Second Language Acquisition, 35(2), 353–
387. doi:10.1017/S0272263112000915
d’Ydewalle, G., & De Bruycker, W. (2007). Eye movements of children and
adults while reading television subtitles. European Psychologist, 12(3), 196–205.
doi:10.1027/1016-9040.12.3.196
d’Ydewalle, G., & Gielen, I. (1992). Attention allocation with overlapping sound, image,
and text. In K. Rayner (Ed.), Eye movements and visual cognition (pp. 415–427). New York:
Springer. doi:10.1007/978-1-4612-2852-3_25
d’Ydewalle, G., Praet, C., Verfaillie, K., & Van Rensbergen, J. (1991). Watching subtitled
television. Communication Research, 18(5), 650–666. doi:10.1177/009365091018005005
Eberhard, K. M., Spivey-Knowlton, M. J., Sedivy, J. C., & Tanenhaus, M. K. (1995). Eye
movements as a window into real-time spoken language comprehension in natural
contexts. Journal of Psycholinguistic Research, 24(6), 409–436. doi:10.1007/BF02143160
Educational Testing Service. (2005). TOEFL® iBT writing sample responses. Retrieved
March 30, 2017 from https://www.ets.org/Media/Tests/TOEFL/pdf/ibt_writing_
sample_responses.pdf
Eggert, T. (2007). Eye movement recordings: Methods. Developments in Ophthalmology, 40,
15–34. doi:10.1159/000100347
*Elgort, I., Brysbaert, M., Stevens, M., & Van Assche, E. (2018). Contextual word learning
during reading in a second language: An eye-movement study. Studies in Second Language
Acquisition, 40(2), 341–366. doi:10.1017/S0272263117000109
Ellis, N. C. (2006). Usage-based and form-focused language acquisition: The associative
learning of constructions, learned attention, and the limited L2 endstate. In P. Robinson
& N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language acquisition (pp.
372–405). New York: Routledge.
*Ellis, N. C., Hafeez, K., Martin, K. I., Chen, L., Boland, J., & Sagarra, N. (2014). An
eye-tracking study of learned attention in second language acquisition. Applied
Psycholinguistics, 35(3), 547–579. doi:10.1017/S0142716412000501
Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A
psychometric study. Studies in Second Language Acquisition, 27(2), 141–172. doi:10.1017/
S0272263105050096
Engbert, R. (2006). Microsaccades: A microcosm for research on oculomotor control,
attention, and visual perception. Progress in Brain Research, 154, 177–192. doi:10.1016/
S0079-6123(06)54009-9
Engbert, R., & Kliegl, R. (2011). Parallel graded attention models of reading.
In S. P. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of
eye movements (pp. 787–800). Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0043
Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R. (2005). SWIFT: A dynamical
model of saccade generation during reading. Psychological Review, 112(4), 777–813.
doi:10.1037/0033-295X.112.4.777
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.).
Cambridge, MA: The MIT Press.
Eser, I., Durrie, D. S., Schwendeman, F., & Stahl, J. E. (2008). Association between ocular
dominance and refraction. Journal of Refractive Surgery, 24(7), 685–689.
Eye Movements Researchers’ Association. (2012). Eye data quality. Retrieved from http://
www.eye-movements.org/eye_data_quality
*Felser, C., & Cunnings, I. (2012). Processing reflexives in a second language: The timing
of structural and discourse-level constraints. Applied Psycholinguistics, 33(3), 571–603.
doi:10.1017/S0142716411000488
*Felser, C., Cunnings, I., Batterham, C., & Clahsen, H. (2012). The timing of island effects
in nonnative sentence processing. Studies in Second Language Acquisition, 34(1), 67–98.
doi:10.1017/S0272263111000507
Felser, C., Roberts, L., Marinis, T., & Gross, R. (2003). The processing of ambiguous
sentences by first and second language learners of English. Applied Psycholinguistics,
24(3), 453–489. doi:10.1017/S0142716403000237
*Felser, C., Sato, M., & Bertenshaw, N. (2009). The on-line application of binding Principle
A in English as a second language. Bilingualism: Language and Cognition, 12(4), 485–502.
doi:10.1017/S1366728909990228
Fender, M. (2003). English word recognition and word integration skills of native Arabic-
and Japanese-speaking learners of English as a second language. Applied Psycholinguistics,
24(2), 289–315. doi:10.1017/S014271640300016X
Feng, G. (2006). Eye movements as time-series random variables: A stochastic model of eye
movement control in reading. Cognitive Systems Research, 7(1), 70–95. doi:10.1016/j.
cogsys.2005.07.004
Feng, G., Miller, K., Shu, H., & Zhang, H. (2009). Orthography and the development of
reading processes: An eye-movement study of Chinese and English. Child Development,
80(3), 720–735. doi:10.1111/j.1467-8624.2009.01293.x
Ferreira, F., & Clifton, J. R. (1986). The independence of syntactic processing. Journal of
Memory and Language, 25, 348–368. doi:10.1016/0749-596X(86)90006-9
Ferreira, F., Foucart, A., & Engelhardt, P. E. (2013). Language processing in the visual world:
Effects of preview, visual complexity, and prediction. Journal of Memory and Language,
69(3), 165–182. doi:10.1016/j.jml.2013.06.001
Ferreira, F., & Henderson, J. M. (1990). Use of verb information in syntactic parsing:
Evidence from eye movements and word-by-word self-paced reading. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 16(4), 555–568.
doi:10.1037/0278-7393.16.4.555
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). London, UK: Sage.
Findlay, J. M. (2004). Eye scanning and visual search. In J. M. Henderson, & F. Ferreira
(Eds.), The interface of language, vision, and action: Eye movements and the visual world (pp.
135–160). Chicago, IL: Psychology Press.
Findlay, J. M., & Gilchrist, I. D. (2003). Active vision: The psychology of looking and seeing.
Oxford, UK: Oxford University Press.
*Flecken, M. (2011). Event conceptualization by early Dutch–German bilinguals: Insights
from linguistic and eye-tracking data. Bilingualism: Language and Cognition, 14(1), 61–77.
doi:10.1017/S1366728910000027
*Flecken, M., Carroll, M., Weimar, K., & Von Stutterheim, C. (2015). Driving along the road
or heading for the village? Conceptual differences underlying motion event encoding
in French, German, and French-German L2 users. The Modern Language Journal, 99(S1),
100–122. doi:10.1111/modl.12181
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA:
The MIT Press.
Forster, K. I. (1970). Visual perception of rapidly presented word sequences of varying
complexity. Perception and Psychophysics, 8(4), 215–221. doi:10.3758/BF03210208
Foucart, A., & Frenck-Mestre, C. (2012). Can late L2 learners acquire new grammatical
features? Evidence from ERPs and eye-tracking. Journal of Memory and Language, 66(1),
226–248. doi:10.1016/j.jml.2011.07.007
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of
thinking have to be reactive? A meta-analysis and recommendations for best reporting
methods. Psychological Bulletin, 137(2), 316–344. doi:10.1037/a0021663
Fraser, C. A. (1999). Lexical processing strategy use and vocabulary learning through reading.
Studies in Second Language Acquisition, 21(2), 225–241. doi:10.1017/S0272263199002041
Frazier, L. (1987). Sentence processing: A tutorial review. In M. Coltheart (Ed.), Attention and
performance XII.The psychology of reading (Vol. XII, pp. 559–586). Hillsdale, NJ: Laurence
Erlbaum Associates. Retrieved from http://cnbc.cmu.edu/~plaut/IntroPDP/papers/
Frazier87.sentProcRev.pdf
Frenck-Mestre, C. (2005). Eye-movement recording as a tool for studying syntactic
processing in a second language: A review of methodologies and experimental findings.
Second Language Research, 21(2), 175–198. doi:10.1191/0267658305sr257oa
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews
Neuroscience, 11(2), 127–138. doi:10.1038/nrn2787
Fukkink, R. G. (2005). Deriving word meaning from written context: A process analysis.
Learning and Instruction, 15(1), 23–43. doi:10.1016/j.learninstruc.2004.12.002
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018). Tracking the real-time evolution of a
writing event: Second language writers at different proficiency levels. Language Learning,
68(2), 469–506. doi:10.1111/lang.12280
Gass, S. M. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Lawrence
Erlbaum Associates.
Gass, S. M., & Mackey, A. (2017). Stimulated recall methodology in applied linguistics and L2
research (2nd ed.). New York: Routledge.
Gelman, A., & Hill, J. (2007). Data analysis using regression and hierarchical/multilevel models.
Cambridge, UK: Cambridge University Press.
Gilchrist, I. D. (2011). Saccades. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The
Oxford handbook of eye movements (pp. 85–94). Oxford, UK: Oxford University Press.
doi:10.1093/oxfordhb/9780199539789.013.0005
Godfroid, A. (2010). Cognitive processes in Second Language Acquisition: The role of
noticing, attention and awareness in processing words in written L2 input (Unpublished
doctoral dissertation). University of Brussels, Belgium.
Godfroid, A. (2012). Eye tracking. In P. Robinson (Ed.), Routledge encyclopedia of second
language acquisition (pp. 234–236). New York: Routledge.
Godfroid, A. (2016). The effects of implicit instruction on implicit and explicit knowledge
development. Studies in Second Language Acquisition, 38(2), 177–215. doi:10.1017/
S0272263115000388
Godfroid, A. (2019). Investigating instructed second language acquisition using L2 learners’
eye-tracking data. In R. P. Leow (Ed.), The Routledge handbook of second language research
in classroom learning (pp. 44–57). New York: Routledge.
Godfroid, A. (in press). Implicit and explicit learning and knowledge. In H. Mohebbi & C.
Coombe (Eds.), Research questions in language education and applied linguistics. Springer.
*Godfroid, A., Ahn, J., Choi, I., Ballard, L., Cui, Y., Johnston, S., … Yoon, H.-J.
(2018). Incidental vocabulary learning in a natural reading context: An eye-
tracking study. Bilingualism: Language and Cognition, 21(3), 563–584. doi:10.1017/
S1366728917000219
Godfroid, A., Ahn, J., Rebuschat, P., & Dienes, Z. (in preparation). Development of explicit
knowledge from artificial language learning: Evidence from eye movements.
*Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words: Gauging the role of
attention in incidental L2 vocabulary acquisition by means of eye-tracking. Studies in
Second Language Acquisition, 35(3), 483–517. doi:10.1017/S0272263113000119
*Godfroid, A., Loewen, S., Jung, S., Park, J.-H., Gass, S., & Ellis, R. (2015). Timed and
untimed grammaticality judgments measure distinct types of knowledge: Evidence
from eye-movement patterns. Studies in Second Language Acquisition, 37(2), 269–297.
doi:10.1017/S0272263114000850
Godfroid, A., & Schmidtke, J. (2013). What do eye movements tell us about awareness? A
triangulation of eye-movement data, verbal reports, and vocabulary learning scores. In
J. Bergsleithner, S. Frota, & J. K. Yoshioka (Eds.), Noticing and second language acquisition:
Studies in honor of Richard Schmidt (pp. 183–205). Honolulu, HI: University of Hawai‘i,
National Foreign Language Resource Center. Retrieved from http://sls.msu.edu/
files/5213/8229/7769/Godfroid__Schmidtke_2013.pdf
*Godfroid, A., & Spino, L. A. (2015). Reconceptualizing reactivity of think-alouds and
eye tracking: Absence of evidence is not evidence of absence. Language Learning, 65(4),
896–928. doi:10.1111/lang.12136
Godfroid, A., & Uggen, M. S. (2013). Attention to irregular verbs by beginning learners
of German. Studies in Second Language Acquisition, 35(2), 291–322. doi:10.1017/
S0272263112000897
Godfroid, A., & Winke, P. M. (2015). Investigating implicit and explicit processing using
L2 learners’ eye-movement data. In P. Rebuschat (Ed.), Implicit and explicit learning of
languages (pp. 325–348). Amsterdam, the Netherlands: John Benjamins.
Goo, J. (2010). Working memory and reactivity. Language Learning, 60(4), 712–752.
doi:10.1111/j.1467-9922.2010.00573.x
Gough, P. B. (1972). One second of reading. Visible Language, 6(4), 291–320.
Green, A. (1998). Verbal protocol analysis in language testing research: A handbook. New York:
Cambridge University Press.
Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized
linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.
doi:10.1111/2041-210X.12504
Green, P., MacLeod, C. J., & Alday, P. (2016). Package ‘simr’. Retrieved from https://cran.r-project.org/web/packages/simr/simr.pdf
Gries, S. Th. (2013). Statistics for linguistics with R (2nd ed.). Berlin, Germany: Walter de
Gruyter.
Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics:
Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125. doi:10.3366/
cor.2015.0068
Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science,
11(4), 274–279. doi:10.1111/1467-9280.00255
Grosbras, M. H., Laird, A. R., & Paus, T. (2005). Cortical regions involved in eye movements,
shifts of attention, and gaze perception. Human Brain Mapping, 25(1), 140–154.
doi:10.1002/hbm.20145
*Grüter, T., Lew-Williams, C., & Fernald, A. (2012). Grammatical gender in L2: A
production or a real-time processing problem? Second Language Research, 28(2), 191–
215. doi:10.1177/0267658312437990
Grüter, T., Rohde, H., & Schafer, A. J. (2014). The role of discourse-level expectations in non-native speakers’ referential choices. In W. Orman & M. J. Valleau (Eds.), Proceedings of the 38th Annual Boston University Conference on Language Development (pp. 179–191). Somerville,
MA: Cascadilla Press. Retrieved from https://par.nsf.gov/servlets/purl/10028988
Grüter, T., Rohde, H., & Schafer, A. J. (2017). Coreference and discourse coherence in L2:
The roles of grammatical aspect and referential form. Linguistic Approaches to Bilingualism,
7(2), 199–229. doi:10.1075/lab.15011.gru
Guan, C. Q., Liu, Y., Chan, D. H. L., Ye, F., & Perfetti, C. A. (2011). Writing strengthens
orthography and alphabetic-coding strengthens phonology in learning to read Chinese.
Journal of Educational Psychology, 103(3), 509–522. doi:10.1037/a0023730
Gullberg, M., & Holmqvist, K. (1999). Keeping an eye on gestures: Visual perception
of gestures in face-to-face communication. Pragmatics & Cognition, 7(1), 35–63.
doi:10.1075/pc.7.1.04gul
Gullberg, M., & Holmqvist, K. (2006). What speakers do and what addressees look at: Visual
attention to gestures in human interaction live and on video. Pragmatics & Cognition,
14(1), 53–82. doi:10.1075/pc.14.1.05gul
Hafed, Z. M., & Krauzlis, R. J. (2010). Microsaccadic suppression of visual bursts in the
primate superior colliculus. Journal of Neuroscience, 30(28), 9542–9547. doi:10.1523/
JNEUROSCI.1137-10.2010
Häikiö, T., Bertram, R., Hyönä, J., & Niemi, P. (2009). Development of the letter identity
span in reading: Evidence from the eye movement moving window paradigm. Journal of
Experimental Child Psychology, 102(2), 167–181. doi:10.1016/j.jecp.2008.04.002
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th
ed.). Hoboken, NJ: Pearson Education Inc.
Hama, M., & Leow, R. P. (2010). Learning without awareness revisited: Extending Williams (2005). Studies in Second Language Acquisition, 32, 465–491. doi:10.1017/S0272263110000045
Hattie, J. (1992). Measuring the effects of schooling. Australian Journal of Education, 36(1),
5–13. doi:10.1177/000494419203600102
Havik, E., Roberts, L., van Hout, R., Schreuder, R., & Haverkort, M. (2009). Processing
subject-object ambiguities in the L2: A self-paced reading study with German L2 learners
of Dutch. Language Learning, 59(1), 73–112. doi:10.1111/j.1467-9922.2009.00501.x
Hayes, T. R., & Henderson, J. M. (2017). Scan patterns during real-world scene viewing
predict individual differences in cognitive capacity. Journal of Vision, 17(5), 23, 1–17.
doi:10.1167/17.5.23
He, X., & Li, W. (2018, March). Working memory, inhibitory control, and learning L2 grammar with input-output activities: Evidence from eye movements. Paper presented at the Annual Meeting of the American Association for Applied Linguistics, Chicago, IL.
Henderson, J. M., & Ferreira, F. (1990). Effects of foveal processing difficulty on the
perceptual span in reading: Implications for attention and eye movement control. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 417–429. doi:10.1037/0278-7393.16.3.417
Henderson, J. M., & Ferreira, F. (2004). Scene perception for psycholinguists. In J. M.
Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements
and the visual world (pp. 1–58). New York: Psychology Press.
Henderson, J. M., & Hollingworth, A. (1999). High-level scene perception. Annual Review
of Psychology, 50(1), 243–271. doi:10.1146/annurev.psych.50.1.243
Henderson, J. M., & Luke, S. G. (2014). Stable individual differences in saccadic eye
movements during reading, pseudoreading, scene viewing, and scene search. Journal
of Experimental Psychology: Human Perception and Performance, 40(4), 1390–1400.
doi:10.1037/a0036330
Hering, C. (1879). Condensed materia medica (2nd ed.). New York: Boericke & Tafel.
Hilbe, J. M. (2007). Negative binomial regression. Cambridge, UK: Cambridge University Press.
Hintz, F., Meyer, A. S., & Huettig, F. (2017). Predictors of verb-mediated anticipatory eye
movements in the visual world. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 43(9), 1352–1374. doi:10.1037/xlm0000388
Hirotani, M., Frazier, L., & Rayner, K. (2006). Punctuation and intonation effects on clause
and sentence wrap-up: Evidence from eye movements. Journal of Memory and Language,
54(3), 425–443. doi:10.1016/j.jml.2005.12.001
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & van de Weijer, J.
(2011). Eye tracking: A comprehensive guide to methods and measures. Oxford, UK: Oxford
University Press.
Holmqvist, K., & Zemblys, R. (2016, June). Common predictors of accuracy, precision and
data loss in eye-trackers. Paper presented at the 7th Scandinavian Workshop on Applied
Eye Tracking, Turku, Finland.
Holmqvist, K., Zemblys, R., Dixon, D. C., Mulvey, F. B., Borah, J., & Pelz, J. B. (2015,
August). The effect of sample selection methods on data quality measures and on
predictors for data quality. Paper presented at the 18th European Conference on Eye
Movements, Vienna, Austria.
Holšánová, J. (2008). Discourse, vision, and cognition. Amsterdam, the Netherlands: Benjamins.
Hopp, H. (2009). The syntax–discourse interface in near-native L2 acquisition: Off-
line and on-line performance. Bilingualism: Language and Cognition, 12(4), 463–483.
doi:10.1017/S1366728909990253
*Hopp, H. (2013). Grammatical gender in adult L2 acquisition: Relations between
lexical and syntactic variability. Second Language Research, 29(1), 33–56.
doi:10.1177/0267658312461803
Hopp, H. (2014). Working memory effects in the L2 processing of ambiguous relative
clauses. Language Acquisition, 21(3), 250–278. doi:10.1080/10489223.2014.892943
Hopp, H. (2015). Semantics and morphosyntax in predictive L2 sentence processing.
International Review of Applied Linguistics in Language Teaching, 53(3), 277–306.
doi:10.1515/iral-2015-0014
*Hopp, H. (2016). Learning (not) to predict: Grammatical gender processing
in second language acquisition. Second Language Research, 32(2), 277–307.
doi:10.1177/0267658315624960
*Hopp, H., & Lemmerth, N. (2018). Lexical and syntactic congruency in L2 predictive
gender processing. Studies in Second Language Acquisition, 40(1), 171–199. doi:10.1017/S0272263116000437
*Hopp, H., & León Arriaga, M. E. (2016). Structural and inherent case in the non-native
processing of Spanish: Constraints on inflectional variability. Second Language Research,
32(1), 75–108. doi:10.1177/0267658315605872
Hosseini, K. (2007). A thousand splendid suns. New York: Riverhead Books.
*Hoversten, L. J., & Traxler, M. J. (2016). A time course analysis of interlingual homograph
processing: Evidence from eye movements. Bilingualism: Language and Cognition, 19(2),
347–360. doi:10.1017/S1366728915000115
Huettig, F. (2015). Four central questions about prediction in language processing. Brain
Research, 1626, 118–135. doi:10.1016/j.brainres.2015.02.014
Huettig, F., & Altmann, G. T. M. (2004). The online processing of ambiguous and
unambiguous words in context: Evidence from head-mounted eye-tracking. In M.
Carreiras & C. Clifton Jr. (Eds.), The on-line study of sentence comprehension: Eyetracking,
ERPs, and beyond (pp. 187–208). New York: Psychology Press.
Huettig, F., & Altmann, G. T. M. (2005). Word meaning and the control of eye fixation:
Semantic competitor effects and the visual world paradigm. Cognition, 96(1), B23–B32.
doi:10.1016/j.cognition.2004.10.003
Huettig, F., & Altmann, G. T. M. (2011). Looking at anything that is green when hearing
“frog”: How object surface colour and stored object colour knowledge influence
language-mediated overt attention. Quarterly Journal of Experimental Psychology, 64(1),
122–145. doi:10.1080/17470218.2010.481474
Huettig, F., & McQueen, J. M. (2007). The tug of war between phonological, semantic and
shape information in language-mediated visual search. Journal of Memory and Language,
57(4), 460–482. doi:10.1016/j.jml.2007.02.001
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137, 151–171.
doi:10.1016/j.actpsy.2010.11.003
Huey, E. B. (1908). The psychology and pedagogy of reading. New York: Macmillan.
Hutzler, F., Braun, M., Võ, M. L.-H., Engl, V., Hofmann, M., Dambacher, M., … Jacobs,
A. M. (2007). Welcome to the real world: Validating fixation-related brain potentials
for ecologically valid settings. Brain Research, 1172, 124–129. doi:10.1016/j.
brainres.2007.07.025
Hyönä, J., Lorch, R. F., & Kaakinen, J. K. (2002). Individual differences in reading to
summarize expository text: Evidence from eye fixation patterns. Journal of Educational
Psychology, 94(1), 44–55. doi:10.1037/0022-0663.94.1.44
Hyönä, J., & Nurminen, A. M. (2006). Do adult readers know how they read? Evidence
from eye movement patterns and verbal reports. British Journal of Psychology, 97(1), 31–
50. doi:10.1348/000712605X53678
Ikeda, M., & Saida, S. (1978). Span of recognition in reading. Vision Research, 18(1), 83–88.
doi:10.1016/0042-6989(78)90080-9
*Indrarathne, B., & Kormos, J. (2017). Attentional processing of input in explicit and
implicit conditions: An eye-tracking study. Studies in Second Language Acquisition, 39(3),
401–430. doi:10.1017/S027226311600019X
*Indrarathne, B., & Kormos, J. (2018). The role of working memory in processing L2
input: Insights from eye-tracking. Bilingualism: Language and Cognition, 21(2), 355–374.
doi:10.1017/S1366728917000098
Inhoff, A. W., & Radach, R. (1998). Definition and computation of oculomotor measures in
the study of cognitive processes. In G. Underwood (Ed.), Eye guidance in reading and scene
perception (pp. 29–53). New York: Elsevier. doi:10.1016/B978-008043361-5/50003-1
Irwin, D. E. (1998). Lexical processing during saccadic eye movements. Cognitive Psychology,
36(1), 1–27. doi:10.1006/cogp.1998.0682
Issa, B., & Morgan-Short, K. (2019). Effects of external and internal attentional manipulations
on second language grammar development: An eye-tracking study. Studies in Second
Language Acquisition, 41(2), 389–417. doi:10.1017/S027226311800013X
Issa, B., Morgan-Short, K., Villegas, B., & Raney, G. (2015). An eye-tracking study on the
role of attention and its relationship with motivation. EuroSLA Yearbook, 15, 114–142.
doi:10.1075/eurosla.15.05iss
*Ito, A., Corley, M., & Pickering, M. J. (2018). A cognitive load delays predictive eye
movements similarly during L1 and L2 comprehension. Bilingualism: Language and
Cognition, 21(2), 251–264. doi:10.1017/S1366728917000050
Ito, K., & Speer, S. R. (2008). Anticipatory effects of intonation: Eye movements during
instructed visual search. Journal of Memory and Language, 58(2), 541–573. doi:10.1016/j.
jml.2007.06.013
Izumi, S., Bigelow, M., Fujiwara, M., & Fearnow, S. (1999). Testing the output hypothesis:
Effects of output on noticing and second language acquisition. Studies in Second Language
Acquisition, 21(3), 421–452. doi:10.1017/S0272263199003034
Izumi, S., & Bigelow, M. (2000). Does output promote noticing and second language
acquisition? TESOL Quarterly, 34(2), 239–278. doi:10.2307/3587952
Jackson, C. N., & Bobb, S. C. (2009). The processing and comprehension of wh-questions
among second language speakers of German. Applied Psycholinguistics, 30(4), 603–636.
doi:10.1017/S014271640999004X
Jacobs, A. M. (2000). Five questions about cognitive models and some answers from three
models of reading. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a
perceptual process (pp. 721–732). Oxford, UK: Elsevier.
Jacobson, E. (1930). Electrical measurements of neuromuscular states during mental activities: I. Imagination of movement involving skeletal muscle. American Journal of Physiology, 91, 567–608. doi:10.1152/ajplegacy.1930.91.2.567
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or
not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
doi:10.1016/j.jml.2007.11.007
Jegerski, J. (2014). Self-paced reading. In J. Jegerski & B. VanPatten (Eds.), Research methods
in second language psycholinguistics (pp. 20–49). London, UK: Taylor & Francis.
Jeon, E. H. (2015). Multiple regression. In L. Plonsky (Ed.), Advancing quantitative methods in
second language research (pp. 131–158). New York: Routledge.
Jiang, N. (2012). Conducting reaction time research in second language research. London, UK:
Routledge. doi:10.4324/9780203146255
Johns Hopkins Medicine. (2014). Fast eye movements: A possible indicator of more impulsive
decision-making. Retrieved from https://www.hopkinsmedicine.org/news/media/
releases/fast_eye_movements_a_possible_indicator_of_more_impulsive_decision_making
Jordan, T. R., Almabruk, A. A. A., Gadalla, E. A., McGowan, V. A., White, S. J., Abedipour, L.,
& Paterson, K. B. (2014). Reading direction and the central perceptual span: Evidence
from Arabic and English. Psychonomic Bulletin & Review, 21(2), 505–511. doi:10.3758/
s13423-013-0510-4
Joseph, H. S. S. L., Wonnacott, E., Forbes, P., & Nation, K. (2014). Becoming a written
word: Eye movements reveal order of acquisition effects following incidental exposure
to new words during silent reading. Cognition, 133(1), 238–248. doi:10.1016/j.
cognition.2014.06.015
Ju, M., & Luce, P. A. (2004). Falling on sensitive ears: Constraints on bilingual lexical
activation. Psychological Science, 15(5), 314–318. doi:10.1111/j.0956-7976.2004.00675.x
Ju, M., & Luce, P. A. (2006). Representational specificity of within-category phonetic
variation in the long-term mental lexicon. Journal of Experimental Psychology: Human
Perception and Performance, 32(1), 120–138. doi:10.1037/0096-1523.32.1.120
Juffs, A., & Rodríguez, G. A. (2015). Second language sentence processing. New York: Routledge.
Juhasz, B. J. (2008). The processing of compound words in English: Effects of word length
on eye movements during reading. Language and Cognitive Processes, 23(7–8), 1057–
1088. doi:10.1080/01690960802144434
Juhasz, B. J., & Rayner, K. (2003). Investigating the effects of a set of intercorrelated variables
on eye fixation durations in reading. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 29(6), 1312–1318. doi:10.1037/0278-7393.29.6.1312
Juhasz, B. J., & Rayner, K. (2006). The role of age of acquisition and word frequency
in reading: Evidence from eye fixation durations. Visual Cognition, 13(7–8), 846–863.
doi:10.1080/13506280544000075
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to
comprehension. Psychological Review, 87(4), 329–354. doi:10.1037/0033-295X.87.4.329
Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading
comprehension. Journal of Experimental Psychology: General, 111(2), 228–238.
doi:10.1037/0096-3445.111.2.228
Kaakinen, J. K., & Hyönä, J. (2005). Perspective effects on expository text comprehension:
Evidence from think-aloud protocols, eyetracking, and recall. Discourse Processes, 40(3),
239–257. doi:10.1207/s15326950dp4003_4
Kaan, E. (2014). Predictive sentence processing in L2 and L1. Linguistic Approaches to
Bilingualism, 4(2), 257–282. doi:10.1075/lab.4.2.05kaa
Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in
incremental sentence processing: Evidence from anticipatory eye movements. Journal of
Memory and Language, 49(1), 133–156. doi:10.1016/S0749-596X(03)00023-8
*Kaushanskaya, M., & Marian, V. (2007). Bilingual language processing and interference in
bilinguals: Evidence from eye tracking and picture naming. Language Learning, 57(1),
119–163. doi:10.1111/j.1467-9922.2007.00401.x
*Keating, G. D. (2009). Sensitivity to violations of gender agreement in native and
nonnative Spanish: An eye-movement investigation. Language Learning, 59(3), 503–535.
doi:10.1111/j.1467-9922.2009.00516.x
Keating, G. D. (2014). Eye-tracking with text. In J. Jegerski & B. VanPatten (Eds.),
Research methods in second language psycholinguistics (pp. 69–92). London, UK: Taylor
& Francis.
Keating, G. D., & Jegerski, J. (2015). Experimental designs in sentence processing. Studies in
Second Language Acquisition, 37(1), 1–32. doi:10.1017/S0272263114000187
Kennedy, A., & Pynte, J. (2005). Parafoveal-on-foveal effects in normal reading. Vision
Research, 45(2), 153–168. doi:10.1016/j.visres.2004.07.037
Kennedy, A., Pynte, J., & Ducrot, S. (2002). Parafoveal-on-foveal interactions in word
recognition. The Quarterly Journal of Experimental Psychology, 55(4), 1307–1337.
doi:10.1080/02724980244000071
Kerlinger, F. N., & Lee, H. B. (2000). Foundations of behavioral research. Orlando, FL: Harcourt
College Publishers.
Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing second
language reading. Cambridge, UK: Cambridge University Press.
*Kim, E., Montrul, S., & Yoon, J. (2015). The on-line processing of binding principles in
second language acquisition: Evidence from eye tracking. Applied Psycholinguistics, 36(6),
1317–1374. doi:10.1017/S0142716414000307
Kliegl, R., Dambacher, M., Dimigen, O., & Sommer, W. (2014). Oculomotor control, brain
potentials, and timelines of word recognition during natural reading. In M. Horsley, M.
Eliot, B. A. Knight, & R. Reilly (Eds.), Current trends in eye tracking research (pp. 141–155).
Springer. doi:10.1007/978-3-319-02868-2_10
Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability
effects of words on eye movements in reading. European Journal of Cognitive Psychology,
16(1–2), 262–284. doi:10.1080/09541440340000213
Kliegl, R., Nuthmann, A., & Engbert, R. (2006). Tracking the mind during reading: The
influence of past, present, and future words on fixation durations. Journal of Experimental
Psychology: General, 135(1), 12–35. doi:10.1037/0096-3445.135.1.12
*Kohlstedt, T., & Mani, N. (2018). The influence of increasing discourse context on L1 and
L2 spoken language processing. Bilingualism: Language and Cognition, 21(1), 121–136.
doi:10.1017/S1366728916001139
Kohsom, C., & Gobet, F. (1997). Adding spaces to Thai and English: Effects on reading. In
L. R. Gleitman & A. K. Joshi (Eds.), Proceedings of the twenty-second annual conference of
the Cognitive Science Society (pp. 388–393). Mahwah, NJ: Lawrence Erlbaum Associates.
Retrieved from https://mindmodeling.org/cogscihistorical/cogsci_22.pdf
Krauzlis, R. J. (2013). Eye movements. In L. R. Squire, D. Berg, F. E. Bloom, S. du Lac, A.
Ghosh, & N. C. Spitzer (Eds.), Fundamental neuroscience (4th ed., pp. 697–714). Waltham,
MA: Elsevier.
Kretzschmar, F., Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2009). Parafoveal versus
foveal N400s dissociate spreading activation from contextual fit. NeuroReport, 20(18),
1613–1618. doi:10.1097/WNR.0b013e328332c4f4
Kreysa, H., & Pickering, M. J. (2011). Eye movements in dialogue. In S. P. Liversedge, I.
D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 943–959).
Oxford, UK: Oxford University Press.
Kroll, J. F., & Bialystok, E. (2013). Understanding the consequences of bilingualism for
language processing and cognition. Journal of Cognitive Psychology, 25(5), 497–514. doi:1
0.1080/20445911.2013.799170
Kroll, J. F., Dussias, P. E., Bice, K., & Perrotti, L. (2015). Bilingualism, mind, and brain. Annual
Review of Linguistics, 1(1), 377–394. doi:10.1146/annurev-linguist-030514-124937
Kroll, J. F., & Ma, F. (2017). The bilingual lexicon. In E. M. Fernández & H. S. Cairns (Eds.),
The handbook of psycholinguistics (pp. 294–319). Hoboken, NJ: Wiley.
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis
based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825.
doi:10.1371/journal.pone.0105825
Kuperberg, G. R., & Jaeger, T. F. (2016). What do we mean by prediction in language
comprehension? Language, Cognition and Neuroscience, 31(1), 32–59. doi:10.1080/2327
3798.2015.1102299
Kuperman, V., & Van Dyke, J. A. (2011). Effects of individual differences in verbal skills on
eye-movement patterns during sentence reading. Journal of Memory and Language, 65(1),
42–73. doi:10.1016/j.jml.2011.03.002
Kurtzman, H. S., Crawford, L. F., & Nychis-Florence, C. (1991). Locating Wh-traces. In
R. C. Berwick, S. P. Abney, & C. Tenny (Eds.), Principle-based parsing (pp. 347–382).
Dordrecht, the Netherlands: Springer. doi:10.1007/978-94-011-3474-3_13
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models
(5th ed.). Boston, MA: McGraw-Hill/Irwin.
Lachaud, C. M., & Renaud, O. (2011). A tutorial for analyzing human reaction times:
How to filter data, manage missing values, and choose a statistical model. Applied
Psycholinguistics, 32(2), 389–416. doi:10.1017/S0142716410000457
Lagrou, E., Hartsuiker, R. J., & Duyck, W. (2013). The influence of sentence context
and accented speech on lexical access in second-language auditory word
recognition. Bilingualism: Language and Cognition, 16(3), 508–517. doi:10.1017/
S1366728912000508
Lamare, M. (1892). Des mouvements des yeux dans la lecture [On eye movements in reading]. Bulletins et Mémoires de la Société Française d’Ophthalmologie, 10, 354–364.
Larsen-Freeman, D., & Long, M. H. (1991). An introduction to second language acquisition
research. New York: Routledge.
Larson-Hall, J. (2016). A guide to doing statistics in second language research using SPSS and R.
New York: Routledge.
Larson-Hall, J. (2017). Moving beyond the bar plot and the line graph to create informative
and attractive graphics. The Modern Language Journal, 101(1), 244–270. doi:10.1111/
modl.12386
Lau, E., & Grüter, T. (2015). Real-time processing of classifier information by L2 speakers of Chinese. In E. Grillo & K. Jepson (Eds.), Proceedings of the 39th Annual Boston University Conference on Language Development (pp. 311–323). Somerville, MA: Cascadilla Press.
Lee, C. H., & Kalyuga, S. (2011). Effectiveness of different pinyin presentation formats
in learning Chinese characters: A cognitive load perspective. Language Learning, 61(4),
1099–1118. doi:10.1111/j.1467-9922.2011.00666.x
*Lee, S., & Winke, P. (2018). Young learners’ response processes when taking
computerized tasks for speaking assessment. Language Testing, 35(2), 239–269.
doi:10.1177/0265532217704009
Leeser, M. J., Brandl, A., & Weissglass, C. (2011). Task effects in second language sentence
processing research. In P. Trofimovich & K. McDonough (Eds.), Applying priming
methods to L2 learning, teaching, and research: Insights from psycholinguistics (pp. 179–198).
Amsterdam, the Netherlands: John Benjamins.
Legge, G. E., & Bigelow, C. A. (2011). Does print size matter for reading? A review
of findings from vision science and typography. Journal of Vision, 11(5), 8, 1–22.
doi:10.1167/11.5.8
Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal-observer model of reading.
Psychological Review, 104(3), 524–553. doi:10.1037/0033-295X.104.3.524
Leow, R. P. (1997). Attention, awareness, and foreign language behavior. Language Learning,
47(3), 467–505. doi:10.1111/0023-8333.00017
Leow, R. P. (1998). Toward operationalizing the process of attention in SLA: Evidence
for Tomlin and Villa’s (1994) fine-grained analysis of attention. Applied Psycholinguistics,
19(1), 133–159. doi:10.1017/S0142716400010626
Leow, R. P. (2000). A study of the role of awareness in foreign language behavior. Studies in
Second Language Acquisition, 22(4), 557–584. doi:10.1017/S0272263100004046
Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. New York:
Routledge.
Leow, R. P., Grey, S., Marijuan, S., & Moorman, C. (2014). Concurrent data elicitation
procedures, processes, and the early stages of L2 learning: A critical overview. Second
Language Research, 30(2), 111–127. doi:10.1177/0267658313511979
Leow, R. P., Hsieh, H.-C., & Moreno, N. (2008). Attention to form and meaning revisited.
Language Learning, 58(3), 665–695. doi:10.1111/j.1467-9922.2008.00453.x
Leow, R. P., & Morgan-Short, K. (2004). To think aloud or not to think aloud: The issue
of reactivity in SLA research methodology. Studies in Second Language Acquisition, 26(1),
35–57. doi:10.1017/S0272263104261022
Lettvin, J. Y., Maturana, H. R., McCulloch, W. S., & Pitts, W. H. (1968). What the frog’s
eye tells the frog’s brain. In W. C. Corning & M. Balaban (Eds.), The mind: Biological
approaches to its functions (pp. 233–258). London, UK: John Wiley and Sons Inc.
Leung, C. Y., Sugiura, M., Abe, D., & Yoshikawa, L. (2014). The perceptual span in second
language reading: An eye-tracking study using a gaze-contingent moving window
paradigm. Open Journal of Modern Linguistics, 4(5), 585–594. doi:10.4236/ojml.2014.45051
Leung, J. H. C., & Williams, J. N. (2011). The implicit learning of mappings between forms
and contextually derived meanings. Studies in Second Language Acquisition, 33(1), 33–55.
doi:10.1017/S0272263110000525
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and
reversals. Soviet Physics Doklady, 10(8), 707–710.
Lew-Williams, C., & Fernald, A. (2007). Young children learning Spanish make rapid use of
grammatical gender in spoken word recognition. Psychological Science, 18(3), 193–198.
doi:10.1111/j.1467-9280.2007.01871.x
Lew-Williams, C., & Fernald, A. (2010). Real-time processing of gender-marked articles by
native and non-native Spanish speakers. Journal of Memory and Language, 63(4), 447–464.
doi:10.1016/j.jml.2010.07.003
Li, X., Liu, P., & Rayner, K. (2011). Eye movement guidance in Chinese reading: Is there
a preferred viewing location? Vision Research, 51(10), 1146–1156. doi:10.1016/j.
visres.2011.03.004
Liberman, A. M. (2005). How much more likely? The implications of odds ratios
for probabilities. American Journal of Evaluation, 26(2), 253–266. doi:10.1177/
1098214005275825
Lim, H., & Godfroid, A. (2015). Automatization in second language sentence processing:
A partial, conceptual replication of Hulstijn, Van Gelderen, and Schoonen’s 2009 study.
Applied Psycholinguistics, 36(5), 1247–1282. doi:10.1017/S0142716414000137
Lim, J. H., & Christianson, K. (2013). Second language sentence processing in reading for
comprehension and translation. Bilingualism: Language and Cognition, 16(3), 518–537.
doi:10.1017/S1366728912000351
*Lim, J. H., & Christianson, K. (2015). Second language sensitivity to agreement errors:
Evidence from eye movements during comprehension and translation. Applied
Psycholinguistics, 36(6), 1283–1315. doi:10.1017/S0142716414000290
Linck, J. A., & Cunnings, I. (2015). The utility and application of mixed-effects models in
second language research. Language Learning, 65(S1), 185–207. doi:10.1111/lang.12117
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and
behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12),
1181–1209. doi:10.1037/0003-066X.48.12.1181
Liu, Y., Wang, M., & Perfetti, C. A. (2007). Threshold-style processing of Chinese characters
for adult second-language learners. Memory and Cognition, 35(3), 471–480. doi:10.3758/
BF03193287
Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in
Cognitive Sciences, 4(1), 6–14. doi:10.1016/S1364-6613(99)01418-7
Liversedge, S. P., Gilchrist, I., & Everling, S. (Eds.). (2011). The Oxford handbook of eye movements.
Oxford, UK: Oxford University Press.
Loewen, S. (2015). Introduction to instructed second language acquisition. New York: Routledge.
Lotto, L., Dell’Acqua, R., & Job, R. (2001). Le figure PD/DPSS. Misure di accordo sul nome, tipicità, familiarità, età di acquisizione e tempi di denominazione per 266 figure [The PD/DPSS pictures: Norms for name agreement, typicality, familiarity, age of acquisition, and naming times for 266 pictures]. Giornale Italiano di Psicologia, 28(1), 193–210. doi:10.1421/337
Lowell, R., & Morris, R. K. (2014). Word length effects on novel words: Evidence from
eye movements. Attention, Perception, & Psychophysics, 76(1), 179–189. doi:10.3758/
s13414-013-0556-4
Luck, S. J. (2014). An introduction to the event-related potential technique (2nd ed.). Cambridge,
MA: The MIT Press.
Luke, S. G., Henderson, J. M., & Ferreira, F. (2015). Children’s eye-movements during
reading reflect the quality of lexical representations: An individual differences approach.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6), 1675–1683. doi:10.1037/xlm0000133
Lupyan, G. (2016). The centrality of language in human cognition. Language Learning, 66(3),
516–553. doi:10.1111/lang.12155
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical
nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676–703.
doi:10.1037/0033-295X.101.4.676
Mackey, A., & Gass, S. M. (2016). Second language research: Methodology and design (2nd ed.). New York: Routledge.
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative
review. Psychological Bulletin, 109(2), 163–203. doi:10.1037/0033-2909.109.2.163
*Marian, V., & Spivey, M. (2003a). Bilingual and monolingual processing of competing
lexical items. Applied Psycholinguistics, 24(2), 173–193. doi:10.1017/S0142716403000092
*Marian, V., & Spivey, M. (2003b). Competing activation in bilingual language processing:
Within- and between-language competition. Bilingualism: Language and Cognition, 6(2),
97–115. doi:10.1017/S1366728903001068
Marinis, T. (2010). Using on-line processing methods in language acquisition research. In
E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp.
139–162). Amsterdam, the Netherlands: John Benjamins.
Marinis, T., Roberts, L., Felser, C., & Clahsen, H. (2005). Gaps in second language
sentence processing. Studies in Second Language Acquisition, 27(1), 53–78. doi:10.1017/
S0272263105050035
Marslen-Wilson, W., & Tyler, L. K. (1987). Against modularity. In J. L. Garfield (Ed.),
Modularity in knowledge representation and natural language understanding (pp. 37–62).
Cambridge, MA: The MIT Press.
Martinez-Conde, S., & Macknik, S. L. (2007). Science in culture: Mind tricks. Nature,
448(7152), 414. doi:10.1038/448414a
Martinez-Conde, S., & Macknik, S. L. (2011). Microsaccades. In S. P. Liversedge, I. Gilchrist,
& S. Everling (Eds.), The Oxford handbook of eye movements (pp. 95–114). Oxford, UK:
Oxford University Press. doi:10.1093/oxfordhb/9780199539789.013.0006
Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin,
81(12), 899–917. doi:10.1037/h0037368
Matin, E., Shao, K. C., & Boff, K. R. (1993). Saccadic overhead: Information-processing time
with and without saccades. Perception and Psychophysics, 53(4), 372–380. doi:10.3758/
BF03206780
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I
error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
doi:10.1016/j.jml.2017.01.001
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18(1), 1–86. doi:10.1016/0010-0285(86)90015-0
McClelland, J. L., & O’Regan, J. K. (1981). Expectations increase the benefit
derived from parafoveal visual information in reading words aloud. Journal
of Experimental Psychology: Human Perception and Performance, 7(3), 634–644.
doi:10.1037/0096-1523.7.3.634
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation
in reading. Perception and Psychophysics, 17(6), 578–586. doi:10.3758/BF03203972
McConkie, G. W., & Rayner, K. (1976a). Asymmetry of the perceptual span in reading.
Bulletin of the Psychonomic Society, 8, 365–368. doi:10.3758/BF03335168
McConkie, G. W., & Rayner, K. (1976b). Identifying the span of the effective stimulus
of reading: Literature review and theories of reading. In H. Singer & R. B. Ruddell
(Eds.), Theoretical models and processes in reading (pp. 137–162). Newark, DE: International
Reading Association.
*McCray, G., & Brunfaut, T. (2018). Investigating the construct measured by banked
gap-fill items: Evidence from eye-tracking. Language Testing, 35(1), 51–73.
doi:10.1177/0265532216677105
McDonald, S. A. (2006). Parafoveal preview benefit in reading is only obtained from the
saccade goal. Vision Research, 46(26), 4416–4424. doi:10.1016/j.visres.2006.08.027
McDonald, S. A., Carpenter, R. H. S., & Shillcock, R. C. (2005). An anatomically
constrained, stochastic model of eye movement control in reading. Psychological Review,
112(4), 814–840. doi:10.1037/0033-295X.112.4.814
*McDonough, K., Crowther, D., Kielstra, P., & Trofimovich, P. (2015). Exploring the
potential relationship between eye gaze and English L2 speakers’ responses to recasts.
Second Language Research, 31(4), 563–575. doi:10.1177/0267658315589656
*McDonough, K., Trofimovich, P., Dao, P., & Dion, A. (2017). Eye gaze and production
accuracy predict English L2 speakers’ morphosyntactic learning. Studies in Second
Language Acquisition, 39(4), 851–868. doi:10.1017/S0272263116000395
McIntyre, N. A., & Foulsham, T. (2018). Scanpath analysis of expertise and culture in
teacher gaze in real-world classrooms. Instructional Science, 46(3), 435–455. doi:10.1007/
s11251-017-9445-x
*Mercier, J., Pivneva, I., & Titone, D. (2014). Individual differences in inhibitory control
relate to bilingual spoken word processing. Bilingualism: Language and Cognition, 17(1),
89–117. doi:10.1017/S1366728913000084
*Mercier, J., Pivneva, I., & Titone, D. (2016). The role of prior language context on bilingual
spoken word processing: Evidence from the visual world task. Bilingualism: Language and
Cognition, 19(2), 376–399. doi:10.1017/S1366728914000340
Meseguer, E., Carreiras, M., & Clifton, C. (2002). Overt reanalysis strategies and eye
movements during the reading of mild garden path sentences. Memory and Cognition,
30(4), 551–561. doi:10.3758/BF03194956
Metzner, P., von der Malsburg, T., Vasishth, S., & Rösler, F. (2017). The importance of
reading naturally: Evidence from combined recordings of eye movements and electric
brain potentials. Cognitive Science, 41, 1232–1263.
Meyer, C. H., Lasker, A. G., & Robinson, D. A. (1985). The upper limit of human smooth
pursuit velocity. Vision Research, 25(4), 561–563. doi:10.1016/0042-6989(85)90160-9
Meyer, A. S., Roelofs, A., & Levelt, W. J. (2003). Word length effects in object naming:
The role of a response criterion. Journal of Memory and Language, 48(1), 131–147.
doi:10.1016/S0749-596X(02)00509-0
Meyer, A. S., Sleiderink, A. M., & Levelt, W. J. M. (1998). Viewing and naming objects: Eye
movements during noun phrase production. Cognition, 66(2), B25–B33. doi:10.1016/
S0010-0277(98)00009-2
Meyers, I. L. (1929). Electronystagmography: A graphic study of the action currents
in nystagmus. Archives of Neurology & Psychiatry, 21(4), 901–918. doi:10.1001/
archneurpsyc.1929.02210220172009.
Michel, M., & Smith, B. (2017). Eye-tracking research in computer-mediated language
learning. In S. Thorne & S. May (Eds.), Language, education and technology. Encyclopedia
of language and education (3rd ed.) (pp. 453–464). Cham, Switzerland: Springer.
doi:10.1007/978-3-319-02237-6_34
Michel, M., & Smith, B. (2019). Measuring lexical alignment during L2 chat interaction:
An eye-tracking study. In S. Gass, P. Spinner, & J. Behney (Eds.), Salience in second language
acquisition (pp. 244–268). New York: Routledge.
Miles, W. R. (1930). Ocular dominance in human adults. Journal of General Psychology, 3,
412–430.
Mirman, D. (2014). Growth curve analysis and visualization using R. Boca Raton, FL: Chapman
and Hall/CRC Press.
Mirman, D., Dixon, J. A., & Magnuson, J. S. (2008). Statistical and computational models of
the visual world paradigm: Growth curves and individual differences. Journal of Memory
and Language, 59(4), 475–494. doi:10.1016/j.jml.2007.11.006
Mitchell, D. C. (2004). On-line methods in language processing: Introduction and historical
review. In M. Carreiras & C. J. Clifton (Eds.), The on-line study of sentence comprehension
(pp. 15–32). New York: Psychology Press. doi:10.4324/9780203509050
*Mitsugi, S. (2017). Incremental comprehension of Japanese passives: Evidence from
the visual-world paradigm. Applied Psycholinguistics, 38(4), 953–983. doi:10.1017/
S0142716416000515
*Mitsugi, S., & MacWhinney, B. (2016). The use of case marking for predictive processing
in second language Japanese. Bilingualism: Language and Cognition, 19(1), 19–35.
doi:10.1017/S1366728914000881
*Miwa, K., Dijkstra, T., Bolger, P., & Baayen, R. H. (2014). Reading English with Japanese
in mind: Effects of frequency, phonology, and meaning in different-script bilinguals.
Bilingualism: Language and Cognition, 17(3), 445–463. doi:10.1017/S1366728913000576
Mohamed, A. A. (2015). The roles of context and repetition in incidental vocabulary
acquisition from L2 reading: An eye movement study (Doctoral dissertation). Retrieved
from ProQuest Dissertations and Theses Global. (3701105).
*Mohamed, A. A. (2018). Exposure frequency in L2 reading: An eye-movement perspective
of incidental vocabulary learning. Studies in Second Language Acquisition, 40(2), 269–293.
doi:10.1017/S0272263117000092
*Montero Perez, M., Peters, E., & Desmet, P. (2015). Enhancing vocabulary learning
through captioned video: An eye-tracking study. The Modern Language Journal, 99(2),
308–328. doi:10.1111/modl.12215
Montero Perez, M., Van den Noortgate, W., & Desmet, P. (2013). Captioned video
for L2 listening and vocabulary learning: A meta-analysis. System, 41(3), 720–739.
doi:10.1016/j.system.2013.07.013
*Morales, L., Paolieri, D., Dussias, P. E., Valdés Kroff, J. R., Gerfen, C., & Bajo, M. T.
(2016). The gender congruency effect during bilingual spoken-word recognition.
Bilingualism: Language and Cognition, 19(2), 294–310. doi:10.1017/S1366728915000176
Morgan-Short, K., Faretta-Stutenberg, M., & Bartlett-Hsu, L. (2015). Contributions of
event-related potential research to issues in explicit and implicit second language
acquisition. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 349–384).
Amsterdam, the Netherlands: Benjamins. doi:10.1075/sibil.48.15mor
Morgan-Short, K., Heil, J., Botero-Moriarty, A., & Ebert, S. (2012). Allocation of attention
to second language form and meaning. Studies in Second Language Acquisition, 34(4),
659–685. doi:10.1017/S027226311200037X
Morgan-Short, K., Sanz, C., Steinhauer, K., & Ullman, M. T. (2010). Second
language acquisition of gender agreement in explicit and implicit training
conditions: An event-related potential study. Language Learning, 60(1), 154–193.
doi:10.1111/j.1467-9922.2009.00554.x
Morgan-Short, K., Steinhauer, K., Sanz, C., & Ullman, M. T. (2012). Explicit and implicit
second language training differentially affect the achievement of native-like brain
activation patterns. Journal of Cognitive Neuroscience, 24(4), 933–947. doi:10.1162/jocn
Morgan-Short, K., & Tanner, D. (2014). Event-related potentials (ERPs). In J. Jegerski & B.
VanPatten (Eds.), Research methods in second language psycholinguistics (pp. 127–152). New
York: Routledge.
Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: Evidence for
parallel programming of saccades. Journal of Experimental Psychology: Human Perception
and Performance, 10(5), 667–682. doi:10.1037/0096-1523.10.5.667
Mueller, J. L. (2005). Electrophysiological correlates of second language processing. Second
Language Research, 21(2), 152–174. doi:10.1191/0267658305sr256oa
Mulvey, F., Pelz, J., Simpson, S., Cleveland, D., Wang, D., Latorella, K., … Hayhoe, M. (2018,
March). How reliable is eye movement data? Results of large-scale system comparison
and universal standards for measuring and reporting eye data quality. Paper presented at the Annual Meeting of the American Association for Applied Linguistics, Chicago, IL.
*Muñoz, C. (2017). The role of age and proficiency in subtitle reading. An eye-tracking
study. System, 67, 77–86. doi:10.1016/j.system.2017.04.015
Murray, W. S. (2000). Sentence processing: Issues and measures. In A. Kennedy, R. Radach, D.
Heller, & J. Pynte (Eds.), Reading as a perceptual process (pp. 649–664). Oxford, UK: Elsevier.
Murray, W. S., Fischer, M. H., & Tatler, B. W. (2013). Serial and parallel processes in eye
movement control: Current controversies and future directions. The Quarterly Journal of
Experimental Psychology, 66(3), 417–428. doi:10.1080/17470218.2012.759979
Nassaji, H. (2003). Higher-level and lower-level text processing skills in advanced
ESL reading comprehension. The Modern Language Journal, 87(2), 261–276.
doi:10.1111/1540-4781.00189
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam, the Netherlands: John
Benjamins.
Nieuwenhuis, R., te Grotenhuis, H. F., & Pelzer, B. J. (2012). influence.ME: Tools for
detecting influential data in mixed effects models. The R Journal, 4(2), 38–47. Retrieved
from http://hdl.handle.net/2066/103101
Nyström, M., Andersson, R., Holmqvist, K., & van de Weijer, J. (2013). The influence of
calibration method and eye physiology on eyetracking data quality. Behavior Research
Methods, 45(1), 272–288. doi:10.3758/s13428-012-0247-4
O’Regan, J. K., & Levy-Schoen, A. (1987). Eye movement strategy and tactics in word
recognition and reading. In M. Coltheart (Ed.), Attention and performance XII (pp. 363–
383). Hillsdale, NJ: Lawrence Erlbaum Associates.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological
science. Science, 349(6251). doi:10.1126/science.aac4716
Osterhout, L., McLaughlin, J., Kim, A., Greenwald, R., & Inoue, K. (2004). Sentences in
the brain: Event-related potentials as real-time reflections of sentence comprehension
and language learning. In M. Carreiras & C. Clifton (Eds.), The on-line study of sentence
comprehension: Eyetracking, ERPs, and beyond (pp. 271–308). New York: Psychology Press.
Paap, K. R. (2018). Bilingualism in cognitive science. In A. De Houwer & L. Ortega (Eds.),
The Cambridge handbook of bilingualism (pp. 435–465). Cambridge, UK: Cambridge University Press. doi:10.1017/9781316831922.023
Paas, F., Renkl, A., & Sweller, J. (2004). Cognitive load theory: Instructional implications of
the interaction between information structures and cognitive architecture. Instructional
Science, 32, 1–8.
Panichi, M., Burr, D., Morrone, M. C., & Baldassi, S. (2012). Spatiotemporal dynamics of
perisaccadic remapping in humans revealed by classification images. Journal of Vision,
12(4), 11, 1–15. doi:10.1167/12.4.11
Papadopoulou, D., & Clahsen, H. (2003). Parsing strategies in L1 and L2 sentence processing.
Studies in Second Language Acquisition, 25(4), 501–528. doi:10.1017/S0272263103000214
Papadopoulou, D., Tsimpli, I., & Amvrazis, N. (2014). Self-paced listening. In J. Jegerski
& B. VanPatten (Eds.), Research methods in second language psycholinguistics (pp. 50–68).
London, UK: Taylor & Francis.
Papafragou, A., Hulbert, J., & Trueswell, J. (2008). Does language guide event perception?
Evidence from eye movements. Cognition, 108(1), 155–184. doi:10.1016/j.
cognition.2008.02.007
Paterson, K. B., McGowan,V. A.,White, S. J., Malik, S., Abedipour, L., & Jordan,T. R. (2014).
Reading direction and the central perceptual span in Urdu and English. PLoS ONE,
9(2), e88358. doi:10.1371/journal.pone.0088358
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: nativelike selection and
nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication
(pp. 191–227). New York: Routledge.
*Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading:
An eye-tracking study. Studies in Second Language Acquisition, 38(1), 97–130. doi:10.1017/
S0272263115000224
Perfetti, C. A., Liu, Y., & Tan, L. H. (2005). The lexical constituency model: Some implications of research on Chinese for general theories of reading. Psychological Review, 112(1), 43–59. doi:10.1037/0033-295X.112.1.43
Peters, R. E., Grüter, T., & Borovsky, A. (2018). Vocabulary size and native speaker self-
identification influence flexibility in linguistic prediction among adult bilinguals.
Applied Psycholinguistics, 39(6), 1439–1469. doi:10.1017/S0142716418000383
*Philipp, A. M., & Huestegge, L. (2015). Language switching between sentences in reading:
Exogenous and endogenous effects on eye movements and comprehension. Bilingualism:
Language and Cognition, 18(4), 614–625. doi:10.1017/S1366728914000753
Phillips, C. (2006). The real-time status of island phenomena. Language, 82(4), 795–823.
doi:10.1353/lan.2006.0217
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production
and comprehension. Behavioral and Brain Sciences, 36(4), 329–347. doi:10.1017/
S0140525X12001495
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting
practices in quantitative L2 research. Studies in Second Language Acquisition, 35(4), 655–
687. doi:10.1017/S0272263113000399
Plonsky, L. (2014). Study quality in quantitative L2 research (1990–2010):A methodological
synthesis and call for reform. The Modern Language Journal, 98(1), 450–470.
doi:10.1111/j.1540-4781.2014.12058.x
Plonsky, L., & Derrick, D. J. (2016). A meta-analysis of reliability coefficients in second
language research. The Modern Language Journal, 100(2), 538–553.
Plonsky, L., & Ghanbar, H. (2018). Multiple regression in L2 research: A methodological
synthesis and guide to interpreting R2 values. The Modern Language Journal, 102(4),
713–731. doi:10.1111/modl.12509
Plonsky, L., Marsden, E., Crowther, D., Gass, S. M., & Spinner, P. (2019). A methodological
synthesis and meta-analysis of judgment tasks in second language research. Second
Language Research. doi:10.1177/0267658319828413
Plonsky, L., & Oswald, F. L. (2014). How big is “big”? Interpreting effect sizes in L2 research.
Language Learning, 64(4), 878–912. doi:10.1111/lang.12079
Plonsky, L., & Oswald, F. L. (2017). Multiple regression as a flexible alternative to ANOVA
in L2 research. Studies in Second Language Acquisition, 39(3), 579–592. doi:10.1017/
S0272263116000231
Polio, C., & Gass, S. (1997). Replication and reporting: A commentary. Studies in Second
Language Acquisition, 19(4), 499–508. doi:10.1017/S027226319700404X
Pollatsek, A., Bolozky, S., Well, A. D., & Rayner, K. (1981). Asymmetries in the perceptual span
for Israeli readers. Brain and Language, 14(1), 174–180. doi:10.1016/0093-934X(81)
90073-0
Pomplun, M., Reingold, E. M., & Shen, J. (2001). Investigating the visual span in comparative
search: The effects of task difficulty and divided attention. Cognition, 81(2), B57–B67.
doi:10.1016/S0010-0277(01)00123-8
Porte, G. (Ed.). (2012). Replication research in applied linguistics. Cambridge, UK: Cambridge University Press.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology,
32(1), 3–25. doi:10.1080/00335558008248231
Posner, M. I., & Petersen, S. (1990). The attention system of the human brain. Annual
Review of Neuroscience, 13, 25–42. doi:10.1146/annurev.ne.13.030190.000325
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the
detection of signals. Journal of Experimental Psychology: General, 109(2), 160–174.
doi:10.1037/0096-3445.109.2.160
Potter, M. C. (1984). Rapid serial visual presentation (RSVP): A method for studying
language processing. In D. Kieras & M. Just (Eds.), New methods in reading comprehension
research (pp. 91–118). Hillsdale, NJ: Erlbaum.
*Pozzan, L., & Trueswell, J. C. (2016). Second language processing and revision of garden-
path sentences: A visual word study. Bilingualism: Language and Cognition, 19(3), 636–643.
doi:10.1017/S1366728915000838
Pressley, M., & Afflerbach, P. (1995). Verbal protocols of reading: The nature of constructively
responsive reading. Hillsdale, NJ: Lawrence Erlbaum.
Pynte, J., & Kennedy, A. (2006). An influence over eye movements in reading exerted
from beyond the level of the word: Evidence from reading English and French. Vision
Research, 46(22), 3786–3801. doi:10.1016/j.visres.2006.07.004
Pynte, J., New, B., & Kennedy, A. (2008). A multiple regression analysis of syntactic and
semantic influences in reading normal text. Journal of Eye Movement Research, 2(1), 1–11.
doi:10.16910/jemr.2.1.4
Qi, D. S., & Lapkin, S. (2001). Exploring the role of noticing in a three-stage second
language writing task. Journal of Second Language Writing, 10(4), 277–303. doi:10.1016/
S1060-3743(01)00046-7
Radach, R., Inhoff, A. W., Glover, L., & Vorstius, C. (2013). Contextual constraint and N
+ 2 preview effects in reading. The Quarterly Journal of Experimental Psychology, 66(3),
619–633. doi:10.1080/17470218.2012.761256
Radach, R., & Kennedy, A. (2004). Theoretical perspectives on eye movements in reading:
Past controversies, current issues, and an agenda for future research. European Journal of
Cognitive Psychology, 16, 3–26. doi:10.1080/09541440340000295
Radach, R., & Kennedy, A. (2013). Eye movements in reading: Some theoretical context.
The Quarterly Journal of Experimental Psychology, 66(3), 429–452. doi:10.1080/1747021
8.2012.750676
Radach, R., Reilly, R., & Inhoff, A. (2007). Models of oculomotor control in reading. In
R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye movements:
A window on mind and brain (pp. 237–269). Amsterdam, the Netherlands: Elsevier.
doi:10.1016/B978-008044980-7/50013-6
Radach, R., Schmitten, C., Glover, L., & Huestegge, L. (2009). How children read
for comprehension: Eye movements in developing readers. In R. K. Wagner, C.
Schatschneider, & C. Phythian-Sence (Eds.), Beyond decoding:The biological and behavioral
foundations of reading comprehension (pp. 75–106). New York: Guildford Press.
Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin,
114(3), 510–532. doi:10.1037/0033-2909.114.3.510
Rayner, K. (n.d.). Keith Rayner (1943–2015). Retrieved from http://www.forevermissed.
com/keith-rayner/#about
Rayner, K. (1975).The perceptual span and peripheral cues in reading. Cognitive Psychology,
7(1), 65–81. doi:10.1016/0010-0285(75)90005-5
Rayner, K. (1979). Eye guidance in reading: Fixation locations within words. Perception, 8,
21–30. doi:10.1068/p080021
Rayner, K. (1986). Eye movements and the perceptual span in beginning and skilled readers.
Journal of Experimental Child Psychology, 41(2), 211–236. doi:10.1016/0022-0965(86)
90037-8
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124(3), 372–422. doi:10.1037/0033-2909.124.3.372
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and
visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.
doi:10.1080/17470210902816461
Rayner, K., Juhasz, B. J., & Pollatsek, A. (2005). Eye movements during reading. In M.
Snowling & C. Hulme (Eds.), The science of reading: A handbook (pp. 79–97). Oxford,
UK: Blackwell.
Rayner, K., & McConkie, G. W. (1976). What guides a reader’s eye movements? Vision
Research, 16(8), 829–837. doi:10.1016/0042-6989(76)90143-7
Rayner, K., & Morris, R. K. (1991). Comprehension processes in reading ambiguous
sentences: Reflections from eye movements. Advances in Psychology, 77, 175–198.
doi:10.1016/S0166-4115(08)61533-2
Rayner, K., & Pollatsek, A. (2006). Eye-movement control in reading. In M. J.Traxler & M.
A. Gernsbacher (Eds.), Handbook of psycholinguistics (pp. 613–657). New York: Elsevier.
doi:10.1016/B978-012369374-7/50017-1
Rayner, K., Pollatsek, A., Ashby, J., & Clifton Jr., C. (2012). Psychology of reading (2nd ed.).
New York: Psychology Press.
Rayner, K., Pollatsek, A., Drieghe, D., Slattery, T. J., & Reichle, E. D. (2007). Tracking
the mind during reading via eye movements: Comments on Kliegl, Nuthmann,
and Engbert (2006). Journal of Experimental Psychology: General, 136(3), 520–529.
doi:10.1037/0096-3445.136.3.520
Rayner, K., Schotter, E. R., Masson, M. E., Potter, M. C., & Treiman, R. (2016). So much
to read, so little time: How do we read, and can speed reading help? Psychological Science
in the Public Interest, 17(1), 4–34. doi:10.1177/1529100615623267
Rayner, K., Warren, T., Juhasz, B. J., & Liversedge, S. P. (2004). The effect of plausibility
on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 30(6), 1290–1301. doi:10.1037/0278-7393.30.6.1290
Rayner, K., & Well, A. D. (1996). Effects of contextual constraint on eye movements
in reading: A further examination. Psychonomic Bulletin & Review, 3(4), 504–509.
doi:10.3758/BF03214555
Rayner, K., Well, A. D., Pollatsek, A., & Bertera, J. H. (1982). The availability of useful
information to the right of fixation in reading. Perception and Psychophysics, 31(6), 537–
550. doi:10.3758/BF03204186
Rebuschat, P. (2008). Implicit learning of natural language syntax (Unpublished doctoral
dissertation). University of Cambridge. doi:10.17863/CAM.15883
Rebuschat, P., & Williams, J. N. (2012). Implicit and explicit knowledge in second language
acquisition. Applied Psycholinguistics, 33(4), 829–856. doi:10.1017/S0142716411000580
Reichle, E. D. (2011). Serial-attention models of reading. In S. P. Liversedge, I. Gilchrist,
& S. Everling (Eds.), The Oxford handbook of eye movements (pp. 767–786). Oxford, UK:
Oxford University Press. doi:10.1093/oxfordhb/9780199539789.013.0042
Reichle, E. D., Liversedge, S. P., Drieghe, D., Blythe, H. I., Joseph, H. S. S. L., White, S. J.,
& Rayner, K. (2013). Using E-Z Reader to examine the concurrent development
of eye-movement control and reading skill. Developmental Review, 33(2), 110–149.
doi:10.1016/j.dr.2013.03.001
Reichle, E. D., Pollatsek, A., & Rayner, K. (2006). E–Z Reader: A cognitive-control, serial-
attention model of eye-movement behavior during reading. Cognitive Systems Research,
7(1), 4–22. doi:10.1016/j.cogsys.2005.07.002
Reichle, E. D., Pollatsek, A., & Rayner, K. (2012). Using E-Z Reader to simulate eye
movements in nonreading tasks: A unified framework for understanding the eye–mind
link. Psychological Review, 119(1), 155–185. doi:10.1037/a0026473
Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading:
Accounting for initial fixation locations and refixations within the E-Z Reader model.
Vision Research, 39(26), 4403–4411. doi:10.1016/S0042-6989(99)00152-2
Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E-Z Reader model of eye-movement
control in reading: Comparisons to other models. Behavioral and Brain Sciences, 26,
445–526. doi:10.1017/S0140525X03000104
Reichle, E. D., Warren, T., & McConnell, K. (2009). Using E-Z Reader to model the effects
of higher level language processing on eye movements during reading. Psychonomic
Bulletin & Review, 16(1), 1–21. doi:10.3758/PBR.16.1.1
Reilly, R. G., & Radach, R. (2006). Some empirical tests of an interactive activation
model of eye movement control in reading. Cognitive Systems Research, 7(1), 34–55.
doi:10.1016/j.cogsys.2005.07.006
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning:
Investigating task-generated cognitive demands and processes. Applied Linguistics, 35(1),
87–92. doi:10.1093/applin/amt039
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers’ pausing and
revision behaviors: A mixed-methods study. Studies in Second Language Acquisition, 41(3),
605–631. doi:10.1017/S027226311900024X
*Révész, A., Sachs, R., & Hama, M. (2014). The effects of task complexity and input
frequency on the acquisition of the past counterfactual construction through recasts.
Language Learning, 64(3), 615–650. doi:10.1111/lang.12061
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social
psychology quantitatively described. Review of General Psychology, 7(4), 331–363.
doi:10.1037/1089-2680.7.4.331
Rizzolatti, G., Riggio, L., Dascola, I., & Umiltà, C. (1987). Reorienting attention across the
horizontal and vertical meridians: Evidence in favor of a premotor theory of attention.
Neuropsychologia, 25(1), 31–40. doi:10.1016/0028-3932(87)90041-8
Rizzolatti, G., Riggio, L., & Sheliga, B. M. (1994). Space and selective attention. In C.
Umiltà & M. Moscovitch (Eds.), Attention and performance XV: Conscious and nonconscious
information processing (Vol. 15, pp. 232–265). Cambridge, MA: The MIT Press.
*Roberts, L., Gullberg, M., & Indefrey, P. (2008). Online pronoun resolution in L2
discourse: L1 influence and general learner effects. Studies in Second Language Acquisition,
30(3), 333–357. doi:10.1017/S0272263108080480
Roberts, L., & Siyanova-Chanturia, A. (2013). Using eye-tracking to investigate topics in
L2 acquisition and L2 processing. Studies in Second Language Acquisition, 35(2), 213–235.
doi:10.1017/S0272263112000861
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring
interactions in a componential framework. Applied Linguistics, 22(1), 27–57. doi:10.1093/
applin/22.1.27
Robinson, P. (2011). Task-based language learning: A review of issues. Language Learning,
61(s1), 1–36. doi:10.1111/j.1467-9922.2011.00641.x
Robinson, P., & Ellis, N. C. (Eds.). (2008). Handbook of cognitive linguistics and second language
acquisition. New York: Routledge.
*Rodríguez, D. D. L., Buetler, K. A., Eggenberger, N., Preisig, B. C., Schumacher, R.,
Laganaro, M., … Müri, R. M. (2016). The modulation of reading strategies by language
opacity in early bilinguals: An eye movement study. Bilingualism: Language and Cognition,
19(3), 567–577. doi:10.1017/S1366728915000310
Rosa, E., & Leow, R. P. (2004). Awareness, different learning conditions, and second
language development. Applied Psycholinguistics, 25(2), 269–292. doi:10.1017/
S0142716404001134
Rosa, E., & O’Neill, M. D. (1999). Explicitness, intake, and the issue of awareness. Studies in
Second Language Acquisition, 21(4), 511–556. doi:10.1017/S0272263199004015
Rossion, B., & Pourtois, G. (2004). Revisiting Snodgrass and Vanderwart’s object pictorial
set: The role of surface detail in basic-level object recognition. Perception, 33(2), 217–
236. doi:10.1068/p5117
Rothman, J. (2009). Understanding the nature and outcomes of early bilingualism:
Romance languages as heritage languages. International Journal of Bilingualism, 13(2),
155–163. doi:10.1177/1367006909339814
Rubin, J. (1995). The contribution of video to the development of competence in listening.
In D. J. Mendelsohn & J. Rubin (Eds.), A guide for the teaching of second language listening
(pp. 151–165). San Diego, CA: Dominie Press.
Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2003). Assignment of reference to
reflexives and pronouns in picture noun phrases: Evidence from eye movements.
Cognition, 89(1), B1–B13. doi:10.1016/S0010-0277(03)00065-9
Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2006). Processing reflexives and
pronouns in picture noun phrases. Cognitive Science, 30(2), 193–241. doi:10.1207/
s15516709cog0000_58
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on a L2
writing revision task. Studies in Second Language Acquisition, 29(1), 67–100. doi:10.1017/
S0272263107070039
Sachs, R., & Suh, B. R. (2007). Textually enhanced recasts, learner awareness, and L2
outcomes in synchronous computer-mediated interaction. In A. Mackey (Ed.),
Conversational interaction in second language acquisition: A collection of empirical studies (pp.
197–227). Oxford, UK: Oxford University Press.
*Sagarra, N., & Ellis, N. C. (2013). From seeing adverbs to seeing verbal morphology:
Language experience and adult acquisition of L2 tense. Studies in Second Language
Acquisition, 35(2), 261–290. doi:10.1017/S0272263112000885
Salverda, A. P., Brown, M., & Tanenhaus, M. K. (2011). A goal-based perspective on eye
movements in visual world studies. Acta Psychologica, 137(2), 172–180. doi:10.1016/j.
actpsy.2010.09.010
Salvucci, D. D. (2001). An integrated model of eye movements and visual encoding.
Cognitive Systems Research, 1(4), 201–220. doi:10.1016/S1389-0417(00)00015-2
Sanz, C., Morales-Front, A., Zalbidea, J., & Zárate-Sández, G. (2016). Always in motion
the future is: Doctoral students’ use of technology for SLA research. In R. P. Leow, L.
Cerezo, & M. Baralt (Eds.), A psycholinguistic approach to technology and language learning
(pp. 49–68). Berlin, Germany: De Gruyter Mouton.
Saslow, M. G. (1967). Effects of components of displacement-step stimuli upon latency for
saccadic eye movement. Journal of the Optical Society of America, 57(8), 1024. doi:10.1364/
JOSA.57.001024
Schmidt, R. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–158. doi:10.1093/applin/11.2.129
Schmidt, R., & Frota, S. (1986). Developing basic conversational ability in a second
language: A case study of an adult learner of Portuguese. In R. Day (Ed.), Talking
to learn: Conversation in second language acquisition (pp. 237–326). Rowley, MA:
Newbury.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. New York: Springer.
Schott, E. (1922). Über die Registrierung des Nystagmus und anderer Augenbewegungen
vermittels des Saitengalvanometers [On recording nystagmus and other eye movements
with the string galvanometer]. Deutsches Archiv für Klinische Medizin, 140, 79–90.
Sedivy, J. C. (2003). Pragmatic versus form-based accounts of referential contrast: Evidence
for effects of informativity expectations. Journal of Psycholinguistic Research, 32(1), 3–23.
doi:10.1023/A:1021928914454
Sedivy, J. C., Tanenhaus, M., Chambers, C. G., & Carlson, G. N. (1999). Achieving
incremental semantic interpretation through contextual representation. Cognition,
71(2), 109–147. doi:10.1016/S0010-0277(99)00025-6
Segalowitz, N. (2010). Cognitive bases of second language fluency. New York: Routledge.
*Sekerina, I. A., & Sauermann, A. (2015). Visual attention and quantifier-spreading
in heritage Russian bilinguals. Second Language Research, 31(1), 75–104.
doi:10.1177/0267658314537292
*Sekerina, I. A., & Trueswell, J. C. (2011). Processing of contrastiveness by heritage
Russian bilinguals. Bilingualism: Language and Cognition, 14(3), 280–300. doi:10.1017/
S1366728910000337
Sereno, S. C., & Rayner, K. (2003). Measuring word recognition in reading: Eye movements
and event-related potentials. Trends in Cognitive Sciences, 7(11), 489–493. doi:10.1016/j.
tics.2003.09.010
Sereno, S. C., Rayner, K., & Posner, M. I. (1998). Establishing a time-line of word
recognition: Evidence from eye movements and event-related potentials. NeuroReport,
9(10), 2195–2200. doi:10.1097/00001756-199807130-00009
Severens, E., Van Lommel, S., Ratinckx, E., & Hartsuiker, R. J. (2005). Timed picture naming
norms for 590 pictures in Dutch. Acta Psychologica, 119(2), 159–187. doi:10.1016/j.
actpsy.2005.01.002
Sheliga, B. M., Craighero, L., Riggio, L., & Rizzolatti, G. (1997). Effects of spatial attention
on directional manual and ocular responses. Experimental Brain Research, 114(2), 339–
351. doi:10.1007/PL00005642
Shen, D., Liversedge, S. P., Tian, J., Zang, C., Cui, L., Bai, X., … Rayner, K. (2012). Eye
movements of second language learners when reading spaced and unspaced Chinese
text. Journal of Experimental Psychology: Applied, 18(2), 192–202. doi:10.1037/a0027485
Shen, H. H. (2014). Chinese L2 literacy debates and beginner reading in the United States.
In M. Bigelow & J. Ennser-Kananen (Eds.), Routledge handbook of educational linguistics
(pp. 276–288). New York: Routledge.
*Shintani, N., & Ellis, R. (2013). The comparative effect of direct written corrective
feedback and metalinguistic explanation on learners’ explicit and implicit knowledge
of the English indefinite article. Journal of Second Language Writing, 22(3), 286–306.
doi:10.1016/j.jslw.2013.03.011
Simard, D., & Wong, W. (2001). Alertness, orientation, and detection: The conceptualization
of attentional functions in SLA. Studies in Second Language Acquisition, 23(1), 103–124.
doi:10.1017/S0272263101001048
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford, UK: Oxford University Press.
*Singh, N., & Mishra, R. K. (2012). Does language proficiency modulate oculomotor
control? Evidence from Hindi–English bilinguals. Bilingualism: Language and Cognition,
15(4), 771–781. doi:10.1017/S1366728912000065
*Siyanova-Chanturia, A., Conklin, K., & Schmitt, N. (2011). Adding more fuel to the fire:
An eye-tracking study of idiom processing by native and non-native speakers. Second
Language Research, 27(2), 251–272. doi:10.1177/0267658310382068
Siyanova-Chanturia, A., Conklin, K., & van Heuven, W. J. B. (2011). Seeing a phrase “time
and again” matters: The role of phrasal frequency in the processing of multiword
sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(3),
776–784. doi:10.1037/a0022531
Skehan, P. (1998). Task-based instruction. Annual Review of Applied Linguistics, 18, 268–286.
doi:10.1017/S0267190500003585
Skehan, P. (2009). Modelling second language performance: Integrating complexity,
accuracy, fluency, and lexis. Applied Linguistics, 30(4), 510–532. doi:10.1093/applin/
amp047
Smith, B. (2005). The relationship between negotiated interaction, learner uptake, and
lexical acquisition in task-based computer-mediated communication. TESOL Quarterly,
39(1), 33–58. doi:10.2307/3588451
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced
multilevel modeling (2nd ed.). London, UK: Sage.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name
agreement, image agreement, familiarity, and visual complexity. Journal of Experimental
Psychology: Human Learning and Memory, 6(2), 174–215. doi:10.1037/0278-7393.6.2.174
*Sonbul, S. (2015). Fatal mistake, awful mistake, or extreme mistake? Frequency effects on
off-line/on-line collocational processing. Bilingualism: Language and Cognition, 18(3),
419–437. doi:10.1017/S1366728914000674
Song, M.-J., & Suh, B.-R. (2008). The effects of output task types on noticing and learning
of the English past counterfactual conditional. System, 36(2), 295–312. doi:10.1016/j.
system.2007.09.006
Sorace, A. (2005). Syntactic optionality at interfaces. In L. Cornips & K. Corrigan (Eds.),
Syntax and variation: Reconciling the biological and the social (pp. 46–111). Amsterdam, the
Netherlands: John Benjamins.
Sorace, A. (2011). Pinning down the concept of “interface” in bilingualism. Linguistic
Approaches to Bilingualism, 1(1), 1–33. doi:10.1075/lab.1.1.01sor
Spector, R. H. (1990). Visual fields. In H. K. Walker, W. D. Hall, & J. W. Hurst (Eds.), Clinical
methods: The history, physical, and laboratory examinations (3rd ed.) (pp. 565–572). Boston,
MA: Butterworths.
Spinner, P., & Gass, S. M. (2019). Using judgments in second language acquisition research.
New York: Routledge.
*Spinner, P., Gass, S. M., & Behney, J. (2013). Ecological validity in eye-tracking. Studies in
Second Language Acquisition, 35(2), 389–415. doi:10.1017/S0272263112000927
Spivey, M. J., & Marian, V. (1999). Cross talk between native and second languages:
Partial activation of an irrelevant lexicon. Psychological Science, 10(3), 281–284.
doi:10.1111/1467-9280.00151
Spivey, M., & Cardon, C. (2015). Methods for studying adult bilingualism. In J. W. Schwieter
(Ed.), The Cambridge handbook of bilingual processing (pp. 108–132). Cambridge, UK:
Cambridge University Press. doi:10.1017/CBO9781107447257.004
Springob, C. (2015). Why is it easier to see a star if you look slightly to the side? Ask an
astronomer. Retrieved from http://curious.astro.cornell.edu/physics/81-the-universe/
stars-and-star-clusters/stargazing/373-why-is-it-easier-to-see-a-star-if-you-look-
slightly-to-the-side-intermediate
SR Research. (2017). EyeLink Data Viewer 3.1.97 (Computer Software). Mississauga, ON:
SR Research Ltd.
Starr, M. S., & Rayner, K. (2001). Eye movements during reading: Some current controversies.
Trends in Cognitive Sciences, 5(4), 156–163. doi:10.1016/S1364-6613(00)01619-3
Steinhauer, K. (2014). Event-related potentials (ERPs) in second language research: A brief
introduction to the technique, a selected review, and an invitation to reconsider critical
periods in L2. Applied Linguistics, 35(4), 393–417. doi:10.1093/applin/amu028
Steinhauer, K., & Drury, J. E. (2012). On the early left-anterior negativity (ELAN) in syntax
studies. Brain and Language, 120(2), 135–162. doi:10.1016/j.bandl.2011.07.001
Stephane, A. L. (2011). Eye tracking from a human factors perspective. In G. A. Boy (Ed.),
The handbook of human-machine interaction: A human-centered design approach (pp. 339–364).
New York: CRC Press.
Stevenson, H. W., Lee, S.-Y., Chen, C., Stigler, J. W., Hsu, C.-C., Kitamura, S., & Hatano,
G. (1990). Contexts of achievement: A study of American, Chinese, and Japanese
children. Monographs of the Society for Research in Child Development, 55(1/2), i-119.
doi:10.2307/1166090
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology, 18, 643–662. doi:10.1037/h0054651
Styles, E. A. (2006). The psychology of attention. New York: Psychology Press.
Sussman, R. S. (2006). Processing and representation of verbs: Insights from instruments
(Unpublished doctoral dissertation). University of Rochester.
*Suvorov, R. (2015). The use of eye tracking in research on video-based second language
(L2) listening assessment: A comparison of context videos and content videos. Language
Testing, 32(4), 463–483. doi:10.1177/0265532214562099
*Suzuki, Y. (2017). Validity of new measures of implicit knowledge: Distinguishing implicit
knowledge from automatized explicit knowledge. Applied Psycholinguistics, 38(5), 1229–
1261. doi:10.1017/S014271641700011X
*Suzuki, Y., & DeKeyser, R. (2017). The interface of explicit and implicit knowledge in a
second language: Insights from individual differences in cognitive aptitudes. Language
Learning, 67(4), 747–790. doi:10.1111/lang.12241
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and
comprehensible output in its development. In S. M. Gass & C. G. Madden (Eds.), Input
in second language acquisition (pp. 235–253). Rowley, MA: Newbury House.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they
generate: A step towards second language learning. Applied Linguistics, 16(3), 371–391.
doi:10.1093/applin/16.3.371
Szekely, A., D’Amico, S., Devescovi, A., Federmeier, K., Herron, D., Iyer, G., … Bates, E. (2003).
Timed picture naming: Extended norms and validation against previous studies. Behavior
Research Methods, Instruments, and Computers, 35(4), 621–633. doi:10.3758/BF03195542
Szekely, A., Jacobsen, T., D’Amico, S., Devescovi, A., Andonova, E., Herron, D., … Bates,
E. (2004). A new on-line resource for psycholinguistic studies. Journal of Memory and
Language, 51(2), 247–250. doi:10.1016/j.jml.2004.03.002
Szekely, A., D’Amico, S., Devescovi, A., Federmeier, K., Herron, D., … Bates, E. (2005). Timed
action and object naming. Cortex, 41, 7–25. doi:10.1016/S0010-9452(08)70174-6
Tamim, R. M., Bernard, R. M., Borokhovski, E., Abrami, P. C., & Schmid, R. F. (2011).
What forty years of research says about the impact of technology on learning. Review of
Educational Research, 81(1), 4–28. doi:10.3102/0034654310393361
Tanenhaus, M. K., Magnuson, J. S., Dahan, D., & Chambers, C. (2000). Eye movements
and lexical access in spoken-language comprehension: Evaluating a linking hypothesis
between fixations and linguistic processing. Journal of Psycholinguistic Research, 29(6),
557–580. doi:10.1023/A:1026464108329
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995).
Integration of visual and linguistic information in spoken language comprehension.
Science, 268(5217), 1632–1634. doi:10.1126/science.7777863
Tanenhaus, M. K., & Trueswell, J. C. (1995). Sentence comprehension. In J. L. Miller &
P. D. Eimas (Eds.), Speech, language, and communication: A volume in handbook of perception
and cognition (2nd ed.) (pp. 217–262). San Diego, CA: Academic Press. doi:10.1016/
B978-012497770-9.50009-1
Tanenhaus, M. K., & Trueswell, J. C. (2006). Eye movements and spoken language
comprehension. In M. Traxler & M. Gernsbacher (Eds.), Handbook of psycholinguistics
(pp. 863–900). Oxford, UK: Elsevier.
Taylor, J. N., & Perfetti, C. A. (2016). Eye movements reveal readers’ lexical quality and reading
experience. Reading and Writing, 29(6), 1069–1103. doi:10.1007/s11145-015-9616-6
Taylor, W. L. (1953). “Cloze procedure”: A new tool for measuring readability. Journalism
Quarterly, 30(4), 415–433. doi:10.1177/107769905303000401
Thiele, A., Henning, P., Kubischik, M., & Hoffmann, K. P. (2002). Neural mechanisms of
saccadic suppression. Science, 295(5564), 2460–2462. doi:10.1126/science.1068788
Thilo, K. V., Santoro, L., Walsh, V., & Blakemore, C. (2004). The site of saccadic suppression.
Nature Neuroscience, 7(1), 13–14. doi:10.1038/nn1171
Tinker, M. A. (1936). Reliability and validity of eye-movement measures of reading. Journal
of Experimental Psychology, 19(6), 732–746. doi:10.1037/H0060561
Tokowicz, N., & MacWhinney, B. (2005). Implicit and explicit measures of sensitivity to
violations in second language grammar: An event-related potential investigation. Studies
in Second Language Acquisition, 27(2), 173–204. doi:10.1017/S0272263105050102
Tomlin, R. S., & Villa, V. (1994). Attention in cognitive science and second language
acquisition. Studies in Second Language Acquisition, 16(2), 183–203. doi:10.1017/
S0272263100012870
*Tremblay, A. (2011). Learning to parse liaison-initial words: An eye-tracking study.
Bilingualism: Language and Cognition, 14(3), 257–279. doi:10.1017/S1366728910000271
*Trenkic, D., Mirković, J., & Altmann, G. T. (2014). Real-time grammar processing by native
and non-native speakers: Constructions unique to the second language. Bilingualism:
Language and Cognition, 17(2), 237–257. doi:10.1017/S1366728913000321
Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path
effect: Studying on-line sentence processing in young children. Cognition, 73(2), 89–134.
doi:10.1016/S0010-0277(99)00032-3
Trueswell, J. C., Tanenhaus, M. K., & Kello, C. (1993). Verb-specific constraints in
sentence processing: Separating effects of lexical preference from garden-paths.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(3), 528–553.
doi:10.1037/0278-7393.19.3.528
Tsai, J. L., & McConkie, G. W. (2003). Where do Chinese readers send their eyes? In R.
Radach, J. Hyönä, & H. Deubel (Eds.), The mind’s eye: Cognitive and applied aspects of eye
movement research (pp. 159–176). New York: Elsevier.
Ullman, M. T. (2005). A cognitive neuroscience perspective on second language acquisition:
The declarative/procedural model. In C. Sanz (Ed.), Mind and context in adult second
language acquisition: Methods, theory and practice (pp. 141–178). Washington, DC:
Georgetown University Press.
Vainio, S., Hyönä, J., & Pajunen, A. (2009). Lexical predictability exerts robust effects on
fixation duration, but not on initial landing position during reading. Experimental
Psychology, 56(1), 66–74. doi:10.1027/1618-3169.56.1.66
*Vainio, S., Pajunen, A., & Hyönä, J. (2016). Processing modifier–head agreement in
L1 and L2 Finnish: An eye-tracking study. Second Language Research, 32(1), 3–24.
doi:10.1177/0267658315592201
Valian, V. (2015). Bilingualism and cognition. Bilingualism: Language and Cognition, 18(1),
3–24. doi:10.1017/S1366728914000522
Van Assche, E., Drieghe, D., Duyck, W., Welvaert, M., & Hartsuiker, R. J. (2011). The
influence of semantic constraints on bilingual word recognition during sentence
reading. Journal of Memory and Language, 64(1), 88–107. doi:10.1016/j.jml.2010.08.006
*Van Assche, E., Duyck, W., & Brysbaert, M. (2013). Verb processing by bilinguals in
sentence contexts: The effect of cognate status and verb tense. Studies in Second Language
Acquisition, 35(2), 237–259. doi:10.1017/S0272263112000873
Vanderplank, R. (2016). Captioned media in foreign language learning and teaching: Subtitles for
the deaf and hard-of-hearing as tools for language learning. London, UK: Palgrave Macmillan.
doi:10.1057/978-1-137-50045-8
Van Hell, J. G., & Tanner, D. (2012). Second language proficiency and cross-language lexical
activation. Language Learning, 62(s2), 148–171. doi:10.1111/j.1467-9922.2012.00710.x
Van Hell, J. G., & Tokowicz, N. (2010). Event-related brain potentials and second language
learning: Syntactic processing in late L2 learners at different L2 proficiency levels.
Second Language Research, 26(1), 43–74. doi:10.1177/0267658309337637
Van Merriënboer, J. J., & Sweller, J. (2005). Cognitive load theory and complex learning:
Recent developments and future directions. Educational Psychology Review, 17(2), 147–
177. doi:10.1007/s10648-005-3951-0
VanPatten, B., & Williams, J. (2002). Research criteria for tenure in second language
acquisition: Results from a survey of the field (Unpublished manuscript). University of
Illinois at Chicago.
Van Wermeskerken, M., Litchfield, D., & van Gog, T. (2018). What am I looking at?
Interpreting dynamic and static gaze displays. Cognitive Science, 42(1), 220–252.
doi:10.1111/cogs.12484
Veldre, A., & Andrews, S. (2014). Lexical quality and eye movements: Individual differences
in the perceptual span of skilled adult readers. The Quarterly Journal of Experimental
Psychology, 67(4), 703–727. doi:10.1080/17470218.2013.826258
Veldre, A., & Andrews, S. (2015). Parafoveal lexical activation depends on skilled reading
proficiency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2),
586–595. doi:10.1037/xlm0000039
Vilaró, A., Duchowski, A. T., Orero, P., Grindinger, T., Tetreault, S., & di Giovanni,
E. (2012). How sound is the Pear Tree Story? Testing the effect of varying audio
stimuli on visual attention distribution. Perspectives, 20(1), 55–65. doi:10.1080/0907
676X.2011.632682
Vitu, F. (1991). The existence of a center of gravity effect during reading. Vision Research,
31(7–8), 1289–1313. doi:10.1016/0042-6989(91)90052-7
Vitu, F., O’Regan, J. K., Inhoff, A. W., & Topolski, R. (1995). Mindless reading: Eye-
movement characteristics are similar in scanning letter strings and reading texts.
Perception and Psychophysics, 57(3), 352–364. doi:10.3758/BF03213060
Vitu, F., O’Regan, J. K., & Mittau, M. (1990). Optimal landing position in reading isolated
words and continuous text. Perception and Psychophysics, 47(6), 583–600. doi:10.3758/
BF03203111
Von der Malsburg, T., & Angele, B. (2017). False positives and other statistical errors in
standard analyses of eye movements in reading. Journal of Memory and Language, 94,
119–133. doi:10.1016/j.jml.2016.10.003
Von der Malsburg, T., & Vasishth, S. (2011). What is the scanpath signature of syntactic
reanalysis? Journal of Memory and Language, 65(2), 109–127. doi:10.1016/j.
jml.2011.02.004
Vonk, W., & Cozijn, R. (2003). On the treatment of saccades and regressions in eye
movement measures of reading time. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The
mind’s eye: Cognitive and applied aspects of eye movement research (pp. 291–312). Amsterdam,
the Netherlands: North-Holland.
Wade, N. J. (2007). Scanning the seen: Vision and the origins of eye movement research.
In R. P. G. Van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye
movements: A window on mind and brain (pp. 31–63). Oxford, UK: Elsevier. doi:10.1016/
B978-008044980-7/50004-5
Wade, N. J., & Tatler, B. W. (2005). The moving tablet of the eye. Oxford, UK: Oxford University
Press. doi:10.1093/acprof:oso/9780198566175.001.0001
Wagner, E. (2008). Video listening tests: What are they measuring? Language Assessment
Quarterly, 5(3), 218–243. doi:10.1080/15434300802213015
Wang, D., Mulvey, F. B., Pelz, J. B., & Holmqvist, K. (2017). A study of artificial eyes for the
measurement of precision in eye-trackers. Behavior Research Methods, 49(3), 947–959.
doi:10.3758/s13428-016-0755-8
Wang, M., Perfetti, C. A., & Liu, Y. (2003). Alphabetic readers quickly acquire orthographic
structure in learning to read Chinese. Scientific Studies of Reading, 7(2), 183–208.
doi:10.1207/S1532799XSSR0702_4
Wedel, M., & Pieters, R. (2008). Eye tracking for visual marketing. Foundations and Trends
in Marketing, 1(4), 231–320. doi:10.1561/1700000011
Wengelin, A., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., &
Johansson, R. (2009). Combined eyetracking and keystroke-logging methods for
studying cognitive processes in text production. Behavior Research Methods, 41(2), 337–
351. doi:10.3758/BRM.41.2.337
Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3),
475–482. doi:10.1007/BF03395630
White, S. J. (2008). Eye movement control during reading: Effects of word frequency
and orthographic familiarity. Journal of Experimental Psychology: Human Perception and
Performance, 34(1), 205–223. doi:10.1037/0096-1523.34.1.205
Whitford, V., & Titone, D. (2015). Second-language experience modulates eye movements
during first- and second-language sentence reading: Evidence from a gaze-contingent
moving window paradigm. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 41(4), 1118–1129. doi:10.1037/xlm0000093
Wicha, N. Y. Y., Moreno, E. M., & Kutas, M. (2004). Anticipating words and their gender:
An event-related brain potential study of semantic integration, gender expectancy, and
gender agreement in Spanish sentence reading. Journal of Cognitive Neuroscience, 16(7),
1272–1288. doi:10.1162/0898929041920487
Wilcox, R. (2012). Introduction to robust estimation and hypothesis testing (3rd ed.). Amsterdam,
the Netherlands: Elsevier.
Williams, J. N. (2009). Implicit learning in second language acquisition. In W. C. Ritchie &
T. K. Bhatia (Eds.), The new handbook of second language acquisition (pp. 319–353). Bingley,
UK: Emerald Publishing.
Williams, R., & Morris, R. (2004). Eye movements, word familiarity, and vocabulary
acquisition. European Journal of Cognitive Psychology, 16(1–2), 312–339.
doi:10.1080/09541440340000196
Wilson, M. P., & Garnsey, S. M. (2009). Making simple sentences hard: Verb bias effects
in simple direct object sentences. Journal of Memory and Language, 60(3), 368–392.
doi:10.1016/j.jml.2008.09.005
*Winke, P. (2013). The effects of input enhancement on grammar learning and
comprehension. Studies in Second Language Acquisition, 35(2), 323–352. doi:10.1017/
S0272263112000903
*Winke, P., Gass, S., & Sydorenko, T. (2013). Factors influencing the use of captions by
foreign language learners: An eye-tracking study. The Modern Language Journal, 97(1),
254–275. doi:10.1111/j.1540-4781.2013.01432.x
Winke, P., Godfroid, A., & Gass, S. M. (2013). Introduction to the special issue. Studies in
Second Language Acquisition, 35(2), 205–212. doi:10.1017/S027226311200085X
Winskel, H., Radach, R., & Luksaneeyanawin, S. (2009). Eye movements when reading
spaced and unspaced Thai and English: A comparison of Thai–English bilinguals and
English monolinguals. Journal of Memory and Language, 61(3), 339–351. doi:10.1016/j.
jml.2009.07.002
Wray, A. (2002). Formulaic language and the lexicon. Cambridge, UK: Cambridge University
Press.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford, UK: Oxford University
Press.
Wright, R. D., & Ward, L. M. (2008). Orienting of attention. New York: Oxford University
Press.
Yan, M., Kliegl, R., Richter, E. M., Nuthmann, A., & Shu, H. (2010). Flexible saccade-
target selection in Chinese reading. The Quarterly Journal of Experimental Psychology,
63(4), 705–725. doi:10.1080/17470210903114858
Yang, H. M., & McConkie, G. W. (1999). Reading Chinese: Some basic eye-movement
characteristics. In J. Wang, H. C. Chen, R. Radach, & A. Inhoff (Eds.), Reading Chinese
script: A cognitive analysis (pp. 207–222). London, UK: Lawrence Erlbaum.
Yang, S. (2006). An oculomotor-based model of eye movements in reading: The
competition/interaction model. Cognitive Systems Research, 7(1), 56–69. doi:10.1016/j.
cogsys.2005.07.005
Yatabe, K., Pickering, M. J., & McDonald, S. A. (2009). Lexical processing during saccades
in text comprehension. Psychonomic Bulletin & Review, 16(1), 62–66. doi:10.3758/
PBR.16.1.62
*Yi, W., Lu, S., & Ma, G. (2017). Frequency, contingency and online processing of
multiword sequences: An eye-tracking study. Second Language Research, 33(4), 519–549.
doi:10.1177/0267658317708009
Young, L. R., & Sheena, D. (1975). Survey of eye movement recording methods. Behavior
Research Methods & Instrumentation, 7(5), 397–429. doi:10.3758/BF03201553
Zang, C., Liang, F., Bai, X., Yan, G., & Liversedge, S. P. (2013). Interword spacing and landing
position effects during Chinese reading in children and adults. Journal of Experimental
Psychology: Human Perception and Performance, 39(3), 720–734. doi:10.1037/a0030097
Zhu, Z., & Ji, Q. (2007). Novel eye gaze tracking techniques under natural head movement.
IEEE Transactions on Biomedical Engineering, 54(12), 2246–2260. doi:10.1109/
TBME.2007.895750
Zlatev, J., & Blomberg, J. (2015). Language may indeed influence thought. Frontiers in
Psychology, 6, 1631. doi:10.3389/fpsyg.2015.01631
*Zufferey, S., Mak, W., Degand, L., & Sanders, T. (2015). Advanced learners’ comprehension
of discourse connectives:The role of L1 transfer across on-line and off-line tasks. Second
Language Research, 31(3), 389–411. doi:10.1177/0267658315573349
INDEX OF NAMES

Italic page numbers indicate figures. Bold page numbers indicate tables.

Aaronson, D. 6
Alanen, R. 3
Allopenna, P. D. 11, 89, 91, 92, 94, 96, 100
Alsadoon, R. 71, 77, 138, 141–142, 220, 222, 225
Altmann, G. T. M. 49, 90–91, 93, 94–95, 96, 105, 107, 144, 168, 182, 191–192, 347
Andersson, R. 229, 324–326
Andringa, S. 78, 98, 112–114, 116, 124, 165, 219, 344
Anthony, L. 355, 362
Baayen, R. H. 264, 266–269, 276, 278
Baccino, T. 15
Bachman, L. 61
Baddeley, A. D. 81
Bahill, A. T. 34
Bai, X. 40
Balling, L. W. 72, 75, 221, 340
Balota, D. A. 44
Baltova, I. 349
Bar, M. 105
Barnes, G. R. 36
Barnett, V. 260, 265–266
Barr, D. J. 277–278, 280–281, 283, 288, 294, 298, 302
Bates, D. 188, 190, 309n6
Bates, E. 309n6
Bertera, J. H. 38
Bialystok, E. 102
Binda, P. 34–35
Bisson, M. 63, 80–82, 134, 211, 213, 226
Blumenfeld, H. K. 91, 100
Blythe, H. I. 34, 341
Boers, F. 73, 75
Boersma, P. 198
Bojko, A. 237
Bolger, P. 64, 98, 114–116
Boston, M. F. 47
Bowles, M. A. 1, 3–4, 12
Boxell, O. 65, 68–69, 225
Braze, D. 252
Brône, G. 11, 123
Brysbaert, M. 153–154, 157n4
Bultena, S. 49
Burnat, K. 26
Cameron, A. C. 307
Canseco-Gonzalez, E. 49
Carrol, G. 71, 74–75, 104, 211, 221, 223
Chambers, C. G. 144, 347–348
Chamorro, G. 65, 223
Chen, H. C. 40, 43
Chepyshko, R. 290, 293, 295, 299–300, 302–303
Choi, J. E. S. 33
Choi, S. 77, 87n5
Choi, S. Y. 41, 43
Choi, W. 38–39
Cicchini, G. M. 34
Cintrón-Valentín, M. 71, 73, 77–79, 205
Clahsen, H. 2, 10, 65, 68, 218, 220, 221, 223
Clark, A. 104
Clark, H. H. 276
Clifton, C. J. 9, 12, 14, 217, 246
Cohen, A. D. 3
Cohen, A. L. 258
Cohen, J. 152–153
Conklin, K. 1, 223, 261
Cooper, R. 11, 90–91, 143
Cop, U. 6, 49, 58, 72, 75, 180, 220, 221, 340–342
Corbetta, M. 21
Cubilo, J. 62, 349–350
Cuetos, F. 69
Cunnings, I. 89, 116–117, 119, 124, 168, 212, 285, 288, 298, 309n5, 345
Dahan, D. 49, 89, 95–96, 191–192
Dambacher, M. 16
De Bot, K. 3
De León Rodríguez, D. 159, 161, 175, 220, 229, 231–232
DeLong, K. A. 103
Deutsch, A. 16, 43
Dienes, Z. 204n2
Dijkgraaf, A. 106–107, 124, 182, 183, 184, 194, 196, 212, 288, 298
Dimigen, O. 15–16
Dink, J. W. 290
Dodge, R. 311–312, 315
Dolgunsöz, E. 58
Drasdo, N. 26
Drieghe, D. 57, 172, 283
Duchowski, A. T. 11, 312, 314
Duñabeitia, J. A. 49
Dussias, P. E. 10, 12, 65, 69, 105, 108–110, 143, 147, 229–230, 294, 348
d’Ydewalle, G. 81
Eberhard, K. M. 91
Eggert, T. 312, 314
Elgort, I. 50, 71, 73, 223–224, 233, 340
Ellis, N. C. 65, 69, 226
Ellis, R. 148
Engbert, R. 37, 51, 53, 56–57, 60n4
Ericsson, K. A. 3
Eser, I. 360
Felser, C. 10, 65, 68–69, 218–219, 221, 223–225, 233
Fender, M. 67
Feng, G. 41, 60n4, 341–343
Fernald, A. 108
Ferreira, F. 8–9, 191
Field, A. 251, 268, 274, 301
Findlay, J. M. 24, 38, 57, 59
Flecken, M. 98, 120–121, 151, 159, 170, 191, 229–230, 288
Fodor, J. A. 9
Forster, K. I. 6
Foucart, A. 14, 16–17
Fox, M. C. 3
Fraser, C. A. 3
Frazier, L. 9
Frenck-Mestre, C. 1–2, 12
Friston, K. 104
Fukkink, R. G. 3
Gass, S. M. 17, 81, 104, 244, 353
Gelman, A. 278
Gilchrist, I. D. 30, 33, 36
Godfroid, A. 1–4, 3, 12–13, 45–46, 49, 50, 65–67, 69–73, 77, 104, 112, 124, 128–129, 131, 134, 140, 146, 148–149, 157n3, 159, 173–174, 178, 204n2, 205, 210, 212, 219, 221–222, 224, 226, 228, 242, 245–246, 249n2, 250n3, 253, 261, 268, 269–270, 272, 281, 285–286, 309n4, 332, 339–340
Goo, J. 3
Gough, P. B. 51
Green, A. 3
Green, P. 153
Gries, S. Th. 46, 251, 283
Griffin, Z. M. 11
Grosbras, M. H. 21
Grüter, T. 106, 108–110, 114, 124, 229–230
Guan, C. Q. 351
Gullberg, M. 11, 122
Hafed, Z. M. 37
Häikiö, T. 39, 43, 58, 341
Hair, J. F. 346
Hama, M. 1–2
Hattie, J. 153
Havik, E. 146
Hayes, T. R. 38
He, X. 228
Henderson, J. M. 32, 34, 38–39, 58, 144, 345
Hering, C. 311
Hilbe, J. M. 307
Hintz, F. 192
Hirotani, M. 49
Holmqvist, K. 24, 26, 33–34, 36, 59n1, 238, 241, 255–256, 314, 316, 319–320, 323, 329–330, 332, 336, 356–357, 358
Holšánová, J. 11
Hopp, H. 6, 65–68, 98, 105, 108–110, 113–114, 116, 124, 160–161, 168, 195, 197, 200, 224, 229–231, 345
Hosseini, K. 210
Hoversten, L. J. 72, 75–76, 87n6, 216, 223
Huettig, F. 95, 97, 103–105, 112, 120, 124, 132, 145, 181, 192–193
Huey, E. B. 51
Hutzler, F. 16
Hyönä, J. 243, 345–346
Ikeda, M. 41, 43
Indrarathne, B. 71, 73, 77–78, 205, 216, 226, 228, 332
Inhoff, A. W. 261
Irwin, D. E. 35–36
Issa, B. 78
Ito, A. 105, 107, 112, 124, 197, 288, 298
Izumi, S. 228
Jackson, C. N. 147–148
Jacobs, A. M. 52
Jacobson, E. 312
Jaeger, T. F. 273, 296, 298
Jegerski, J. 6, 65
Jeon, E. H. 251, 274, 277–278
Jiang, N. 1
Jordan, T. R. 42–43
Joseph, H. S. S. L. 49, 50
Ju, M. 196
Juffs, A. 66–67
Juhasz, B. J. 47–48, 49
Just, M. A. 6, 8, 9
Kaakinen, J. K. 3
Kaan, E. 105, 107–108
Kamide, Y. 96, 112, 192
Kaushanskaya, M. 64, 98, 120, 157n3, 191, 229
Keating, G. D. 3, 12, 28, 30, 65, 127, 233, 235, 261
Kennedy, A. 47, 57
Kerlinger, F. N. 126
Khalifa, H. 84
Kim, E. 89, 116–119, 193, 288
Kliegl, R. 15, 46–47, 178, 329
Kohlstedt, T. 114–116, 130, 132, 134, 182, 212, 288
Kohsom, C. 40
Krauzlis, R. J. 37
Kretzschmar, F. 16
Kreysa, H. 11
Kroll, J. F. 75, 100
Kühberger, A. 153
Kuperberg, G. R. 103–104
Kuperman, V. 345
Kurtzman, H. S. 69
Kutner, M. H. 138
Lachaud, C. M. 260, 265–266, 269
Lagrou, E. 196
Lamare, M. 311
Larsen-Freeman, D. 146
Larson-Hall, J. 152, 251, 274, 301
Lau, E. 344
Lee, C. H. 62, 351–352
Lee, S. 83–84, 98, 120, 125n9, 129, 164, 238, 241–242
Leeser, M. J. 2, 147–149
Legge, G. E. 26, 28, 60n4, 174
Lemmerth, N. 195
Leow, R. P. 2–4, 19, 77
Lettvin, J. Y. 37
Leung, C. Y. 58
Leung, J. H. C. 2
Levenshtein, V. I. 246
Lew-Williams, C. 108–109, 114, 332
Li, X. 328
Liberman, A. M. 298
Lim, H. 43
Lim, J. H. 65, 147, 175–176, 223, 235, 338
Linck, J. A. 281, 285
Lipsey, M. W. 153
Liu, Y. 351
Liversedge, S. P. 11, 46
Lotto, L. 189, 190
Lowell, R. 49
Luck, S. J. 14
Luke, S. G. 31–32
Lupyan, G. 121
McClelland, J. L. 44, 92
McConkie, G. W. 40, 57
McCray, G. 82, 129, 159, 163, 171, 211, 233, 272
MacDonald, M. C. 9
McDonald, S. A. 17, 60n4
McDonough, K. 11, 98, 120, 122, 124, 129, 191, 212–213, 272, 332, 348–349
McIntyre, N. A. 243
Mackey, A. 2
MacLeod, C. J. 102
Marian, V. 89, 91, 99–101, 131–132, 134, 144, 183, 187, 191, 212–213
Marinis, T. 3, 69
Marslen-Wilson, W. 9
Martinez-Conde, S. 37
Matin, E. 34, 199
Matuschek, H. 277–278, 281–283, 285–287, 309n6
Mercier, J. 91, 99–102, 192, 212
Meseguer, E. 246
Metzner, P. 15
Meyer, A. S. 11, 36, 49
Meyers, I. L. 312
Michel, M. 159, 353, 362n4
Miles, W. 359
Mirman, D. 251, 288, 290, 298–299, 301
Mitchell, D. C. 2, 6, 10, 12, 17
Mitsugi, S. 110–112, 124, 212–213, 298
Miwa, K. 72, 75, 159, 161, 175, 216, 220, 222, 225
Mohamed, A. A. 49, 50, 71, 73, 205, 233, 235, 249n2, 250n3
Montero Perez, M. 63, 73, 80–81, 135, 155, 159, 161, 205, 221, 224
Morales, L. 108–110, 189, 200, 201, 212, 343–345, 348
Morgan-Short, K. 2–3, 12, 14–15
Morrison, R. E. 55
Mueller, J. L. 14
Mulvey, F. 256
Muñoz, C. 63, 80–82, 211
Murray, W. S. 52, 56
Nassaji, H. 3
Nesselhauf, N. 73
Nieuwenhuis, R. 267
Nyström, M. 328–329
O’Regan, J. K. 43
Osterhout, L. 200
Paap, K. R. 102
Paas, F. 351
Panichi, M. 34
Papadopoulou, D. 69
Papafragou, A. 121
Paterson, K. B. 41–43
Pawley, A. 73
Pellicer-Sánchez, A. 50, 71, 73, 205, 211, 340
Perfetti, C. A. 351
Peters, R. E. 106
Philipp, A. M. 72, 75, 211–213
Phillips, C. 69
Pickering, M. J. 104
Plonsky, L. 46, 63, 148, 151–152, 251, 274, 277–278
Polio, C. 347
Pollatsek, A. 42
Pomplun, M. 38
Porte, G. 347
Posner, M. A. 19–20
Posner, M. I. 20
Potter, M. C. 6
Pozzan, L. 91, 116–119, 124
Pynte, J. 47, 57
Qi, D. S. 3
Radach, R. 40, 52, 56, 213, 223
Ratcliff, R. 260, 264–266, 269
Rayner, K. 9, 11, 21, 24, 25, 26, 28, 30–33, 37–41, 43, 46–47, 58, 172, 221, 223, 233, 261, 329, 341
Rebuschat, P. 128
Reichle, E. D. 31, 34, 54, 56, 58, 233, 246
Reilly, R. G. 60n4, 223
Révész, A. 11, 77, 79, 134, 164
Richard, F. D. 153
Rizzolatti, G. 21
Roberts, L. 8, 12, 65, 118, 127, 224, 233
Robinson, P. 73, 79
Rosa, E. 3
Rossion, B. 189, 190
Rothman, J. 112
Rubin, J. 349
Runner, J. T. 89
Sachs, R. 3
Sagarra, N. 65, 69, 78, 159, 221–222, 224, 248
Salverda, A. P. 144
Salvucci, D. D. 60n4
Sanz, C. 1, 17, 321
Saslow, M. G. 199
Schmidt, R. 69, 104, 285–286
Schmitt, N. 151
Schott, E. 312
Sedivy, J. C. 112
Segalowitz, N. 87n4
Sekerina, I. A. 106, 112–113, 116, 118–119, 124, 144, 168, 191, 193, 197, 199–200, 212
Sereno, S. C. 15–16
Severens, E. 189, 190
Sheliga, B. M. 21
Shen, D. 40
Shen, H. H. 351
Shintani, N. 134
Simard, D. 19
Sinclair, J. 73
Singh, N. 102, 229–230
Siyanova-Chanturia, A. 45, 49, 71, 74, 127, 211, 221
Skehan, P. 79
Smith, B. 353
Snijders, T. A. B. 277
Snodgrass, J. G. 189, 190
Sonbul, S. 73–74, 180–181, 211, 221
Song, M.-J. 228
Sorace, A. 112
Spinner, P. 66, 148, 174–175, 221, 233
Spivey, M. J. 1, 89, 100, 144
Springob, C. 26
Starr, M. S. 46
Steinhauer, K. 13–14, 200
Stephane, A. L. 320
Stevenson, H. W. 342
Stroop, J. R. 102
Styles, E. A. 20
Sussman, R. S. 144
Suvorov, R. 63, 82–83, 164, 233, 350
Suzuki, Y. 98, 110, 112, 124, 192, 212, 219, 246
Swain, M. 3, 228
Szekely, A. 188, 190
Tamim, R. M. 153
Tanenhaus, M. K. 11, 46, 88–89, 91–92, 94, 116, 123
Taylor, J. N. 58, 218–219, 345
Taylor, W. L. 47
Thiele, A. 34–35
Thilo, K. V. 34–35
Tinker, M. A. 4
Tokowicz, N. 3
Tomlin, R. S. 19
Tremblay, A. 64, 99, 138–140, 144, 194–195, 212
Trenkic, D. 105, 132, 136, 166–169, 183, 212, 344–345, 347–348
Trueswell, J. C. 9, 117, 144
Tsai, J. L. 328
Unsworth, S. 346
Vainio, S. 49, 66, 70, 221, 233
Valian, V. 102
Van Assche, E. 12, 72, 75, 212, 223
Van Hell, J. G. 14, 75, 100
Van Merriënboer, J. J. 351
Van Wermeskerken, M. 243
Vanderplank, R. 82
VanPatten, B. 63, 97, 214
Veldre, A. 38–39, 58
Vilaró, A. 243
Vitu, F. 43–45, 172
Von der Malsburg, T. 208, 222, 226–227, 233, 246, 248
Vonk, W. 36
Wade, N. J. 10–11, 33, 311, 359
Wagner, E. 349
Wang, D. 329–330
Wang, M. 351
Wedel, M. 25–26
Wells, W. C. 11
Wengelin, A. 11
Whelan, R. 263–264, 269
White, S. J. 57
Whitford, V. 39, 42–43, 58
Wicha, N. Y. Y. 103
Wilcox, R. 266
Williams, J. N. 112
Williams, R. 49
Wilson, M. P. 9–10
Winke, P. 12, 32, 63, 71, 77, 80–82, 145, 205, 221, 261
Winskel, H. 40
Wray, A. 71, 73
Wright, R. D. 19–21, 33, 36
Yan, M. 328
Yang, H. M. 328
Yang, S. 60n4
Yatabe, K. 35–36
Yi, W. 71, 73
Young, L. R. 59n1, 312, 314
Zang, C. 328
Zhu, Z. 24
Zlatev, J. 121
Zufferey, S. 65
INDEX

Italic page numbers indicate figures. Bold page numbers indicate tables.

accuracy 327, 328, 329 attention shifts 21


active vision 30–31 audio materials 138–139; creating
age of acquisition 48, 50, 59 196–198; questions about 85; software
ambiguity resolution paradigms 65, 67; and for 198; and time windows 198,
eye-tracking 68 199, 200, 201, 202; see also linking
ambiguous sentences 9–10, 116–117; hypothesis; research study designing;
see also referential processing subtitles; visual world paradigm
analysis of variance (ANOVA) 46, 50, awareness: and think-alouds 3
127, 130, 179, 265, 268, 272–273,
274–275, 296–297, 307; by-subject (F1) between-language competition 100–101
analysis 275; by-item (F2) analysis 275; between-subject designs 135, 136, 154,
Latin square designs 137, 138; see also 155; see also research study designing
repeated-measures ANOVA bilingual lexicon 75, 76; and
anomaly detection/violation paradigms eye-tracking 75
65, 87n3 bilingualism 101–102, 125n5, 213;
anticipatory baseline effects 302 Chinese-English 74; Dutch-English
anticipatory processing 110; see also 182, 196; and eye-tracking research
predictions 56–57, 75, 123; and fixations/skips
Applied Language Learning journal 63, 97 210–211, 212; Greek-English 117,
Applied Linguistics journal 63–64, 97 118; Italian-Spanish 343–344; Japanese
Applied Psycholinguistics journal 63, 97 language 110–112, 212, 222–223; and
areas of interest see interest areas linguistic abilities 106; and perceptual
artifacts: in eye-tracking records 15, 16, spans 42–43; Russian-English 100, 113,
261, 361 118–119, 131, 213; Spanish-English
associative learning theory 69 75–76; vocabulary acquisition 71–76
attention: allocation of 71–73, 80; and Bilingualism: Language and Cognition journal
input enhancement 77, 78; joint 63–64, 97, 98
attention 120, 122, 123; parallel bimodal input 80–81; see also subtitles
attention 52, 54, 56; and perceptual binocular disparity 329
spans 38; in SLA 56, 77 binomial phrases 45
Index  407

binning 290; see also time bins 264, 266, 274, 307; outliers 260, 261,
blinks, in eye-tracking records 255–256 265, 267, 269, 270, 271; residuals 268;
Brocanto2 study 14–15 sensitivity analysis 269; software for
252–253, 257, 261, 290; statistical tests
cameras 122, 159, 312, 316–317, 319, 271–272, 308; winsorization 266, 268
322, 329, 331–332, 333, 349, 355, 360; data collection: and areas of interest 159;
see also eye trackers logbook 359, 362; trials of eye-tracking
Canada: bilingualism in 42 data 356–360
Canadian Modern Language Review journal data collection methods 2; empirical
63, 97 data 31; eye tracking versus self-paced
captions see subtitles reading 69; offline methodologies 2;
categorical variables 127, 128, 132, online methodologies 2; qualitative 3;
136, 286 quantitative 3; real-time 86; see also
children: age/proficiency study 82–83; data research study designing
from 30; Finnish study of 43; saccades data quality 23, 140, 177, 252, 253, 256,
30, 34; and subtitles 82; and test-taking 320, 327, 329, 330, 333, 336, 348, 359
84, 85; TOEFL test study 84 databases: and images 188–189, 190
Chinese language: studies involving degrees of visual angle (°) 26, 27, 30,
40, 341–343, 345, 347, 351, 352; 176; Courier font 29, 30; and different
translated 74 tasks 32; formula 28; and saccades 33;
chinrests 361 see also visual angles
Cloze test 47 dependency paradigms 65
cognates 75, 177–178 dependency studies 68–69
cognitive processing: and eye gazes 22 dependent variables 127, 140, 145,
competitors 2, 91, 101, 115, 185, 192; 179, 238, 248, 272, 291, 292, 307;
competition effects 91, 98, 100, 101, bilingualism/SLA studies 214, 221, 226;
103; see also visual world paradigm and growth curves 293; scanpaths 246;
computers: in eye-tracking labs 12, 144, visual world studies 206, 208–209,
159, 176–177, 334, 335 212, 275
concurrent data collection methods 23; different-gender trials 108–109, 109,
see also online investigations; online 110, 125n6
methodologies discourse-level phenomena: and SPR 9,
concurrent verbalizations 12 112–113
confounds 46, 127–128, 182, 186, 188, 203 dispersion-based algorithms 323, 326, 328
content/container verbs 293, 294–295, doublets 128, 129, 132
302, 305, 306 drift 172–173, 257, 258
contextual constraint 31, 46, 49, 178;
see also word predictability E-Z Reader model 51, 53, 54, 55, 56
counterbalancing 136, 137, 138, 155–156; early processing: and eye-tracking 12;
and item lists 126, 135, 136, 155; see also measures of 216–220, 217, 221–224
research study designing ecological validity 12, 17, 122, 144, 174,
covert attention 38, 55; and eye 193, 321, 361
movements 21 EEG recordings 13–14; and eye
covert orienting 19–21 movements 15; FRPs 16
cross-linguistic research 41 effective field of view 38
Esperanto 114
data cleaning/analysis 251–252, 361–363; event conceptualization 121
data transformation 263–264, 265, 266, event-related potentials (ERPs) 2, 13,
270; drift correction 257, 258–259; 14–17, 18, 103; advantages of 14;
four-stage procedure 260, 261, 307; compared with eye-tracking 14; and
growth curve analysis 288, 290; the eye-tracking perspective 62; and
individual records/trials 253, 254–256, L2 proficiency 15; and predictability
257; logarithms 264, 265; logarithmic 16, 104; and processing 14; and reading
transformation 249n2, 250n3, 263, speeds 15; waveforms from 13, 14;
408 Index

see also (eye) fixation related potentials 332–333, 355–356; accessing 330;


([E]FRPs) calibration/cameras 360–361; desk-
event representations: changing 95, 96 mounted; electrooculography 312, 314;
experimental control 130, 132, 178, eye camera 316, 335; functioning of
179–180, 181–182 322–330; head-free 322; head-mounted
(eye) fixation related potentials ([E]FRPs) 316, 318–319; head tracking 319, 320;
15–16; waveforms 16 historic 311–312; lab set-up 331–332,
eye-mind link 51, 88; research assumptions 333, 334, 335, 336; license 12, 337, 356;
about 56 manufacturers 233, 238, 252, 261, 321,
eye-movement measures 205, 207–208, 327, 329, 330, 337, 355, 360; mobile
215–216, 247–250; consecutive counts 316, 319, 321, 332; Purkinje images
211–212, 214; counts 210, 211; early 315; remote 316–317, 319; scleral
216–220, 217, 221–224; expected contact lenses 312, 313; speed 322,
fixation duration 228; first fixation 324, 325, 326–327; static 316, 321, 322;
duration 220, 222, 247, 248–249; first types of 311, 314; video-based 314, 316,
pass reading time 180–181, 180, 215, 322–331
217, 218, 221, 222, 224; first subgaze 222; eye tracking 10–11, 18, 19–22, 176, 315;
fixation duration 228; fixation latency advantages of 12; compared to self-
229, 230–231, 247, 274; fixation location paced reading (SPR) 9; Courier font 28,
231, 232; gaze duration 221–222, 248, 29, 30; data from 289, 291–292, 313,
249; last fixation duration 225; late 322, 323, 324; defining 1; disadvantages
216–220, 217, 224–228; one-point of 12; and early processing 12; history
measures; refixation duration 222, 10–11, 311–312; and interest areas 159;
249, 249; regression path duration/ and learner attention 4; measurement
go-past time 223–224, 248, 249n1; accuracy 12; monocular vs. binocular
rereading time 224–225; second pass 360; natural reading process 6; and
time 224–225; single fixation duration orienting 19; studies of 64, 97;
220–221; skipping 172, 213–214; text- unobtrusivity of 11–12; versatility of 11;
based studies 206, 207, 209; total time see also (eye) fixation related potentials
225–227; total visit duration 227–228; ([E]FRPs)
two-point measures 325, 326; as variables eye tracking labs 331–332, 333, 334, 335,
205, 208–209; visual world studies 206, 336, 357–358; computers 12, 144, 159,
208–209; see also integrated eye-tracking 176–177, 334, 335; display PC 334, 335,
measures; regressions 335; host PC 331, 334, 335, 335, 336;
eye movement models 21, 51, 52–55, lab protocol; staffing 336–338
56–57; cognitive control models eyeglasses 361
51; oculomotor models 51; primary
oculomotor control models 52; see also field of active vision 30
E-Z Reader model; SWIFT model fixation cross 193, 194–195
eye movements 23, 30, 31–38; cognitive fixations 15–16, 18, 22, 24, 27, 30, 31,
factors of 31, 59; domain-general 31; 58, 124, 215–216; adult 32; analysis
domain-specific 31; during fixations focus on 32, 36, 212–213; binary data
37; during reading 43–46; language- 213; calculating 289, 289; children’s
mediated 93, 96; return sweeps 30; defined 31; durations of 32, 36,
172; skips 210, 213–14; smooth 48, 49, 53, 56, 88–89, 205–206, 211,
pursuit 36; timing of 46–48, 49, 214, 215–216, 247, 261, 262, 263,
50–51; undershooting/overshooting 265, 274; eye movements during 30,
44, 45; and unspaced languages 40; 37; and eye-tracker function 324–325;
vergence 36; vestibulo-ocular reflex 36; fixational eye movements 37, 37; and
see also saccades information perception 39; inside words
eye-to-brain lag 261 162; landing/launch zones 44–45; and
eye trackers 12, 23, 33, 177, 203, 213, microsaccades 37–38; odds of 291, 297,
238, 252, 311, 321, 322, 328, 331, 308n9; and perceptual spans 40–41; as
Index  409

processing measures 46; proportions roles of 184, 185; semantic competitors


212, 291, 292, 309n8; subtypes 272; 183, 184; visual properties 186, 187,
text-based studies 209; in text lines 188; see also research study designing
172; and time windows 198, 199, 200, implicit learning 50, 104, 112
201, 202; visual search 38; visual world incidental vocabulary acquisition 72, 76,
studies 209 135, 173, 226, 339
fixed effects: vs. random effects 277 independent variables 269–270, 272, 274,
fonts: and eye tracking studies 28, 30; 277, 278, 281, 282, 282, 286–288, 290,
font size 28, 29, 30, 174, 175, 176, 181, 293–294, 299
186, 309n13, 340, 343; measurement individual differences 33, 38, 39, 58, 67,
of visual angle 29; proportional vs. 71, 76, 99, 104, 105, 112, 124, 135, 180,
monospace 175, 176, 181; text-based 219, 279–280, 285, 345–346, 348
studies 174–175, 176, 177 inhibitory control 99–102, 103
Foreign Language Annals journal 63, 97 input enhancement 77, 78, 79
formulaic language 73–75; see also idioms; instruction effects: and predictions
multiword processing studies 113–114, 115, 116
fovea 24, 25, 26, 57; see also visual field instrument reliability 151–152
foveal inhibition 53 integrated eye-tracking measures 237,
foveating 33 239, 275; gaze plots 241–242, 243,
frequency: word frequency 6, 9, 47, 48, 59, 250; heatmaps 218, 237, 238, 240, 241,
220, 342 250; luminance maps 241; scanpaths
functional field of view 38, 57 242–244, 244–245, 246, 250; see also
eye-movement measures
gaze-contingent moving window intentional vocabulary learning 351–352,
paradigm 40, 41–42, 58 352, 353
generalized linear mixed-effects models interaction studies 348–349
(GLMMs) 274, 275, 299; link function interest areas: dynamic 96, 170, 170, 362;
297, 299; logit 275, 292, 293, 295, 297– images as 163–170, 170; larger text areas
299, 299, 303, 307, 309n10, 309n11; log 162, 163; size 165–166; text as 159–162;
odds 293, 295, 296, 297–299, 304, 307, word-based 159, 160–161
309n9; see also logistic regression interface hypothesis 78
global perception span 39 internal validity 4, 130, 257, 327
glosses 259 interviews: and non-concurrent verbal
goodness-of-fit 8, 283; see also model fit reports 17
grammar: learning studies 70, 71; and invalid-cue trials 20
parsing 66; in text-based studies 65–71; item preview 140, 192–195
violations of 66
growth curve analysis 288, 290–291, 293, Journal of Memory and Language,
295, 302 Cognition 98
Journal of Second Language Writing 63, 97
head tracking 319, 320
higher-order polynomials 300 L1 studies: of saccades 34; and SPR 10;
homographs 75–76 individual differences 33, 38, 39, 58,
homophones 75 67, 71, 76, 99, 104, 106, 112, 124, 135,
human eye 25, 26, 27, 57, 314, 359, 361; 180, 219, 279–280, 285, 345–346,
cones/rods 25, 26; see also visual acuity; 348; monolinguals 42, 58, 72, 74, 76,
visual field 105, 113, 117, 118, 120, 121, 124, 340,
343, 344
idioms 104; and formulaic language 73–75; L2: benchmarks 58, 153, 261; future study
see also multiword processing studies suggestions 34; proficiency/predictions
images: and auditory input 182; baseline 109; and SPR 10
effects 183–184, 186, 200; and colour language-as-a-fixed-effect fallacy 276
188; naming consistency 188–189, 190; language assessment 82, 83, 84, 85
410 Index

Language Awareness journal 63, 97 model criticism 265–271, 270, 281,


Language, Cognition, and Neuroscience 285, 286
journal 98 model fit 266, 269–270, 271, 278, 281,
language development: and reading skill 32 283, 286, 303; see also goodness-of-fit
Language Learning journal 63–64, 97 model selection: backward 281, 283, 284,
language processing 203; and eye 286, 304, 309n12
movements 32 Modern Language Journal 63–64, 97, 98
Language Teaching Research journal 63, 97 monolingualism 72; and perceptual
Language Testing journal 63, 97, 214 spans 42–43
late learners: and grammar features 16; morphosyntactic predictions 108, 109, 110,
shallow structure hypothesis 10, 68, 111, 112
69, 218 multi-word processing studies 72–73;
Latin square design 137, 138 and eye-tracking 75; see also formulaic
learner attention: and eye tracking 4 language; idioms
(Left) Anterior Negativity or (L)AN 14–15 multimodal input 80, 81
letter identity span 39, 43, 60n3
Levenshtein metric 246 N400 14–16
lexemes: in bilingual lexicons 71–72 naming consistency: and images
lexical access 219; selectivity of 76; lexical 188–189, 190
retrieval 54, 68, 75, 76, 103, 104, 105, natural polynomials 301, 302
109, 113, 193, 219 natural reading process: and EFRPs 16; and
lexical activation 100 eye tracking 6
lexical alignment 354 non-concurrent verbal reports: and
lexical competition 91, 92, 100–101, interviews 17; and stimulated recall 17
103–104 non-violation paradigm 66, 71
lexical decisions 222 noticing hypothesis 69
lexical decoding skills 68, 87n4 “noticing the gap” 3
lexical processing 54, 68, 114–115; L1 vs. null hypothesis significance testing 204n2
L2 69; and saccades 35
lexical representations: of adolescents 32 object identification: perceptual spans 38
lexical-semantic processing 14; Brocanto2 objects: and real-world knowledge 96
study 15 oculomotor factors 51, 59
linear mixed-effects models (LMMs) 266, offline methodologies 2
268, 271, 272–273, 274–278, 281, online investigations: preferences for 1
282, 283, 284–285, 286, 286, 287, 309; online methodologies 2; see also think-
elog 275, 292, 293, 298, 299, 299, 304, aloud protocols
309n11; empirical logit 275, 292, 293, optimal viewing position (OVP) 43, 59
295, 298, 303, 307, 309n11 organization: importance of 356–357
linguistic knowledge: and prediction 104 orienting 19; covert 19; overt 19
linking hypothesis 89, 90, 91–92, 93–94, orthogonal polynomials 301, 302
96, 191 outliers 252, 258, 260–271, 260, 268, 277,
listening as multimodal process 349–351 285, 307
location cuing see spatial cuing overt orienting 19, 21
log frequencies: researcher use of 47
logarithms 264 P600 index 14–15; French language
logistic regression 275; link function 297, study 16
299; log odds 293, 295, 296, 297–299, parafovea 24, 25; see also visual field
304, 307, 309n9; logit 275, 292, 293, parafoveal-on-foveal effects 57, 329
295, 297–299, 299, 303, 307, 309n10, parafoveal-preview effects 24, 55
309n11 parafoveal processing 22, 24, 55–57; word
boundaries 59
metalinguistic information 114 parafoveal words 44
Miles test 359 parser 9; and grammar 66
Index  411

parsing sentences 66–67
parsing/syntactic structure, and SPR 9
partial word knowledge 219
Pearson Test of English Academic 83
perceptual lobe 38, 57
perceptual span 38, 39, 40–43, 57–58; and bilingualism 42–43; and fixations 40; letter feature span 39; letter identity span 39; and monolingualism 42–43; and reading directions 41–42; and word lengths 44
periphery 24, 25; see also visual field
phonology 91
pilot study 356
polynomials: higher-order 300; natural 301, 302; orthogonal 301, 302
Porta tests 359
postlexical language processing 15, 233
Potsdam Sentence Corpus 47–48
power: and item numbers 151–154, 154, 155, 156; statistical power 152–153, 156; see also research study designing
pre-motor theory 21
precision 327, 328
prediction 103, 104, 104–107; association-based 105; and bilingualism 212; different-gender trials 108–109, 109, 110, 125n6; expanding research in 110; gender-based prediction 105, 108–110, 114, 125n6, 200; and instruction effects 113–114, 115, 116; Japanese language 110–112, 212; multiple cues 112–113; prediction window 168, 199, 295; and proficiency 105; and visual world paradigm 103, 104, 105–107
preferred viewing locations (PVL) 43, 59
preview 15, 18, 21, 24, 39, 55–56, 138, 140, 191–195, 194–195
preview benefits/effects 39
primary tasks 2–3, 126, 142–145; action-based 144–145; look-while-listening 143–144; and secondary tasks 4
processing load measures 82, 88–89; eye-fixation duration 46, 88–89
prosody 197

quadruplets 128, 129, 131, 132, 133

random effects structures 278, 279–280, 281
reaction time (RT) measurements 4, 263–265
reactivity: and think-aloud protocols 4
reading: cognitive processing model 84; and eye-movement recording 12, 15; parallel processing 52; and word processing 52
reading comprehension: and thinking aloud 4, 5, 6
reading direction: and perceptual span 41–42, 58
reading processing 83–84
reading skill: children’s 43; and language development 32; and saccades 34
reading studies 345–347
reading time 265, 266; and refixation 43
reanalysis 10, 219, 223–224
Reduced Ability to Generate Expectations (RAGE) hypothesis 105
referential processing 116–117, 118–119, 120; co-reference 117, 120
refixation: and reading time 43
regions of interest see interest areas
regressions: regression in 236, 297, 307, 341; regression out 223, 235, 236; regressive eye movements 32, 55, 215, 223–224, 232–234, 234, 235, 236; statistical analysis 8, 127
reliability: instrument reliability 151–152
repeated-measures ANOVA 275, 276; see also analysis of variance (ANOVA)
research labs: set-ups for 12, 357–358
research programs: increased use of eye movement recording in 1
research study designing 48, 64, 126, 134, 156–158, 202–203; and ANOVA 46, 50, 127, 130; auditory input 182, 196; between-subject designs 135, 136, 154, 155; counterbalancing 136, 137, 138, 155–156; defining interest areas 158–159, 171; entry level 338, 339–340, 341; experimental control 130, 132, 178, 179–180, 181–182; experimental materials 126–127; eye-movement measures 208; fixation cross 193, 194–195; image-based 163, 164–170, 182; image selection 182; instrument reliability 151–152; intentional vocabulary learning ideas 351–353, 352; interaction studies 348–349; item lists 126, 135, 136, 155; item numbers 151–154, 154, 155, 156; Latin square designs 137, 138; listening as multimodal process 349–351; multiple input sources 130, 132; participant views of 134; previews 140, 191–195; primary/secondary tasks 126; reading studies 345–347; statistical control 180; statistical power 152–153, 156; Synchronous Computer Mediated Communication (SCMC) study 353–355; text-based eye tracking 63–65, 154, 257; topic searches 61–63; trials 86n2, 138, 139, 140–142; variables 49, 127–128, 129–130, 131, 178–181, 272, 278; visual world studies 64, 154, 186, 343, 344, 345, 347–348; within-subject designs 134–135, 136, 154, 155, 156; see also audio materials; images; text-based study guidelines
revision difficulties 117

saccades 32, 33, 35–36, 43, 59n1, 210, 250n4; acceleration 33–34, 33, 323; amplitude 14, 16, 33–34; children’s 30, 34; deceleration 33–34; defined 33; and degrees of visual angle (°) 33; durations 36; during different tasks 32; launch/landing sites 44; main sequence 33–34; microsaccades 37–38; and motor programming 21; studies of 35; SWIFT model 53; velocity 33; vergence 36; and word length 40; see also eye movements
saccadic suppression 34; backward lateral masking 34–35
Satterthwaite’s method 282–283
script effects 341–343; see also Chinese language; reading direction
second language acquisition (SLA) 17, 86; adult 77–80; and eye-tracking research 56–57, 124; instructed 77–78, 79, 80; instrument reliability 151–152; Latin instruction 78; and offline methodologies 2; and prediction 104; questions about 80; see also bilingualism; children
Second Language Research journal 63–64, 97, 98
secondary tasks 4, 126, 142–143, 145, 149, 150; comprehension questions 145–147, 150; grammaticality judgment tests (GJTs) 145, 148–149, 150, 245, 245, 246; idea units 145; plausibility judgments 145, 147, 150; thinking aloud as 3; translations 145, 147–148, 150
self-paced reading (SPR) 2, 4–6, 8–9, 18, 59n2; button-press studies 6; centered display 8; compared to eye tracking 9; cumulative, linear technique 7, 8; and early processing 12; moving window technique 7, 8, 9, 59n2; and new forms 9; parsing/syntactic structure 9–10; predicted reading times 8; reading times 8–9; and shallow structure hypothesis 69; shortcomings of 10; software for 12; studies of 6, 7, 9–10
semantic competitors 183, 184
sentence processing 67, 86–87n2, 217–218; entry level experiment 338, 339; filler-gap dependencies 69; and fixation counts 211; questions about 9
shallow structure hypothesis 68, 218; and self-paced reading (SPR) 69
short-term memory: and thinking-aloud protocols 3
single-word processing studies 72, 75, 161; and eye movement 72
skewed data 263
spatial cuing paradigm 19, 20
spatial sampling errors 324–327, 325
speech recognition 192
spillover region 60n5; in saccade studies 35; see also target words
split attention effects 352
split corneas 255
spoken language research 17, 89, 91; see also visual world paradigm
statistical analyses 242, 247, 263–264, 265, 283–287, 308; by-subject (F1) analysis 276, 278, 281–283, 284, 298; by-item (F2) analysis 276, 278, 281, 283, 284, 298; non-independence 227, 248, 248; normality assumption 263–264
statistical control 180
stimulated recall: and non-concurrent verbal reports 17
Studies in Second Language Acquisition journal 63–64
subtitles 80, 81, 82; and fixations 211, 226; keyword captioning 81–82, 135, 155, 162, 172
SWIFT model 51–52, 53–54
Synchronous Computer Mediated Communication (SCMC) study 353–355
syntactic dependencies, data collection methods 69
syntactic processing 14, 68, 91

target words: processing of 73; in saccade studies 35; in visual world studies 91, 166–167, 183; see also spillover region
task-based teaching/learning 79
temporal sampling errors 324, 325, 326
TESOL Quarterly journal 63–64, 97
test taking behavior 3, 84, 85
test validity 84; and eye tracking 83
text-based studies 65, 97, 98, 206; defining 63; eye-tracking measures 206, 207, 209; and fonts 174–175; grammar 65–71; primary tasks for 143; vocabulary acquisition 71–76
text-based study guidelines 181; artistic factors 174–175, 176, 177; double spacing 172, 362; linguistic constraints 177–178; spatial constraints 171–172, 173, 174; see also research study designing
text-reading experiments: and research ideas 339, 340
text skimming: eye tracking records of 253, 254, 255
text-to-speech programs 196
think-aloud protocols 2–3, 23n1; awareness vs. processing depth 4; equipment for 12; and reactivity 4; and short-term memory 3; and SPR 12; studies of 12–13
thinking aloud: defined 2–3; and reading comprehension 4, 5, 6; and SLA research 3, 17, 23n1
time bins 288, 296, 298; see also binning
time-course analysis 287, 288, 302–304, 305, 305; anticipatory baseline effects 302; choosing time terms 299, 300–301, 302; data visualization 291, 293; growth curve analysis 288, 290–291, 293, 295, 302; logistic/quasi-logistic regressions 296, 297–298, 299, 302–303, 304–305; reporting 306
time stamps 199–200, 229, 290
time windows: and fixations 198, 199, 200, 201, 202
TRACE model 92, 93
track loss 253–257, 308n1
transparency: in research 196–197
trials 86n2, 138, 139, 140–142; and areas of interest 159; critical trials 70, 70, 131, 138–141, 213, 344; distractor trials 166, 168, 184–186, 185; experimental trials 140, 157n1, 193; filler trials 138, 140, 147; and fixation crosses 193–195; practice trials 19, 138, 140, 177
triplets 128, 129, 132, 180
tuning hypothesis 69

unconscious processes: methodologies capturing 3; see also implicit learning
unspaced languages: and eye movements 40

valid-cue trials 20
validity 4, 12, 17, 58, 83–84, 85, 122, 127, 130, 144, 146, 174, 193, 226–227, 257, 260–261, 264, 321, 327–328, 361; see also ecological validity; internal validity; test validity
velocity-based algorithms 323
visual acuity 24, 25, 26, 33, 39
visual angles 26, 27–28; fonts 174; formula 27, 28; see also degrees of visual angle (°)
visual field 24, 26, 27, 39; see also fovea; parafovea; periphery
visual lobes 38
visual spans see perceptual spans
visual world paradigm 11–12, 17, 21, 88, 92, 94, 97, 112, 125n5, 143, 181, 206; carrier phrase 200, 201, 202; competitors 91; distractors 166–167, 168; dynamic interest areas 170; entry-level ideas 343, 344, 345; experiment designing 186; and eye movements 89, 103, 107; eye tracking 97, 123; fixation crosses 193, 194–195; image use in 165, 166–170; and instruction effects 113–114, 115, 116; intermediate ideas 347–348; looking-while-listening 21, 142–145, 193; morphosyntactic predictions 108, 109, 110, 111, 112; oral production 120, 121–122, 123; predictions 103, 104, 105–107, 112, 123; previews 191–195; research 98–99; semantic predictions 107; Stroop tasks 99, 102, 229–230, 231; word recognition 99–100, 101–102, 103; see also spoken language research
vocabulary 71; and images 189; see also bilingual lexicon
vocabulary acquisition 3, 71–76, 135; questions about 76; and reading 72; see also incidental vocabulary acquisition; intentional vocabulary learning
vocabulary research study set-up 159, 160, 161

within-language competition trials 131–132
within-subject designs 134–135, 136, 154, 155, 156; see also research study designing
word lengths 40; and perceptual spans 44
word predictability 47
word recognition 48–49, 99–100, 101–102, 103, 188, 219
word skipping 22
