
Biol Philos

DOI 10.1007/s10539-017-9571-5


The evolution of syntactic structure

Richard Moore1

Received: 22 January 2017 / Accepted: 15 March 2017

© Springer Science+Business Media Dordrecht 2017

Abstract Two new books, Creating Language: Integrating Evolution, Acquisition, and Processing by Morten H. Christiansen and Nick Chater, and Why Only Us: Language and Evolution by Robert C. Berwick and Noam Chomsky, present a good opportunity to assess the state of the debate about whether or not language was made possible by language-specific adaptations for syntax. Berwick and Chomsky argue yes: language was made possible by a single change to the computation Merge. Christiansen and Chater argue no: our syntactic abilities developed on the back of natural selection for general-purpose sequence learning mechanisms. While Christiansen and Chater's book testifies to impressive developments in constructivist approaches to language development, it's not obvious that it has the resources to explain the hierarchical nature of syntactic binding. Despite this, the views have much in common.

Keywords Language evolution · Syntax · Universal Grammar · Merge · Sequence learning


Robert C. Berwick and Noam Chomsky (2016). Why Only Us: Language and
Evolution. MIT Press, Cambridge.

Morten H. Christiansen and Nick Chater (2016). Creating Language: Integrating

Evolution, Acquisition, and Processing. MIT Press, Cambridge.

Richard Moore
Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany


One of the more heated academic debates of recent years concerns the extent to which the human ability to use and understand syntactically complex utterances is the result of adaptations for language. The dispute has (roughly) two sides. Nativists, led by Noam Chomsky, argue that language acquisition can be explained only on the assumption that, in addition to all of our general-purpose cognitive abilities (memory, social cognition, and the like), humans possess a hardwired and domain-specific faculty of language. This nativism is motivated by the poverty of the stimulus argument, according to which young children's exposure to language could never suffice for them to learn that many possible natural language grammars are wrong. Since this would make any natural language unlearnable, Chomsky argues that children must possess genetically inherited knowledge of language, or Universal Grammar (hereafter UG), the "genetic component of the faculty of language" (BC p. 90). UG constrains our judgements about which of the possible sentences of a language are syntactically well formed, and thereby makes the task of language acquisition computationally tractable.
Historically, some prominent nativists, not least Chomsky himself, have doubted that this faculty of language could have arisen under natural selection, and so argue that it likely emerged by other evolutionary mechanisms, like random mutation or exaptation. In contrast, proponents of the non-nativist Construction Grammar view, led by Michael Tomasello, Morten Christiansen, and Nick Chater, among others, have argued that language acquisition does not require any innate faculty of language. Moreover, they argue that the domain-general cognitive mechanisms exploited in our language acquisition and use have been shaped by natural selection in just the same manner as our other cognitive processes. Thus the debate crystallises around two questions: first, the extent to which our syntactic abilities originate in brain functions that are specific to language, and second, whether or not these abilities were the product of natural selection. In principle, at least, answers to these questions are independent.
On the face of it, both nativist and non-nativist views of the origins of syntactic structure are quite sane, and the merits of each have been defended by some of the world's best thinkers. Nonetheless, the debates have been characterised by mutual acrimony. Both sides have been guilty of what Pullum and Scholz (2005) call "irrational exuberance".
Especially for newcomers and the undecided (among whom I include myself),
the recent publication of two books presents a good opportunity to assess the
progress of the competing views in this debate. Creating Language: Integrating
Evolution, Acquisition, and Processing by Morten H. Christiansen and Nick Chater
(2016 MIT; hereafter CC) presents a constructionist approach to language
understanding. Meanwhile, in Why Only Us: Language and Evolution (also 2016
MIT; hereafter BC), Robert C. Berwick and Noam Chomsky present the nativist
account. The latter is particularly notable because it is Chomsky's long-awaited book-length treatment of the subject of evolution, a topic on which his previous remarks have been both spare and somewhat cryptic, and on which even relatively devout Chomskyans have sometimes departed from his views.1 Both books are

1 On this point, see Pinker and Bloom (1990), Jackendoff (2002), and Progovac (2015). Following Chomsky, these authors hold that humans do possess a hardwired faculty of language, but they argue that it likely emerged gradually, and as a product of natural selection.


imperfect but valuable contributions to the language evolution literature. However,

newcomers to the debate in question will most likely be struck by how similar the
presented views are. The sources of disagreement are subtle.
I start by discussing BC, before turning to CC, and then finishing with a
discussion of their relation. (Since CC is considerably longer than BC, and since
many of its claims will be newer, the elaboration of CC will take more time.)
The opening sections of BC begin by denying that the evolution of language has
ever been a secondary concern of the nativist programme. However, they concede that
recent progress in the Minimalist Program (MP) in linguistics has made questions of
the evolvability of language seem more tractable than they did in the early days of the
Chomskyan research program. On the earlier view (e.g. Chomsky 1965), Universal
Grammar was taken to consist of a large number of detailed and highly abstract
transformational rules that governed sentence processing. However, these grammars
were so complex that it was clear at the time they could not possibly be evolvable
(BC p. 2) under the framework of natural selection. Over the next decades the number
of rules thought to govern the syntactic hierarchies of language was gradually reduced,
until the Minimalist Program (Chomsky 1995) posited just a single computational rule:
Merge. This is the operation that takes any two syntactic elements and combines them
into a third, larger, hierarchically structured expressionas when, for example, the
word-like units [read] and [books] are combined to generate the verb phrase [read
books]. Merge binds both word-like units into phrases and, applied recursively, simple
phrases into hierarchically structured complex phrases. On minimalist formulations of
UG, BC included, the complex principles and parameters that were once thought
constitutive of UG are now taken to have emerged as by-products of currently
unknown physical and biological constraints on the implementation of Merge.
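Since Merge does so much work in BC's account, it may help to see how little machinery it requires. The following sketch is my own illustration, not from BC: minimalist Merge is standardly set-formation, simplified here to ordered pairs for readability.

```python
# Illustrative sketch only: Merge modelled as a two-place operation that
# combines two syntactic objects into a new, hierarchically structured one.
# (BC treat Merge as set-formation; pairs are used here for readability.)
def merge(x, y):
    return (x, y)

def depth(obj):
    """Depth of hierarchical embedding in a merged structure."""
    if isinstance(obj, tuple):
        return 1 + max(depth(part) for part in obj)
    return 0

vp = merge("read", "books")   # [read books]
tp = merge("will", vp)        # [will [read books]] -- Merge applied recursively
```

A single operation, iterated, already generates unbounded hierarchical depth, which is why BC can treat the emergence of Merge as a single minor change.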
In light of the Minimalist Program, BC take the task of an account of language evolution to be to explain the emergence of three things that are each necessary and together sufficient for the emergence of natural languages: (1) the combinatorial operator Merge, (2) the sensorimotor interface needed for vocal learning and for producing and parsing utterances, and (3) the conceptual interface needed for thought. Together Merge (described by BC as potentially "the simplest possible computation" (BC p. 98)) and concepts ("the minimal meaning-bearing elements of human languages" (p. 90)) generate a language of thought, which is what BC primarily have in mind when they talk about language. The addition of a sensorimotor interface is what enables subjects possessed of a language of thought to articulate their thoughts for one another via a process of externalisation. Externalisation is necessary for natural languages, if not for language per se.2
According to BC, key elements of the human sensorimotor system likely came with the emergence of FOXP2, which was (in most elements) shared by humans and Neanderthals and so seems to have arisen around 300,000–400,000 years ago, before ancestral human and Neanderthal populations split (Krause et al. 2007). However,
BC date the emergence of language much later than this. That's because, at least among extant creatures, BC doubt that any possess the word-like concepts that humans do. Moreover, BC are sceptical that even Neanderthals possessed language, given that they left scant evidence of symbolic behaviour, which is taken to be a reliable indicator of language use.

2 Note that while BC's formulation of the language of thought claim sounds Fodorian (e.g., Fodor 2008), they do not cite Fodor here. Moreover, their criteria for concept possession are presumably not like Fodor's since, on Fodor's view, animals also have a language of thought. Sadly, BC do not say enough about what they think concept possession is to clarify this difference.

Summarising their view, BC write:
In some completely unknown way, our ancestors developed human concepts. At some time in the very recent past, apparently some time before 80,000 years ago, individuals in a small group of hominids in East Africa underwent a minor biological change that provided the operation Merge, an operation that takes human concepts as computational atoms and yields structured expressions that, systematically interpreted by the conceptual system, provide a rich language of thought. At some later stage, the internal language of thought was connected to the sensorimotor system[.] In the course of these events, the human capacity took shape[.] (BC p. 87)
The book presents a fascinating array of empirical data, but it is not particularly
well organised or sign-posted. Partly because of this, and partly because BC engage
in a lot of (justifiably) cautious speculation, their exact view is sometimes tricky to
discern. Evidently, they take the human conceptual system to have emerged sometime before 80kya, and so prior to the "small rewiring of the brain" (BC p. 107) that created Merge. However, they think the circumstances surrounding the
arrival of the conceptual system wholly mysterious. If I understand correctly, they
also think that language proper emerged only with the last step of externalisation,
which came just under 50kya following a seemingly uniquely human change to one
of the regulatory elements of the FOXP2 gene (Maricic et al. 2013). This final tweak
to the sensorimotor interface enabled the rapid emergence of language. The change could be so rapid because it arose not as a consequence of genetic change, but of changes to gene enhancers.
BC argue that the selection pressure that led to a genetic sweep for Merge would have been independent of any function it served for communication. In their words: "the modern doctrine that communication is somehow the function of language is mistaken … a traditional conception of language as an instrument of thought is more nearly correct" (BC p. 107). Furthermore:
[I]nitially, at least, if there was no externalization, then Merge would be just
like any other internal trait that boosted selective advantage internally, by
means of better planning, inference, and the like. (BC p. 164)
This claim is just sketched, but it is, they say, supported by evidence that
language-using adults perform better in some reasoning tasks than pre-verbal infants
(BC p. 165). So on BC's view, even if some elements of externalisation were subject to selection pressure for better communication, language itself was not.
BC is not an easy read, but it is a very good book. For a volume of 177 pages
(including notes), it is impressively detailed, and should certainly banish the myth that
Chomsky does not understand evolution. I found myself largely persuaded by their
argument for the claim that Merge might have arisen through a very minor biological
change. Though the move to the MP is now over 20 years old, I suspect that some will
see the very explicit retreat from the original vision of UG as a flight from the spirit of


what made Chomsky's ideas distinctive. However, it's not an objection to a view that it has changed over time. This is particularly so when the later view is a carefully considered reformulation of the original one, made in light of compelling scientific arguments, like considerations about the evolvability of UG.
Nonetheless, not all of BC's arguments are persuasive. In particular, BC sometimes overstate the differences between human and non-human communication and cognition. In several places, they argue that even our nearest relatives, the non-human great apes, lack concepts. For example:
The minimal meaning-bearing elements of human languages, wordlike, but not words, are radically different from anything known in animal communication systems. (BC p. 90)

[T]ogether with the word-like atomic objects, Merge is one key evolutionary innovation for human language. (BC p. 112)
BC's argument that apes lack any analogue of words is drawn from a single case, that of the enculturated chimpanzee Nim Chimpsky, who was famously taught sign language with very limited success. BC argue that Nim never learned the names of objects (e.g., "apple"), but only loose bundles of associations (e.g., associating "apple" not just with apples, but with the knives used to cut them, and so on). Moreover, they cite an analysis (Yang 2013) reporting that Nim's utterances lacked the combinatorial structure of human children's utterances, suggesting that his gesture sequences were learned by rote.
While enculturated apes are known to produce gesture sequences (e.g., TICKLE THERE), the claim that these lack syntactic structure is (or should be) uncontroversial, and has been replicated in cautious analyses of other enculturated apes (Rivas 2005). However, while BC repeatedly claim that the emergence of concepts in humans is mysterious, the claim that non-human communication lacks word-like elements may be too quick. It is not implausible that ape utterances are meaningful, and even word-like, in the same respects as human utterances are. For example, recent analyses of the gestural repertoire of wild chimpanzees suggest that, while they do not use gestural signs to name objects, they do use them in ways consistent with their having stable semantic properties (Hobaiter and Byrne 2014; see Moore 2014, 2016 for discussion). These may cut the world at a cruder grain than do human concepts, but, contra Davidson (1982), this is no argument for thinking that they are not concepts at all. While Rivas (2005) found the utterances he analysed to be largely consistent with cautious analyses of Nim's dataset, his discussion supports the conclusion that enculturated apes (at least for the most part) use signs with relatively stable meanings. Given that, it's unclear that the lack of syntactic structure in ape communication is attributable to conceptual shortcomings, and not just to their lacking combinatorial abilities. Perhaps BC are after something more substantial here, for example, the idea that only human concepts are sufficiently word-like to be bound hierarchically by Merge.3 However, the argument for this claim is somewhat opaque, and would (at least) have benefitted from a more detailed elaboration.

3 Thanks to an anonymous reviewer for this point.


With respect to syntax, the comprehension abilities of apes reared in human-like conditions are also better than BC's discussion of Nim would suggest. For example, Kanzi, an enculturated bonobo, has been argued to understand novel sentences of English at around the level of a child of 2.5 years (Savage-Rumbaugh et al. 1998).4 At least with respect to comprehension, then, Kanzi seems to possess the flexibility that Nim's production lacks. Nonetheless, there are obvious biological constraints on what Kanzi can do. His competence stopped developing just as young children's understanding of syntax soars; and there are systematic mistakes in his comprehension of some relatively simple sentences of English (Truswell in press).
With respect to both ape syntax and semantics, more data and more rigorous analysis are needed before strong conclusions can be drawn. But if the data I describe above are not misleading, they would have implications for BC's story. First, if we suppose that ape utterances (and so by extension those of the last common ancestor of humans, bonobos, and chimpanzees) possess word-like semantic but not syntactic properties, then BC's claim that language arose with just a few minor biological tweaks (including one for Merge and one for FOXP2) becomes more credible. Indeed, if we take evidence of Kanzi's ability to track simple syntactic properties as evidence that he already possesses some crude and limited precursor to Merge, then BC's claim that the "generative procedure emerged suddenly" as the result of a minor mutation (p. 70) becomes more compelling. This threatens to complicate another part of BC's story, however, namely that Merge is the simplest possible computation. Progovac (2015) has already argued that the evolution of Merge may have been more gradualistic than BC permit.
Second (and again following Progovac 2015), if we think of our ancestors as
using signs fluently, but without structure (and supposing the sensorimotor system
to be at least largely in place), then we can see how a selective sweep of genes coding for Merge might have been driven by communication and not thought. Ancestors who were already coordinating fluently using holophrastic
and simple unstructured utterances (like those produced by Kanzi) might have
benefitted greatly following the appearance of a mutation that improved their ability
to combine signs, since that would enable them to coordinate in more complex
ways. Subsequently, refinements to human thought might then have been driven by
the emergence of new cultural tools for communicating. This possibility is also
supported by empirical evidence. For example, there is a wealth of data showing that children's performance in explicit theory of mind (ToM) tasks is connected to their mastery of syntactic forms known as realis complement clauses (e.g., de Villiers and Pyers 2002; Lohmann and Tomasello 2003; Milligan et al. 2007).
This possibility could, potentially, be the seed of a more substantial objection to
BCs view that took aim at the assumed existence of a developed language of
thought prior to the acquisition of natural language. Rather than attempt that
objection here, though, I just note that I got a great deal out of reading BC, but raise one final gripe.

4 In discussion of Kanzi's abilities, one should always be wary of anthropomorphism, but a compelling video illustrating his comprehension of novel sentences can be found here:

My first edition of the book is rather poorly edited. There are
numerous missing references (e.g., Frank et al. 2012 is discussed on p. 115 but
missing from the bibliography). Four of the diagrams in the book are also printed twice: once in black and white, and a second time in a set of handsome coloured plates. This is presumably because the black-and-white illustrations are unusable, since the accompanying descriptions refer to coloured aspects that are indiscernible in grey (e.g. Fig. 4.4 on p. 160). This suggests that the book was rushed to press.
Hopefully for future editions MIT will employ a proofreader.
I turn now to CC.
The central premise of CC is that the argument that language is made possible by a hardwired Universal Grammar generates a "logical problem of language evolution". The problem, as they see it, is that

it is mysterious how proto-language, which must have been, at least initially, a cultural product likely to be highly variable both over time and location, could ultimately have become genetically fixed as a highly elaborate biological structure. (CC p. 24)
As this passage suggests, the main target of CC's attack is the original formulation of UG that takes it to be a highly elaborate biological structure, and particularly the non-Chomskyan developments of this view (e.g., Pinker and Bloom 1990) that take this biological structure to be the result of natural selection. Occasionally, CC extend their argument against UG to include its more recent minimalist versions. However, when they do so, their general strategy is to hint that minimalist UG is so far removed from its older self that it is no longer recognisable, and has somehow betrayed UG first principles.
In what follows I will devote more time to elaborating the details of CC's view that shed light on possible ongoing disagreements between CC and BC. Since the precise targets of CC's criticisms are somewhat hazily defined, and since the bulk of CC is devoted to a positive argument that language acquisition can be explained without an innate faculty of language, adopting this strategy will not distort the contents of the book.
In response to the logical problem that they identify, CC reject the claims that UG is genetically encoded and that our brains are specialised for language. Rather, they argue, language "reflects pre-existing, and hence non-language specific, neural constraints" (ibid.). What makes language learnable is not UG, but the combination of general learning mechanisms and cultural selection.
According to CC, understanding cultural selection is key to understanding how
we can get by without a UG, because it enables us to give up the claim that our
brains are adapted for language, and to replace it with the much less controversial
claim that the languages we speak have been adapted to our pre-existing brains.
In order for languages to be passed on from generation to generation, they
must adapt to properties of the human learning and processing mechanisms.
(CC p. 44)

R. Moore

A consequence of this is that languages that are difficult to learn will be weeded
out over generations, as speakers abandon or modify them in favour of alternatives
that are easier to master. Therefore, according to CC,

the learnability of language is not a puzzle demanding the presence of innate information, but rather an inevitable consequence of the process of the incremental creation of language and culture more generally, by successive generations. (p. 74)
Additionally, CC offer a range of short arguments against the various accounts of UG that have been developed. With respect to the original Chomskyan view that UG arose by non-adaptationist means, they throw up the following dilemma. As traditionally conceived, UG is a large and complex set of abstract rules. The chances of these having arisen by chance would be "infinitesimally low" (p. 39). However, if they emerged not by chance, but by virtue of being exapted from other cognitive processes, then there is no reason to think they should be language-specific in the way that proponents of UG have claimed. In that case, "a plausible story about evolution cannot be told for anything remotely resembling the traditional picture of UG" (ibid.). Proponents of UG therefore face a dilemma: abandon UG orthodoxy, or give up on an evolutionarily plausible account.5
The bulk of CC is dedicated not to criticisms of the nativist paradigm, but to the
development of the positive view that language learning requires no language-
specific neural substrates. A central tenet of this is that the neural substrates of
language can be explained in terms of a process of neural recycling, which they
explain as follows:
Instead of viewing various brain regions as being dedicated to broad cognitive
domains such as language, vision, memory, or reasoning, it is proposed that
low-level neural circuits are redeployed as part of another neuronal network to
accommodate a new function. (p. 44)
The central goal of the second half of CC is to develop the details of this view, in order to show that, even in the absence of any UG, a domain-general sequence learner can acquire "aspects of syntactic structure" (p. 157) via the integration of multiple linguistic cues and statistical predictions about the relative frequencies with which different words are combined.
At the heart of CC's positive account of language processing lies a problem that they propose to solve. The problem, as CC conceive it, is that in both acquisition and later use, units of language must be processed in real time in order to be functional. This creates a Now-or-Never bottleneck, which arises from general principles of perceptuo-motor processing and memory (CC p. 94) but provides strong constraints on the sorts of things that can function as units of language. Units of input that are too difficult to be processed quickly and efficiently will be lost.

5 CC also offer arguments against gradualist versions of UG, but for reasons of space I will not discuss them here.


The bottleneck is made manageable via a Chunk-and-Pass processing procedure. According to CC, the elements of linguistic input can be processed rapidly because at each functional level the language system codes successive units of input into chunks. These chunks are then passed to a higher level of linguistic representation, and the resultant larger chunks are passed up again. Thus, phonemes are chunked into morphemes, morphemes into words, words into sentences, and sentences into discourse. While unstructured information might overwhelm the system, "it is possible to learn to rapidly encode, and recall, long random sequences of digits by successively chunking such sequences into larger units, chunking those chunks into larger units, and so on" (CC p. 98). At each level of the processing hierarchy, chunked information is integrated with further prosodic, semantic, syntactic, and pragmatic cues, combined with background knowledge. The integration of these cues helps the linguistic system to disambiguate a speaker's likely communicative goals, for example, by helping it to anticipate how long branching sequences are likely to be resolved, and so rapidly resolving potential ambiguities.
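CC's digit example can be made concrete with a toy sketch. This is my own illustration, not CC's implementation: a flat sequence is recoded level by level into chunks of chunks, so that at each level only a few compressed units need to be held at once.

```python
# Toy sketch of Chunk-and-Pass (not CC's model): recode a sequence into
# chunks, then chunks of chunks, so each level holds only a few units.
def chunk(seq, size=3):
    """Group a flat sequence into consecutive chunks of at most `size`."""
    return [tuple(seq[i:i + size]) for i in range(0, len(seq), size)]

digits = list("149162536496481")      # 15 "random" digits
level1 = chunk(digits)                # 5 chunks of 3 digits each
level2 = chunk(level1)                # 2 chunks of chunks
# At the top level, only len(level2) == 2 units must be held at once.
```

The point of the sketch is only that repeated recoding compresses what must be retained: fifteen digits become five chunks, and then two chunks of chunks.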
On this view, to acquire a language is just to learn to process progressively more complex chunks of linguistic input. CC argue that the skills for doing this are domain-general, in the form of learned statistical information and probabilistic cues, and are built up by practice. The most obvious consequence of this is that the ability to process particular linguistic inputs will be a consequence not of any hardwired syntactic knowledge, but of language users' familiarity with relevantly similar forms. Where syntactic forms are used more frequently, learning will be faster; and where structures are regular, language users will be able to generalise more easily from previously experienced cases to novel ones. Thus, on the back of a story about a psychological mechanism for parsing utterances (which does not seem to be in conflict with BC's story about the emergence of syntactic competence), CC construct a non-nativist account of syntax mastery (which more clearly is).
According to CC, the nature of the Chunk-and-Pass processing system means that some types of linguistic input will be more easily learnable than others. For example, they argue that there should be a general (although not absolute) preference in languages for local dependencies between sentential elements, because these can be chunked more efficiently. They point to evidence from language development that supports their view. For example, Kidd and Bavin (2002; CC p. 180) found that children's mastery of centre-embedded relative clauses emerges gradually during childhood. By four years of age, children have a reasonable mastery of right-branching (1) subject and (2) object relative clauses like the following:
1. The kangaroo stands on the goat that bumped the horse.
2. The horse jumps over the pig that the kangaroo bumped.

However, children at the same age find it much harder to grasp centre-embedded (3) subject and (4) object relative clauses:
3. The cow that jumped over the pig bumps the sheep.
4. The sheep that the goat bumped pushes the pig.
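A crude way to see why (3) and (4) are harder than (1) and (2) is to count how many clauses are simultaneously awaiting their verb as the sentence is scanned left to right. The sketch below is my own illustration of that idea, not Kidd and Bavin's analysis or CC's model.

```python
# Rough memory-load proxy (not from CC): 'S' marks a subject opening a
# clause that awaits its verb; 'V' marks the verb closing the most
# recently opened clause. The peak count is how many unresolved clauses
# must be held open at once.
def memory_load(events):
    open_clauses = peak = 0
    for e in events:
        if e == "S":
            open_clauses += 1
            peak = max(peak, open_clauses)
        else:  # "V"
            open_clauses -= 1
    return peak

# (1) right-branching: each clause's verb arrives before the next opens.
right_branching = ["S", "V", "S", "V"]
# (3) centre-embedded: the relative clause opens before the main verb.
centre_embedded = ["S", "S", "V", "V"]
```

On this toy measure the centre-embedded sentence holds two unresolved dependencies at its peak where the right-branching one holds only one, which is exactly the kind of non-local structure that Chunk-and-Pass is claimed to process less efficiently.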


This developmental lag would support both the existence of the Chunk-and-Pass processing system and the proposal that syntax mastery is bound up with processing experience:

Only once chunking becomes faster and more efficient as a function of repeated experience do children become better at processing centre-embedded relative clauses. (CC p. 180)
This item-based approach to language learning might seem to threaten the possibility of learning for exactly the sorts of reasons that first led Chomsky to posit the existence of UG, namely the poverty of the stimulus argument. However, CC argue, this worry is groundless, because learning is facilitated in a number of different ways. First, much language comprehension is over-determined, because the proper interpretations of linguistic units are cued by multiple sources. For example, the differences between nouns and verbs are cued not only by syntactic properties, but by semantic and prosodic ones too; consequently, domain-general simple recurrent connectionist machine learning networks (SRNs) can learn some aspects of syntactic structure (CC p. 157). Second, because over time chunks of language are themselves subject to cultural evolution pressures, linguistic units that cannot easily be processed will be abandoned or refined, one item at a time. Over successive generations, language users will have selected and retained linguistic structures the comprehension of which is over-determined, and which are consequently easily learned.
The final chapters of CC culminate with an argument that builds on two claims already developed, namely, that syntax processing involves domain-general abilities and improves with experience. CC argue that recursion is also learned. Recursion is the ability that "allows the reuse of the same grammatical construction multiple times in a given sentence" (CC p. 197), and which enables us to produce and understand sentences like "Richard knows that Kim knows that Richard knows this essay is overdue". It is closely linked to the computation that BC call Merge.6 However, in contrast to UG accounts, according to CC,

the ability to process recursive structure does not depend on a built-in property of a competence grammar, but, rather, is an acquired skill learned through experience with specific instances of recursive constructions and limited generalizations over these. (CC p. 203)
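The reuse of a single construction can be illustrated with a toy generator (my own sketch; CC's point is precisely that human learners need not possess such a general rule, only experience with specific instances and limited generalizations over them):

```python
# Toy sketch of a recursive construction: "X knows that ..." reused
# arbitrarily many times around a base clause.
def report(speakers, base):
    if not speakers:
        return base
    return f"{speakers[0]} knows that {report(speakers[1:], base)}"

sentence = report(["Richard", "Kim", "Richard"], "this essay is overdue")
# "Richard knows that Kim knows that Richard knows that this essay is overdue"
```

The dispute is over what underlies such productivity: a built-in property of a competence grammar (BC), or a skill abstracted from experienced instances (CC).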
According to CC, what makes recursion possible are general-purpose mechanisms for sequence learning. Whereas BC argue that the evolutionary significance of FOXP2 is limited to externalisation, CC maintain that changes to the human variant made possible skills for sequence learning that our ancestors had lacked. In support of
this claim they point (among other things) to evidence showing both that mice genetically modified with a human variant of the gene show improved sequence learning of actions (e.g., Schreiweis et al. 2014), and that FOXP2 translocations in humans have been associated with both language and sequence learning deficits. Thus, they suggest, FOXP2 was selected for general-purpose sequence learning in humans, and was only later recruited for language.

6 In an earlier version of Chomsky's minimalist UG (Hauser et al. 2002), recursion was thought to play the same role that Merge does for BC. Despite the obvious continuity between Hauser et al. (2002) and BC, in BC it is not entirely clear whether Merge is identical to recursion, or just a necessary precondition for it.
Two final points of disagreement with BC are worth mentioning. First, on UG approaches to language comprehension, language users' limited ability to process recursively complex sentences is taken to be the consequence of working memory constraints that limit the function of the innate recursion mechanism. Thus failures are attributed to performance issues, and not to any problem of underlying competence. By contrast, CC deny the performance–competence distinction as Chomsky developed it. For them, in both ontogeny and phylogeny, linguistic competence develops only as a function of performance. Thus, CC make their case for a domain-general account of language acquisition.
I found a great deal to admire in CC, but it was often a frustrating read. This is
not because I am unsympathetic to their project (on the contrary), but because the
book suffers from a number of flaws. First, as I mentioned previously, the precise
targets of many of their arguments are poorly defined. To give one illustration,
the logical problem that CC identify with Chomsky's UG targets the evolution of
syntax in natural languages. However, since BC's argument for nativism concerns a
language of thought, CC's objection does not obviously engage with it. Additionally,
arguments against the various versions of UG (non-adaptationist, adaptationist,
and MP versions) are not always carefully distinguished, such that what
CC offer is less a considered criticism of the details of rival views than a fairly
fast-and-loose rejection of a whole program of thought. This is a shame, both because
they make some very important points (for example, about the role of cultural
evolution), and because many of the points that they raise have already been
conceded by those whose views they are criticising. For example, BC would simply
agree with the claim that the original version of UG could not have evolved under
natural selection pressure. Since CC was written before BC was published,
CC cannot be held accountable for not having responded to Chomsky's most recent
formulations of his view. However, the views defended in BC are not new. They
build on foundations that have been in place for over ten (Hauser et al. 2002) and
twenty (Chomsky 1995) years, respectively. While these older works are discussed
in the text, the bulk of the argument is directed against much earlier formulations of the
nature of UG.
Second, CC's positive account is built on metaphors that are superficially
compelling but that make the details of their positive claims very difficult to discern.
For example, the Chunk-and-Pass processing system combines elements that, on
BC's view, would be handled by both the sensorimotor system (for example, in the
computations that lead groups of phonemes to be recognised as words) and Merge
(as when words are bound into phrases and sentences). Thus, for CC, the operation
of Chunking-and-Passing is doing an extraordinary amount of work. However, the
nature of Chunk-and-Pass processing is only ever sketched, and beyond the claim
that it recruits only domain-general mechanisms, we are told next to nothing about
how it works. In that case, their central claim turns out to be largely a statement of
faith: there will turn out to be no syntax-specific neural processes involved in
chunking basic units of linguistic input into syntactically structured discourse. This
is a bold claim. However, while the data CC present suggest that domain-general
processes might achieve far more than was once thought, they are a long way from
demonstrating that no language-specific functions are needed.
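For readers unfamiliar with the operation, Merge can be glossed in a minimal sketch (my gloss of the standard minimalist definition, not BC's own formalism): Merge takes two syntactic objects and forms the set {X, Y}, and because the output can itself be merged again, repeated application yields nested, hierarchical structure.

```python
# Minimal sketch of Merge: combine two syntactic objects into the
# unordered set {X, Y}; repeated application yields hierarchical,
# not merely additive, structure.
def merge(x, y):
    return frozenset([x, y])

vp = merge("instinctively", "swim")  # adverb bound to the verb first...
s = merge("birds", vp)               # ...then the subject bound to the VP
assert vp in s                       # the VP is a constituent of S
assert "instinctively" not in s      # the adverb is not a direct member of S
```

Using sets rather than sequences reflects the minimalist claim that Merge is order-free: linear order is imposed only at externalisation, which is BC's reason for locating word order outside core syntax.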
It's also hard to escape the feeling that in some parts of the book CC are tacitly
appealing to something that looks a lot like Merge, built into the structure of their
Chunk-and-Pass processor. One feature of Merge that CC seem not to want to adopt
is its internal structure: elements bound by Merge are bound hierarchically, and not
merely additively. However, it is difficult to see how a Chunk-and-Pass processing
system could avoid being organised in this way, because if there is no hierarchical
structure at work in basic chunking functions, it's unclear why some of the elements
of a sentence should be chunked together and not others. BC make this point
forcefully, arguing that non-hierarchical comprehension cannot explain simple
and uncontroversial cases of binding. They give the following example sentences:
5. Birds that fly instinctively swim.
6. Instinctively birds that fly swim.

BC argue that (5) is ambiguous, since "instinctively" could modify either of the
verbs. By contrast, (6) is unambiguous: here "instinctively" clearly modifies "swim"
and not "fly". Their view can explain this on the grounds that "instinctively" is
hierarchically closer to "swim" than to "fly" (see the tree in Fig. 4.1 on BC p. 117,
in which the fronted adverb attaches to the matrix clause headed by "swim", while
"fly" is embedded inside the relative clause).
They argue that a non-hierarchical binding system cannot explain why sentence
(6) is not ambiguous in the way that they describe.
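BC's point can be made concrete with a toy sketch (my own illustration; the tree encodings and the helper function are hypothetical, not from either book). Representing parses as nested tuples, (5) admits two attachment sites for "instinctively", while in (6) the fronted adverb can attach only to the matrix clause:

```python
# Toy parses as nested tuples (label, children...). Sentence (5) has two
# parses, depending on where "instinctively" attaches; sentence (6) has one.
parse_5a = ("S", ("NP", "birds", ("RC", "that", ("VP", "fly", "instinctively"))),
            ("VP", "swim"))
parse_5b = ("S", ("NP", "birds", ("RC", "that", ("VP", "fly"))),
            ("VP", "instinctively", "swim"))
parse_6 = ("S", "instinctively", ("NP", "birds", ("RC", "that", ("VP", "fly"))),
           ("VP", "swim"))

VERBS = {"fly", "swim"}

def attachment_verb(tree, adv="instinctively"):
    """Return the verb of the constituent to which the adverb is attached."""
    if isinstance(tree, str):
        return None
    _, *kids = tree
    if adv in kids:
        # Verb among the adverb's sisters, possibly inside a sister VP.
        for k in kids:
            if isinstance(k, str) and k in VERBS:
                return k
        for k in kids:
            if isinstance(k, tuple) and k[0] == "VP":
                return next(w for w in k[1:] if w in VERBS)
    for k in kids:
        found = attachment_verb(k, adv)
        if found:
            return found
    return None

print(attachment_verb(parse_5a), attachment_verb(parse_5b), attachment_verb(parse_6))
# -> fly swim swim
```

A flat, purely sequential representation of (6) loses exactly this information: by adjacency alone, "fly" is the verb nearest to "instinctively", yet the adverb modifies "swim".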
There may be a rejoinder that could be developed in line with CC's view.
However, CC's anticipation of the objection is weak. In a very short explanatory
text box that is not part of the main body of the text (p. 112), we are told that real-
time chatbots like Apple's Siri do not use syntactic trees to parse commands but
rely on probabilistic pattern matching over individual words, and that
consequently language processing in the here-and-now can get by without
hierarchical syntactic structure (ibid.; see also Frank et al. 2012 for further
discussion). This aside provides no grounds for responding to BC. However, BC's
argument is fundamental to their rejection of the plausibility of domain-general
sequence-learning processes. A better response, consistent with CC's argument
that language learning is over-determined, would be to point out that, in BC's
example at least, differences in intonation would suffice to disambiguate the
possible interpretations of the sentence in speech.7 However, this solution may not
generalise to all cases of hierarchical binding.
Where CC really succeed is in synthesising a wealth of recent empirical data on
the learnability of languages. The huge amount of data presented, largely drawn
from computational models and developmental psychology, is impressive and never
less than clearly explained. Unfortunately, this synthesis is also sometimes
undermined by a theoretical looseness that makes it hard to discern real sources
of disagreement between the view that CC defend and more recent accounts of UG
and its relation to evolution. I want to use the final section to get clearer about where
remaining disagreements might lie.
As I previously mentioned, some of the early UG commitments criticised by CC
are also rejected by BC. For example, at the outset of their book, BC concede that
"the richer the UG, the greater the burden on evolvability" (BC p. 96). Their
dropping of the claim that UG is large and complex exempts their view from the
bulk of CC's worry that UG could not evolve. In contrast to CC, BC do still
maintain that UG is language-specific, and this is at least a superficial source of
disagreement. However, the nature of the disagreement here must be stated carefully,
because of the different ways in which BC and CC use the word "language".
Recall that BC think that language, in the form of the conjunction of Merge and
the conceptual interface, is an element of thought that underwent natural selection
for improvements in planning. While for them language is independent of
communication, for CC language is just the set of natural languages, and their
use of the word reflects this. Given their view, BC would not expect that areas of the
brain involved in natural language use would be used only for natural language; they
will be used for general-purpose thinking too. This makes their thesis less easy to
distinguish from CC's "neural recycling" claim. For example, in developing their
claim that cultural selection shapes language to fit the pre-existing brain, CC write
that elements of sentential binding will be "in part, determined by innate
constraints, but those constraints pre-date the emergence of [natural] language"
(CC p. 83). This claim is reminiscent of BC's view that binding constraints on
natural languages are by-products of constraints on the neural implementation of
Merge, and did not strike me as something with which BC need disagree. Evidently
both BC and CC hold that our brains are better suited to some possible natural
languages than to others. Moreover, while BC do not attribute any substantial role to
cultural evolution in the development of natural languages, they could. Indeed, BC's
claim that humans underwent an adaptation for Merge is entirely consistent with
CC's view that particular natural language forms are shaped by cultural selection
processes. Here, then, any apparent disagreement would seem merely to be a
difference of emphasis.
7. I owe this point to an anonymous reviewer.

This suggests that the relationship between constructionist and UG approaches to
language development needs to be reconsidered. At the outset, I stated that
historically disagreement crystallised around two issues: (1) the question of whether
syntax is grounded in language-specific abilities, and (2) the question of whether or
not these abilities were the product of natural selection. With respect to (1), at least
if "language" is understood in terms of natural language, then neither BC nor CC
hold that syntax is language-specific. Further, with respect to (2), BC and CC agree
that syntactic abilities underwent natural selection, and that they did so for functions
independent of any benefit for communication. Where they differ is that for BC it
was thought and planning that drove the selective sweep for Merge; for CC it was
general sequence-learning abilities. These alternatives may not be mutually
exclusive, though.
The major difference between BC and CC now concerns only the question of
whether general-purpose, non-hierarchical sequence-learning mechanisms can
perform the single binding function that BC think is performed by Merge. CC don't
offer a detailed argument for thinking that they can. However, their book offers a
compelling illustration of the growing power of non-specialised machine learning
tools, and it is possible that they will be vindicated in time. In that case, this
disagreement should be settled by empirical developments.
In the meantime, I hope to have shown that the constructivist and UG approaches
to language development have a great deal in common. If that is right, then perhaps
the time for dichotomising rhetoric is over.

Acknowledgements For helpful comments on the first draft of this essay, I would like to thank Cameron
Buckner, Bryce Huebner, and one anonymous referee.


References

Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge

Chomsky N (1995) The minimalist program. MIT Press, Cambridge
Davidson D (1982) Rational animals. Dialectica 36(4):317-327
de Villiers JG, Pyers JE (2002) Complements to cognition: a longitudinal study of the relationship
between complex syntax and false-belief understanding. Cogn Dev 17(1):1037-1060
Fodor J (2008) LOT 2: the language of thought revisited. OUP, Oxford
Frank SL, Bod R, Christiansen MH (2012) How hierarchical is language use? Proc R Soc B
Hauser MD, Chomsky N, Fitch WT (2002) The faculty of language: what is it, who has it, and how did it
evolve? Science 298(5598):1569-1579
Hobaiter C, Byrne RW (2014) The meanings of chimpanzee gestures. Curr Biol 24(14):1596-1600
Jackendoff R (2002) Foundations of language: brain, meaning, grammar, evolution. OUP, New York
Kidd E, Bavin EL (2002) English-speaking children's comprehension of relative clauses: evidence for
general-cognitive and language-specific constraints on development. J Psycholinguist Res
Krause J, Lalueza-Fox C, Orlando L, Enard W, Green RE, Burbano HA, Hublin JJ, Hanni C, Fortea J, De
La Rasilla M, Bertranpetit J (2007) The derived FOXP2 variant of modern humans was shared with
Neandertals. Curr Biol 17(21):1908-1912
Lohmann H, Tomasello M (2003) The role of language in the development of false belief understanding:
a training study. Child Dev 74(4):1130-1144
Maricic T, Gunther V, Georgiev O, Gehre S, Curlin M, Schreiweis C, Naumann R, Burbano HA, Meyer
M, Lalueza-Fox C, de la Rasilla M (2013) A recent evolutionary change affects a regulatory element
in the human FOXP2 gene. Mol Biol Evol 30(4):844-852
Milligan K, Astington JW, Dack LA (2007) Language and theory of mind: meta-analysis of the relation
between language ability and false-belief understanding. Child Dev 78(2):622-646
Moore R (2014) Ape gestures: interpreting chimpanzee and bonobo minds. Curr Biol 24(14):R645-R647


Moore R (2016) Meaning and ostension in great ape gestural communication. Anim Cogn 19(1):223-231
Pinker S, Bloom P (1990) Natural language and natural selection. Behav Brain Sci 13(4):707-727
Progovac L (2015) Evolutionary syntax. OUP, Oxford
Pullum GK, Scholz BC (2005) Contrasting applications of logic in natural language syntactic description.
In: Logic, methodology and philosophy of science: proceedings of the twelfth international
congress, pp 481-503
Rivas E (2005) Recent use of signs by chimpanzees (Pan troglodytes) in interactions with humans.
J Comp Psychol 119(4):404
Savage-Rumbaugh ES, Shanker S, Taylor TJ (1998) Apes, language, and the human mind. OUP, Oxford
Schreiweis C, Bornschein U, Burguiere E, Kerimoglu C, Schreiter S, Dannemann M, Goyal S, Rea E,
French CA, Puliyadi R, Groszer M (2014) Humanized Foxp2 accelerates learning by enhancing
transitions from declarative to procedural performance. PNAS 111(39):14253-14258
Truswell R (in press) Dendrophobia in bonobo comprehension of spoken English. Mind & Lang
Yang C (2013) Ontogeny and phylogeny of language. PNAS 110(16):6324-6327