Received January 25, 1994; accepted August 7, 1994.
Address reprint requests to Robert R. Sokal, Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245.

Wide allele-frequency clines exist at several loci in Europe (Menozzi et al., 1978; Sokal and Menozzi, 1982; Sokal et al., 1989a). Their extent is such that simple models of isolation by distance are unlikely to explain them. It is generally agreed that they result from a population expansion starting in Anatolia approximately 10,000 years ago (Menozzi et al., 1978; Sokal and Menozzi, 1982; Renfrew, 1987, 1991, 1992; Cavalli-Sforza, 1988; Sokal et al., 1991), most likely associated with the development of technologies for food production (Hassan, 1973; Zeven, 1980). Radiocarbon chronologies of Mesolithic and Neolithic settlements indicate a westward spread of farming technologies from the Near East, starting approximately 8,000 BC (Ammerman and Cavalli-Sforza, 1984; Renfrew, 1987). The observed gene-frequency gradients can then be explained by attributing the propagation of farming to the dispersal of early farmers,
who interbred only sparingly with the bands of hunters and gatherers whom they met in the process, and who had already colonized most of Europe (Ammerman and Cavalli-Sforza, 1984). Such a combination of demographic growth, range expansion, and limited admixture has been termed demic diffusion (Menozzi et al., 1978). Its expected consequences include correlations between allele frequencies and the dates of onset of agriculture (Sgaramella-Zonta and Cavalli-Sforza, 1973), which have actually been observed (Sokal et al., 1991).

The European regions where early (Neolithic) agriculturalists expanded correspond approximately to the current western range of Indo-European languages. This raises the question whether the cultural process of Indo-European diffusion was also determined by the demographic processes accompanying the spread of farming (Renfrew, 1987). The traditional view holds, on the contrary, that proto-Indo-European entered Europe not earlier than 4500 BC, through three migrational waves also coming from the East, namely from the Pontic steppes (Gimbutas, 1979, 1986). Although directed westwards, like demic diffusion, these waves were not associated with population increases comparable to those caused by the introduction of farming and animal breeding. Therefore, diffusion of Indo-European in the late Neolithic would imply that languages spread more through cultural contacts (Zvelebil and Zvelebil, 1988) than by demographic processes (see Renfrew, 1989). As a corollary to this, association between patterns of genetic and linguistic variation should be limited and occasional among contemporary Indo-European speakers.

Support for the view linking the spread of proto-Indo-European in Europe with demic diffusion comes from studies showing that patterns of linguistic and genetic diversity correspond in many European populations (Sokal, 1988; Cavalli-Sforza et al., 1988, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990, 1991; Bertranpetit and Cavalli-Sforza, 1991). The exceptions include Basques (Piazza et al., 1988; Bertranpetit and Cavalli-Sforza, 1991), Hungarians (Barbujani et al., 1990), and Uralic speakers (Guglielmino et al., 1990), that is to say, groups whose current languages do not belong to the Indo-European phylum.

Recent years have seen both a refinement of the hypotheses on Indo-European origins and the emergence of contradictory data. Based on archaeological and linguistic evidence, Cavalli-Sforza (1988) and Renfrew (1991, 1992) argued that the common elements recognized within the so-called Nostratic linguistic macrofamily (Kaiser and Shevoroshkin, 1988), including Indo-European, Altaic, Afro-Asiatic, and Elamo-Dravidian, derive from a common biological origin of most of their current speakers. It is then possible to interpret the current distribution of most Nostratic, and not only Indo-European, languages as a consequence of a multidirectional spread of agriculture. Recent genetic analyses agree with this view (Cavalli-Sforza et al., 1993; Barbujani and Pilastro, 1993). On the other hand, a model incorporating the effects of the origin of agriculture and/or specifically the hypotheses of Renfrew and Gimbutas failed to explain a larger fraction of the correlations between genetic and linguistic distances than is explained by the simple effects of geographic distances (Sokal et al., 1992).

Further evidence in favor of either hypothesis may be obtained by simulating their genetic consequences and then comparing them with the patterns of genetic variation observed in the field. In the only simulation study available so far, present-time genetic variation was demonstrated to be compatible both with a Neolithic origin of Indo-European speakers and with later immigration from the Pontic steppes (Rendine et al., 1986). However, the two hypotheses were not contrasted in that study, but combined. A relationship between current gene-frequency gradients and dispersal from the east was evident, and seems indisputable. However, when exactly, and by what type of process, proto-Indo-Europeans spread has not been convincingly ascertained. We are particularly interested in establishing whether the demic diffusion of early farmers can, at least in principle, explain a large share of current genetic diversity among Indo-European speakers, or whether additional processes must be included in the model to obtain a better fit with real data. Among these additional processes, we gave special emphasis to the migratory waves postulated by Gimbutas.

INDO-EUROPEAN ORIGINS 111

In this study, we simulated microevolutionary scenarios of increasing complexity, from an unrealistically simple one to models including several archaeologically documented migrations. We then calculated correlation coefficients between the simulated and real gene frequencies (or, more precisely, between matrices of genetic distances calculated from each of them). We expected to observe an increasing agreement between real and simulated data as the models get more and more realistic. When an increase in the complexity of the model is not matched by an increase in the correlations, we conclude that the new factors included in that model do not improve our understanding of the phenomenon, and should therefore be considered unnecessary. This does not automatically imply that these factors played no evolutionary role at all. But, if they did, evidence of their effects should be sought in data other than the currently available allele frequencies. A large database of European allele frequencies (see Sokal et al., 1989a) was analyzed for this purpose.

METHODS AND DATA

Overview of the simulations

We carried out a series of computer simulations of five microevolutionary models. All models were based on a stepping-stone population structure, consisting of a 60 × 37 regular lattice superimposed onto the map of Europe. Each node of the lattice represents a 1-degree square quadrat. Of the 2,220 nodes of the lattice, 1,512 are land areas supporting a human population. Each node (population) is characterized by its effective size ($N_e$) and by the frequency of one allele ($p$). At each generation $p$ undergoes random variation, representing the effects of genetic drift, which is a function of $N_e$. Migration is allowed only between adjacent populations. The numbers of individuals migrating at each generation depend on population sizes and on factors of resistance to migration, which are zero across plains but greater than zero across mountain chains and seas. The differences among the five models lie in parameters other than those described so far.

Each simulation experiment consisted of 440 iterations (representing generations) of a series of population processes. Assuming a 25-year generation interval and non-overlapping generations, each experiment thus spans 11,000 years. The simulated allele frequencies were printed at generation 440, representing the present time, for the localities corresponding to those for which allele-frequency data are available in a database of European allele frequencies (Sokal et al., 1989a), from which non-Indo-European speakers had been discarded. Matrices of Prevosti's genetic distances (Prevosti et al., 1975) were calculated on both real and simulated allele frequencies, and their degree of resemblance was evaluated by Mantel (1967) tests of matrix correlation.

Five FORTRAN programs were written, each corresponding to one of the five models for the origins of Indo-Europeans described in the following section, and incorporating subroutines developed by Press et al. (1986) and Manly (1991).

An outline of the models

IBD: Isolation by distance

The first, clearly oversimplified, hypothesis is that Indo-European-speaking populations evolved under conditions of isolation by distance (IBD model). Current patterns of genetic variation would then simply result from the interaction between random fluctuations of allele frequencies in time, i.e., genetic drift, and dispersal of individuals. Under isolation by distance, variations in population size affect only the impact of genetic drift: the larger the population, the smaller the allele-frequency fluctuations. Population growth, which occurs after generation 40, does not prompt migratory movements. Thus, the IBD model neglects all gene flow processes other than those in which movements of individuals from their birthplaces are local and random (i.e., equally likely in all directions except for migration resistance factors, see below).

Under the IBD model, the demographic increase that occurred in the Neolithic (and is detailed in the section Population Growth
Among Farmers) was simulated without separating hunting-gathering and farming populations, i.e., as if all hunter-gatherers turned to agriculture at 8,000 BC.

112 G. BARBUJANI ET AL.

OAG: Isolation by distance plus effects of the origin of agriculture

Isolation by distance is the null hypothesis for human microevolution (Wijsman and Cavalli-Sforza, 1984). Therefore, the models that follow are not alternatives to IBD. Rather, they incorporate it as a necessary, if not sufficient, process for determining the currently observed patterns of genetic variation. OAG, the simplest such model, combines IBD with the likely effects of the demographic processes following the origin of agriculture. Under this model, populations of hunter-gatherers initially occupy Europe and evolve under isolation by distance. At a specific moment in time (8,000 BC, chosen on the basis of archaeological information), a few populations in southern Anatolia turn to farming. This starts a local process of population growth in the areas where farming is being practiced, followed by dispersal outwards when local population densities have reached a certain threshold. In this way, migratory movements between farming communities are not necessarily symmetrical, as is reasonable to assume in many evolutionary scenarios (Rogers and Jorde, 1987). The rate of spread of farmers is driven by their intrinsic growth rate; it is constrained only by geographical factors such as mountain chains or bodies of water. No cultural transmission is simulated between the hunter-gatherers and the farmers who immigrate into their regions. Only the farmers' allele frequencies are eventually compared with observed matrices of genetic distances. In this way, the genetic consequences of this model are those that would be expected if hunter-gatherers were replaced without admixture, i.e., became extinct. The only microevolutionary role they play is to serve as a source population at the beginning of the Neolithic, for that small fraction in Anatolia of the total population that develops the new farming technologies.

OAC: Isolation by distance, plus effects of the origin of agriculture, and cultural transmission

Cultural transmission from farmers to hunter-gatherers may be built into the model, yielding what we call OAC; C stands for culture. Under this model, at all localities some hunter-gatherers learn how to produce food, and therefore their alleles are transmitted across generations with greater efficiency. From the genetic standpoint, this is equivalent to a certain degree of admixture, whereby some genes of the hunter-gatherers contribute to the gene pool of the farmers. As a consequence, these genes spread together with the genes of the farmers, and are thus carried into new localities. This is the Neolithic demic diffusion model, as originally proposed (Menozzi et al., 1978; Renfrew, 1987).

ATC: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, and archaeological time constraints

Under OAC, the spread of farmers from Anatolia into Europe is driven by their increase in numbers at each locality, which causes dispersal towards areas of lower population density. Therefore, in the OAC model, the farming technologies spread at an approximately constant rate through space (as in Ammerman and Cavalli-Sforza, 1971). This is known to be an approximation (Barker, 1985). A further refinement of OAC considers archaeological time constraints (ATC). Under ATC, we use archaeological information about the likely date at which farming reached each specific site in Europe (see Sokal et al., 1991). In this way, the arrival of farmers into a new locality reflects archaeologically documented cultural transformations; farmers spread at an irregular rate, corresponding to the actual process as inferred from archaeological evidence. Incorporation of hunter-gatherers into each farming population occurs at the same rates and through the same processes as in OAC.
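The dates attached to generation numbers throughout the models (440 non-overlapping generations of 25 years, generation 440 being the present, farming starting at generation 40, about 8,000 BC) follow from simple arithmetic. The sketch below makes that mapping explicit. The constant and function names are ours, and it assumes, as the dates quoted in the text imply, that generation $g$ falls at $8000 - 25(g - 40)$ years BC.

```python
# Time scale of the simulations, as described in the text:
# 440 non-overlapping generations of 25 years; generation 440 is the present,
# and farming starts at generation 40 (about 8,000 BC, i.e., 10,000 years ago).
GENERATION_YEARS = 25
PRESENT_GENERATION = 440
FARMING_GENERATION = 40
FARMING_START_BC = 8000


def generation_to_bc(g: int) -> int:
    """Approximate calendar date of generation g, in years BC."""
    return FARMING_START_BC - GENERATION_YEARS * (g - FARMING_GENERATION)


def years_before_present(g: int) -> int:
    """Years separating generation g from generation 440 (the present)."""
    return GENERATION_YEARS * (PRESENT_GENERATION - g)


if __name__ == "__main__":
    # Generations that appear in the text and in the figure captions.
    for g in (40, 100, 190, 200, 224, 240, 244, 265):
        print(f"generation {g}: about {generation_to_bc(g)} BC, "
              f"{years_before_present(g)} years before present")
```

Under this mapping, generations 190, 224, and 244 correspond to 4,250, 3,400, and 2,900 BC, the dates assigned to Gimbutas' three migratory waves, and the whole run spans 440 × 25 = 11,000 years.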
GIM: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, archaeological time constraints, and late Neolithic migrations

In the OAG, OAC, and ATC models, the first farmers are also considered the first speakers of proto-Indo-European. The alternative hypothesis considers them as the Neolithic inhabitants of areas that were later invaded by the first proto-Indo-European speakers, the Kurgan people (Gimbutas, 1979, 1986). The three migrational waves postulated by Gimbutas are added to ATC in the GIM model, by simulating long-distance migratory movements between 4,250 BC and 2,900 BC. A number of successive population movements are added as well; presumably, they were independent of the spread of Indo-European, but Gimbutas (personal communication) considers them relevant to an accurate description of human evolution in Europe.

Details on the simulation parameters and algorithms

The data matrix

In all simulation cycles, a matrix of 60 columns by 37 rows was defined, each element in the matrix representing a square of edge length 1 degree in a Mercator projection of Europe. The data matrix covers the area between 10 degrees of longitude West and 50 degrees East, and between 72 and 35 degrees of latitude North. Iceland is not included in this simulation.

Geography

Each of the 2,220 elements (= nodes or pixels or localities) of the data matrix contained an integer value, L, which was 1 for plains, 2 for mountains, 3 for seas, and 4 for the Black Sea. A local population was assigned to each of the 1,512 land pixels.

… the frequency of an allele at a polymorphic locus in the hunting-gathering population, i.e., in the only type of population existing at the beginning of the simulation. From a mathematical standpoint, it makes no difference whether this locus is regarded as biallelic or multiallelic, since the fate of only one of its alleles is simulated in the following phases. Allele frequencies were drawn from a gamma distribution truncated at one (Nei, 1987), whose mean was fixed either at 0.33 or at 0.50.

Initial population sizes

Estimates of population densities among current hunting-gathering tribes suggested to Rendine et al. (1986) that the effective size $N_{HG}$ of the hunting-gathering populations of Europe should be approximately 300 in each of the 840 elementary areas of their simulation. To have the same population density in the 2,220 pixels of this simulation, $N_{HG}$ was fixed at 114 (300 × 840 / 2,220 ≈ 114). This corresponds to a population density of 0.04 individuals per square km, within the estimated range of population densities for hunter-gatherers in temperate climates (Hassan, 1981). In Rendine et al.'s (1986) model, the individuals were considered as haploid, whereas here they are diploid. This may have caused a certain degree of divergence between the two models, as the drift variances are affected by the levels of ploidy of a population.

Genetic drift among hunter-gatherers

Non-overlapping generations were simulated. At each generation, a new allele frequency was drawn, for each locality, from a normal distribution whose mean was the allele frequency $p$ of the same population at the previous generation, and whose variance was $p(1 - p)$ divided by twice the effective population size $N_{HG}$ (Nei, 1987). This represented the effect of sampling of alleles from one generation to the next, i.e., random genetic drift.
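The rescaling of $N_{HG}$ and one generation of the drift step just described can be sketched as follows. This is a minimal illustration, not the authors' FORTRAN code: the function names are hypothetical, and clamping a normal draw back into [0, 1] is our assumption, since the text does not say how draws outside that interval were handled.

```python
import random

# Rendine et al. (1986): effective size about 300 in each of 840 elementary areas.
RENDINE_SIZE = 300
RENDINE_AREAS = 840
PIXELS = 2220  # 60 x 37 lattice of 1-degree squares used here


def rescaled_effective_size() -> int:
    """Same overall density over 2,220 pixels: 300 * 840 / 2220 = 113.5..., i.e. 114."""
    return round(RENDINE_SIZE * RENDINE_AREAS / PIXELS)


def drift_step(p: float, n_e: int, rng: random.Random) -> float:
    """One generation of drift in a diploid population of effective size n_e:
    a normal draw with mean p and variance p(1 - p) / (2 n_e)."""
    sd = (p * (1.0 - p) / (2.0 * n_e)) ** 0.5
    new_p = rng.gauss(p, sd)
    # Assumption (not stated in the text): keep frequencies inside [0, 1].
    return min(1.0, max(0.0, new_p))


if __name__ == "__main__":
    rng = random.Random(1)
    n_hg = rescaled_effective_size()
    freqs = [0.5] * 5  # five hypothetical localities, all starting at p = 0.5
    for _ in range(40):  # hunter-gatherer phase before farming (generations 0-39)
        freqs = [drift_step(p, n_hg, rng) for p in freqs]
    print(n_hg, [round(p, 3) for p in freqs])
```

With $N_{HG} = 114$ and $p = 0.5$ the per-generation standard deviation is about 0.033, so drift alone moves local frequencies only slowly; in the IBD model it is the interplay with dispersal that shapes the final pattern.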
… following drift; population sizes of the two localities exchanging individuals did not change as a result of dispersal. Therefore, this study assumed a stepping-stone model (Kimura and Weiss, 1964), whose properties are discussed by Jorde (1980). We chose a dispersal rate $m$ of 0.065 (as in Rendine et al., 1986), which means that at each generation, after reproduction, 6.5% of the resident individuals could be replaced by immigrants from the adjacent pixels. However, physical obstacles in the pixels between which dispersal occurred could reduce the number of migrants. Physical obstacles to migration were expressed by a factor of resistance to migration, RTM, detailed in Table 1. The average value of RTM, calculated between all suitable pairs of pixels, was 0.300. This means that, on the average, only 70% of the potentially dispersing individuals actually moved from their birthplace to an adjacent locality. To compensate for this, the dispersal rate $m$ was replaced by $m' = 0.065/0.700 = 0.0928$. In other words, 9.28% of the individuals in a locality were potentially subject to migration elsewhere, and 6.5% actually migrated, on the average. The number $N(AB)$ of individual hunter-gatherers moving from locality A to B (and vice versa, from B to A) was

$$N(AB) = \frac{N_{HG}\, m'\, [1 - \mathrm{RTM}(AB)]}{4} \qquad (1)$$

… generation 40; in this case, the number of migrants per generation increased accordingly. When, by contrast, hunter-gatherers decreased in numbers owing to the expansion of farmers, $N(AB)$ decreased as well until it reached zero. Because of the RTM factor, two localities in the plains exchanged freely one-fourth of the migrants allowed at each generation, whereas dispersal was reduced between populations separated by mountains or bodies of water, and for populations at the extremes of the simulated area. No dispersal was allowed across the Black Sea, to represent more carefully the population processes in the surrounding area (e.g., the migrations of Kurgan people), which occurred mostly by land movements.

The allele frequency after migration, $p'_{HG}$, was then calculated as

$$p'_{HG} = p_{HG}\left[(1 - m')(1 - \mathrm{RTM})\right] + m'\, p_{HGin} \qquad (2)$$

where RTM is the average resistance to migration between the pixel of interest and the $n_A$ adjacent pixels ($1 \le n_A \le 4$), and $p_{HGin}$ is the allele frequency of the immigrants.

Inception of farming in the nuclear zone

Under all models except IBD, the spread of farming starts with the splitting of some populations into two groups, one practicing agriculture and the other still living in a hunting-gathering economy. At generation 40 (i.e., 10,000 years ago), 20 individuals turn to farming at each of six pixels in Anatolia, around the village of Çatal Hüyük, where the oldest archaeological evidence of …

TABLE 1. Factors of resistance to migration (RTM)

Between a pixel in the    And an adjacent pixel in the
                          Plains   Mountains   Sea    Black Sea
Plains                    0.00
Mountains                 0.25     0.50
Sea                       0.45     0.70        0.90
Black Sea                 1.00     1.00        1.00   1.00
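The dispersal bookkeeping described above (the compensated rate $m'$ and Equation 1) can be checked with a short sketch. It reads Table 1 as symmetric, i.e., the resistance between A and B equals that between B and A, which is our assumption consistent with migrants moving "and vice versa"; the pixel-type labels and function names are hypothetical.

```python
# Table 1 (lower triangle): resistance to migration between adjacent pixel types.
RTM_TABLE = {
    ("plains", "plains"): 0.00,
    ("mountains", "plains"): 0.25,
    ("mountains", "mountains"): 0.50,
    ("sea", "plains"): 0.45,
    ("sea", "mountains"): 0.70,
    ("sea", "sea"): 0.90,
    ("black_sea", "plains"): 1.00,
    ("black_sea", "mountains"): 1.00,
    ("black_sea", "sea"): 1.00,
    ("black_sea", "black_sea"): 1.00,
}


def rtm(a: str, b: str) -> float:
    """Resistance between adjacent pixels of types a and b (assumed symmetric)."""
    if (a, b) in RTM_TABLE:
        return RTM_TABLE[(a, b)]
    return RTM_TABLE[(b, a)]


def compensated_rate(m: float, mean_rtm: float) -> float:
    """m' = m / (1 - mean RTM): on average a fraction m of individuals still moves."""
    return m / (1.0 - mean_rtm)


def migrants(n_hg: float, m_prime: float, rtm_ab: float) -> float:
    """Equation 1: N(AB) = N_HG * m' * (1 - RTM(AB)) / 4."""
    return n_hg * m_prime * (1.0 - rtm_ab) / 4.0


if __name__ == "__main__":
    m_prime = compensated_rate(0.065, 0.300)  # about 0.09286; the text reports 0.0928
    for pair in [("plains", "plains"), ("plains", "mountains"),
                 ("plains", "sea"), ("sea", "black_sea")]:
        print(pair, round(migrants(114, m_prime, rtm(*pair)), 3))
```

With $N_{HG} = 114$, two plains pixels exchange about 2.6 migrants per generation in each direction, a plains-mountains pair about three-quarters of that, and any pair involving the Black Sea none, matching the verbal description of Table 1.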
… the simulations. If the number of immigrants from each locality is taken into account in the calculation of the allele frequency of the immigrants, which we called $p_{HGin}$ in Equation 2 and which will be called $p_{Fin}$ for farmers, an analogous formula gives the allele frequency after gene flow among farmers:

$$p'_{F} = p_{F}\left[(1 - m')(1 - \mathrm{RTM})\right] + m'\, p_{Fin} \qquad (5)$$

Figure 1 is an example of allele frequencies generated under the OAG and OAC models. Allele frequencies of specific localities were, of course, different in different realizations of the same process, and between OAG and OAC. What was constant, however, was the pattern of occupation of land areas by expanding farmers, because it depended only on population growth and dispersal parameters, which were kept constant across realizations.

For the ATC and GIM models, by contrast, the spread of farming followed the pattern that can be inferred from archaeological evidence. Once a farming community existed at a certain locality, the exchange of genes with neighboring farming communities occurred in the same manner as described for the OAG and OAC models (Eq. 5). The difference is in the establishment of new farming populations, which under ATC and GIM was controlled by a matrix of dates of origin of agriculture at each land pixel of the map (details on how archaeological information was processed for this purpose are in Sokal et al., 1991). Therefore, when colonization of a new locality by the first farmers had to be simulated, eight effective founding individuals were sampled with replacement from the closest suitable locality. In a few cases, this required input of immigrants from localities that were not directly adjacent to the one in which farming was starting. This was the only violation of the assumptions of the stepping-stone model that we tolerated. Figure 2 is an example of allele frequencies generated under the ATC and GIM models. It shows that the spread of farmers is initially slower than simulated under the OAG and OAC models, especially along the northern shores of the Mediterranean Sea. By generation 200, however, the four models yield a similar pattern of land occupation. The only major exception is an area of the north-western Alps, where archaeological evidence shows a delayed onset of farming activities (Sokal et al., 1991). Of course, such a delay was not predicted by the mechanism of farming expansion underlying our OAG and OAC models.

Fig. 1. Spread of farmers under the OAG and OAC models. The localities where farming is being practiced are indicated by letters representing allele frequency in farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Eight allele-frequency classes are defined, from a to h, each corresponding to an interval equal to 0.125 (a, $p_F < 0.125$; b, $0.125 \le p_F < 0.250$; c, $0.250 \le p_F < 0.375$; etc.). Hyphens represent areas inhabited only by hunter-gatherers. The nuclear zone is delimited by a solid square. While the pattern of land occupation shown was constant for all the realizations of the models, the allele frequencies depicted are those of a single run of OAC.

Fig. 2. Spread of farmers under the ATC and GIM models. The localities where farming is being practiced, based on archaeological information, are indicated by figures representing allele frequency of farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Allele-frequency classes are as in Figure 1. The pattern of land occupation was again the same in all realizations of ATC and GIM, but the allele frequency shown is that of a single realization of ATC.

Pixels in the sea

A stepping-stone model does not allow for long-distance population movements, i.e., those associated with sailing, which are considered important in the colonization of Europe and in the successive phases of agricultural dispersal (Renfrew, 1987). To simulate the effects of the movement of a few individuals across the seas, we chose to assign a pseudopopulation to each pixel located in the sea. Pseudopopulations did not undergo random fluctuation of allele frequencies, and did not increase in numbers. They included only the individuals dispersing from neighboring pixels (their number was determined according to Equation 1), whose descendants had the same allele frequencies and at each generation proceeded one step forward in the dispersal process. In this way, the movement of a few individuals across the sea was simulated. This had little importance for the allele frequencies of
hunter-gatherers and for those of farmers once the spread of farming through Europe was completed; however, it had an effect on the establishment of farming communities under the OAG and OAC models, as a few individuals could quickly reach distant localities by sea. The resulting pattern of occupation of coastal regions corresponds well with the archaeological evidence (see Figs. 1 and 2).

Cultural contacts and admixture

Under the OAC, ATC, and GIM models, at each generation a certain number of hunter-gatherers adopted farming, if a farming community already existed at the locality where they lived. The likelihood of this cultural shift depended on the probability of contacts between farmers and hunter-gatherers, and on a coefficient of acculturation which was called $\gamma$ by Rendine et al. (1986). The number $S$ of individuals shifting to farming at each generation is related to the probability of contacts between farmers and hunter-gatherers. If there are $N_F$ farmers at a given locality and at a given moment in time, their contacts with the $N_{HG}$ hunter-gatherers will represent $2 N_F N_{HG}$ of all the $(N_F + N_{HG})^2$ possible contacts, i.e., a fraction $2 N_F N_{HG} / (N_F + N_{HG})^2$. The probability $\gamma$ that such a contact will result in acculturation has been estimated at 0.00024 (Rendine et al., 1986). Therefore, at each generation, the $N_F$ farmers will transmit their technologies to a number $S$ of the hunter-gatherers estimated as

$$\ldots \qquad (6)$$

where all parameters have already been defined.

It can be argued that the per-generation change in the number of hunter-gatherers in a particular quadrat might better be modelled as proportional to the product of the number of hunter-gatherers and the number of farmers, that is

$$S = \gamma\, N_{HG}\, N_F \qquad (7)$$

where $\gamma = 1.56250 \times 10^{\ldots}$. In this expression, closely resembling a formula used by Rendine et al. (1986), the value of $\gamma$ was adjusted so that the resulting values of $S$ had approximately the same magnitude as in Equation 6. However, we found, over a range of test conditions, that the results were only trivially affected. Therefore, we report results for Expression 6 without redoing the analysis for every test condition.

Disappearance of hunting-gathering populations

In the IBD model, hunter-gatherers adopt farming at generation 40, and thus all genes of Indo-European speakers come from the genetic pool of the hunting-gathering communities. In the OAG model, conversely, the hunter-gatherers go extinct, and thus all genes of Indo-European speakers derive from the genes of the few first farmers of southern Anatolia. These are the two extreme models, as far as the origins of Indo-Europeans are concerned. Under the other models, a certain degree of admixture is simulated between the two communities at each locality, reflecting a widespread view of human evolution in Europe (Cavalli-Sforza and Piazza, 1993).

Under the OAC, ATC, and GIM models, the hunters and gatherers are considered to disappear from a certain locality when their number is such that $S$ is less than 1. Because acculturation starts after a phase of population buildup for farmers, the extinction of the hunting-gathering communities also proceeds as a wave, from southeast to northwest, spreading in parallel with the farming economy, but several generations later.

Long-range migratory movements

In the GIM model, three major migratory waves of Kurgan people are supposed to have introduced Indo-European languages into Europe, around 4,250 BC, 3,400 BC, and 2,900 BC, respectively (Gimbutas, 1979, 1986).

In Gimbutas' (1979) view, the westward migrations of Kurgan people in Europe introduced a new patriarchal culture, characterized by horse-riding and new warfare techniques. These cultural changes were not associated with major innovations of the
subsistence techniques. Most of the popula- parameters being constant. Gene flow is re-
tions of Europe by then were farmers. It is duced at generation 265, i.e., 3,375 years
therefore highly unlikely that concomitant ago, by which time all suitable regions had
population growth could occur. We chose to been colonized by early agriculturalists. The
represent these waves as a flow of genes parameters of the simulation are summa-
from the purported source area, north of the rized in Table 2.
Black Sea and west of the Caspian Sea. For
the sake of simplicity, these movements
were concentrated in one generations time Gene-frequency data
(at generations 190,224, and 244), although The simulated sets of allele frequencies
each of them probably lasted two centuries were compared with a database of allele fre-
(Gimbutas, 1979). For each movement, we quencies, which had been analyzed in vari-
simulated replacement of 20% of the genes ous studies on Europe (Sokal et al., 1988,
in the target area, with genes coming from 1989a,b, 1990, 1991, 1992; Harding and
the source area. Probably this overempha- Sokal, 1988; Barbujani and Sokal, 1990;
sizes the genetic consequences of the simu- Sokal, 19911, and had been continuously up-
lated migratory movements.
The invading Kurgan people were not the entire population of the area between the Black and Caspian seas moving en masse; rather, they were groups of individuals belonging to semi-nomadic tribes (Gimbutas, 1979). Accordingly, we chose to simulate their contribution to the genetic pool of the invaded populations as if they were coming from several populations in the appropriate zone. The locations of four such populations were chosen at random and then held constant in all 2,600 simulation cycles of the GIM model. The allele frequencies of the recipient populations (Fig. 3) were then recalculated as if 20% of the pre-existing individuals had been replaced by immigrant Kurgan people.
In addition, Gimbutas (personal communication) pointed out to us 12 other directional and potentially migratory processes that may have been important in determining the current linguistic population structure of Europe. These processes are summarized in Figure 4 and were incorporated in the GIM model. Once again, for each of them we simulated replacement of 20% of the individuals of the target area by individuals whose allele frequencies were the average allele frequencies in the source area.
Decrease of population mobility after the establishment of a farming economy
Once farming populations have reached the maximum size allowed by the programs, a reduction of mobility is simulated by simply halving the migration rate m, all other
dated. The data corresponding to populations speaking languages other than Indo-European (Basque, Finnish, Estonian, Lapp, Hungarian, Turkish: Ruhlen, 1987) were discarded.
Twenty-six genetic systems were considered. Most of them corresponded to independent loci; exceptions are ABO, MN, and Rh, for which two (or three, for Rh) systems were independently considered, each resulting from typing of alleles by different sets of antisera. This convention has long been followed in studies on human variation (e.g., see Lewontin, 1972). Each system is indicated by a letter code, preceded by a number referring to Mourant's coding system (Mourant et al., 1976), except for 100HLA-A, 101/2HLA-B, 200GM, and 201KM, whose numerical codes were assigned in our laboratory. Overall, 3,481 records and 93 alleles or haplotypes were considered.
The number of samples available for the 26 systems varied widely, ranging from a minimum of 27 (for 5-1 LUTHERAN) to a maximum of 762 (for 1-1 ABO). Genetic differences between localities were summarized by 26 matrices of Prevosti's distances (Prevosti et al., 1975), separately calculated for each system. We shall refer to these matrices as observed distance matrices, as opposed to the simulated ones, generated by the computer under one of the five models tested. To properly compare the two sets of data, prior to calculating genetic distances we pooled the observed frequencies of all alleles except the one whose average frequency was closest to 0.5.
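The two per-system operations described above, the GIM step that replaces 20% of a target area's individuals with migrants carrying the average allele frequencies of the source localities, and the pooling of all alleles except the one whose mean frequency is closest to 0.5 before Prevosti's distances are computed, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, frequencies are plain lists with one entry per allele, and Prevosti's distance is taken in its usual single-system form, half the sum of absolute allele-frequency differences (for a pooled two-class system this reduces to |p - q|).

```python
from itertools import combinations


def admix(recipient, sources, fraction=0.2):
    """Replace `fraction` of a recipient gene pool with migrants whose allele
    frequencies are the average over the source localities (GIM-style step)."""
    n_alleles = len(recipient)
    migrant = [sum(src[i] for src in sources) / len(sources) for i in range(n_alleles)]
    return [(1.0 - fraction) * r + fraction * m for r, m in zip(recipient, migrant)]


def pool_to_two(freqs_by_locality):
    """Keep the allele whose mean frequency across localities is closest to
    0.5 and pool all the other alleles into a single class."""
    rows = list(freqs_by_locality.values())
    n_alleles = len(rows[0])
    means = [sum(row[i] for row in rows) / len(rows) for i in range(n_alleles)]
    keep = min(range(n_alleles), key=lambda i: abs(means[i] - 0.5))
    return {loc: [f[keep], sum(f) - f[keep]] for loc, f in freqs_by_locality.items()}


def prevosti(p, q):
    """Prevosti's distance for one system: half the summed absolute differences."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(freqs_by_locality):
    """Symmetric matrix of Prevosti's distances; rows follow sorted locality names."""
    names = sorted(freqs_by_locality)
    d = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        d[i][j] = d[j][i] = prevosti(freqs_by_locality[names[i]], freqs_by_locality[names[j]])
    return names, d


if __name__ == "__main__":
    # Hypothetical three-allele system sampled at three localities.
    observed = {"A": [0.5, 0.3, 0.2], "B": [0.3, 0.3, 0.4], "C": [0.6, 0.1, 0.3]}
    names, d = distance_matrix(pool_to_two(observed))
    print(names, d)
    # Two-allele recipient receiving migrants from two hypothetical source localities.
    print(admix([0.6, 0.4], [[0.2, 0.8], [0.4, 0.6]]))
```

Pooling before computing distances makes every system diallelic, so the resulting matrices are directly comparable with those obtained from the two-allele simulated data.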
120 G. BARBUJANI ET AL.
Fig. 3. Migratory movements simulated under the GIM model. The four localities whose allele frequencies are averaged, to represent the allele frequencies of the migrating population, are marked by asterisks. The regions affected by the three migratory waves proposed by Gimbutas are surrounded by solid lines; the zone where Basque is currently spoken is not supposed to have been affected by these migratory episodes. Figures refer to the generation at which each migratory wave was simulated.
Fig. 4. Twelve potentially important migratory processes considered by Gimbutas (personal communication) to have been relevant in European ethnohistory, each one represented by an arrow from a source region to a target region. Figures refer to the generation at which each population movement was simulated.
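Mantel's permutation test and Fisher's combination of probabilities, which the following paragraphs use to compare each observed matrix with its 100 simulated counterparts, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the program used in the study: the Mantel statistic is taken to be the Pearson correlation between the upper-triangle elements of the two matrices, the permutation probabilities use the common (count + 1)/(n_perm + 1) convention, and all function names are hypothetical. A lower-tail probability is returned as well, mirroring the control count of negative correlations that would have been significant under a two-tailed test.

```python
import math
import random


def upper_triangle(m):
    """Elements above the main diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Plain Pearson correlation; assumes neither sequence is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(observed, simulated, n_perm=999, seed=0):
    """Mantel test by Monte Carlo permutation.

    The observed matrix is held fixed; rows and columns of the simulated
    matrix are permuted together. Returns (r, p_upper, p_lower).
    """
    rng = random.Random(seed)
    fixed = upper_triangle(observed)
    r_obs = pearson(fixed, upper_triangle(simulated))
    n = len(simulated)
    order = list(range(n))
    upper = lower = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # same permutation applied to rows and columns
        permuted = [[simulated[order[i]][order[j]] for j in range(n)] for i in range(n)]
        r = pearson(fixed, upper_triangle(permuted))
        if r >= r_obs:
            upper += 1
        if r <= r_obs:
            lower += 1
    return r_obs, (upper + 1) / (n_perm + 1), (lower + 1) / (n_perm + 1)


def fisher_combined(pvalues):
    """Fisher's method: X2 = -2 * sum(ln p) is chi-square with 2k df under the
    null hypothesis; for even df the upper-tail probability has a closed form."""
    half = -sum(math.log(p) for p in pvalues)  # X2 / 2
    term = total = 1.0
    for j in range(1, len(pvalues)):
        term *= half / j  # builds half**j / j!
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical example: distances among six points on a line.
    xs = [0, 1, 3, 7, 15, 31]
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(mantel(dist, dist, n_perm=199))
    print(fisher_combined([0.04, 0.20, 0.01]))
```

Because the permutation probability is never zero under this convention, the logarithms in Fisher's method are always defined; in the design described here, `fisher_combined` would pool the 100 upper-tail probabilities obtained for one system under one model.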
could use paired statistical tests to compare models. This provided a considerable increase in statistical power over unpaired comparisons.
After 440 generations in each computer cycle, simulated allele frequencies were sampled from localities chosen so as to match the locations of the samples of the observed allele-frequency database. Matrices of Prevosti's distances were computed from the simulated data, so as to obtain 100 simulated matrices for each matrix of observed genetic distances (Prevosti et al., 1975). This measure of genetic distance was chosen for consistency with previous studies (Sokal, 1988; Sokal et al., 1993).
Simulated and observed matrices were then compared pairwise by means of Mantel's test of matrix association (Mantel, 1967; Smouse et al., 1986). This test computes the equivalent of a correlation coefficient between matrices and evaluates its significance by constructing a null distribution of the test statistic. A Monte Carlo procedure is employed for this purpose: rows and columns of one matrix are repeatedly permuted at random while the other matrix is kept constant, and the test statistic is recalculated each time, so as to yield the desired null distribution.
Because each observed matrix was compared with 100 simulated matrices for each model tested, a procedure was needed to combine all this information. We chose to compute average Mantel correlation coefficients, and to calculate Fisher's combined probabilities (Sokal and Rohlf, 1995) from the 100 individual probabilities for each model.
Since we are looking for positive association of observed and simulated data, and a negative correlation would have no biological meaning, all tests of significance were one-tailed. However, as a further control, we also counted the number of occurrences of negative correlations that would be significant if the test had been two-tailed. This allowed us to identify the models generating allele-frequency distributions departing widely from the observed distributions.
The main purpose of this study was to compare competing hypotheses on the origin of Indo-Europeans. Because the hypotheses can be ranked, by increasing complexity, from IBD through OAG, OAC, ATC, and GIM, four pairwise tests of goodness of fit were carried out: OAG versus IBD, OAC versus OAG, ATC versus OAC, and GIM versus ATC. This was done taking advantage of the paired design based on the same random seeds in the simulations. We employed two different procedures: 1) a paired-comparisons t-test, where the test statistic was the difference between the average Mantel correlation coefficients, and only positive differences (indicating an improvement of the fit for the more complex hypothesis) were considered significant; 2) Wilcoxon's signed rank test (Sokal and Rohlf, 1995), a nonparametric paired-comparisons test, once again considering significant only the cases in which the fit improved for the more complex hypothesis.
RESULTS
The average Mantel correlations for each genetic system (Table 3) yield numerous significant agreements between observed and simulated matrices of genetic distances for all models, although fewest with IBD. The number of significant (P < 0.05) positive average correlations is maximal for the ATC model (20/26), but it is not substantially lower for OAG, OAC, and GIM (18, 17, and 16 systems, respectively). For IBD it is only 10/26. The numerical values of the correlations are low despite their high level of statistical significance. This is typical of genetic distances and arises because the Mantel correlations were constrained to be linear.
Next we examine the number of cases in which individual simulation realizations gave genetic distance matrices that would appear negatively associated (P ≤ 0.05) with the observed one, if the test had been two-tailed. A substantial number of disagreements between observed and simulated data is evident for IBD (6 at system 4-13 RHESUS), for ATC (8 in the 1-1 ABO and 11 in the 2-5 MN systems), and for GIM (26 in the 1-1 ABO and 8 in the 4-19 RHESUS systems).
INDO-EUROPEAN ORIGINS 123
TABLE 3. A) Average Mantel correlations of observed with simulated genetic distances and B) significance levels based on one-tailed probabilities for positive correlation combined by Fisher's method¹
System                     IBD       OAG       OAC       ATC       GIM
A. Mantel correlations
1-1-ABO                0.01294   0.14800   0.15233   0.13137   0.01887
1-2-ABO                0.05699   0.10831   0.09718   0.00411   0.03532
2-5-MN                 0.00324  -0.05627  -0.05352  -0.04100  -0.03231
2-7-MN                -0.02886  -0.02236  -0.02091  -0.03127  -0.03912
3-1-P                 -0.03558  -0.04556  -0.03411   0.02409   0.01112
4-1RHESU               0.01311  -0.01457  -0.01428   0.00674   0.03130
4-13RHES              -0.02547   0.06920   0.06120   0.03431   0.02375
4-19RHES              -0.04258  -0.03158  -0.02899  -0.01635  -0.08314
5-1-LUTH              -0.00590  -0.05392  -0.04768   0.00777   0.02033
6-1-KELL               0.01789   0.07960   0.08378   0.03392   0.03330
6-3-KELL              -0.04254   0.06145   0.01510  -0.01550  -0.06665
7-1ABHSE               0.01781   0.01064   0.02805   0.05623   0.05420
8-1DUFFY               0.00290   0.04017   0.05362   0.01197  -0.01077
36-1-HP                0.11991   0.15415   0.17941   0.07459   0.08002
37-1-TF                0.00592   0.08658   0.09142   0.08117   0.00461
38-1-GC               -0.01547  -0.05739  -0.05843  -0.01637   0.05178
50-1-1AP              -0.00965   0.12963   0.11903   0.10520   0.06548
52-PGD                 0.04775  -0.03417  -0.03457  -0.04371   0.01603
53-PGM1               -0.00845   0.13174   0.14000   0.10789   0.06152
56-AK                 -0.00696   0.06676   0.10858   0.05081  -0.01301
63-ADA                 0.00667   0.27423   0.26493   0.07537   0.03099
65-TASTE               0.02802   0.17960   0.16922   0.07373   0.04040
100HLA-A               0.01417   0.13654   0.15443   0.08977   0.01311
101-102               -0.00854   0.24013   0.22784   0.12228   0.09136
200-GM                 0.01455   0.34928   0.33330   0.14102   0.03480
201-KM                -0.02816   0.07813   0.09279   0.05959  -0.03732
B. Significance levels
1-1-ABO                0.00000   0.00000   0.00000   0.00000   0.00000
1-2-ABO                0.00000   0.00000   0.00000   0.00489   0.00000
2-5-MN                 0.00339   1.00000   1.00000   1.00000   1.00000
2-7-MN                 1.00000   1.00000   1.00000   1.00000   1.00000
3-1-P                  1.00000   1.00000   1.00000   0.00000   0.00003
4-1RHESU               0.00000   1.00000   1.00000   0.00198   0.00000
4-13RHES               0.99975   0.00000   0.00000   0.00000   0.00000
4-19RHES               0.99996   0.99980   0.99913   0.00001   1.00000
5-1-LUTH               0.52240   1.00000   1.00000   0.22014   0.04162
6-1-KELL               0.47350   0.00000   0.00000   0.00001   0.00385
6-3-KELL               1.00000   0.00000   0.09492   0.94677   1.00000
7-1ABHSE               0.00942   0.03488   0.00002   0.00000   0.00000
8-1DUFFY               0.94188   0.00000   0.00000   0.01885   0.99931
36-1-HP                0.00000   0.00000   0.00000   0.00000   0.00000
37-1-TF                0.82795   0.00000   0.00000   0.00000   0.79836
38-1-GC                0.98856   1.00000   1.00000   0.51374   0.00000
50-1-1AP               0.94062   0.00000   0.00000   0.00000   0.00000
52-PGD                 0.00554   1.00000   1.00000   0.99974   0.68047
53-PGM1                0.99819   0.00000   0.00000   0.00000   0.00000
56-AK                  0.97077   0.00000   0.00000   0.00000   0.99442
63-ADA                 0.05673   0.00000   0.00000   0.00000   0.00132
65-TASTE               0.00000   0.00000   0.00000   0.00000   0.00000
100HLA-A               0.00107   0.00000   0.00000   0.00000   0.06326
101-102                0.97945   0.00000   0.00000   0.00000   0.00000
200-GM                 0.04825   0.00000   0.00000   0.00000   0.00115
201-KM                 0.99886   0.00000   0.00000   0.00000   0.99918
Overall probability    0.00000   0.00000   0.00000   0.00000   0.00000
¹Values below 0.05 show significant positive correlation between simulated and observed genetic distances.
The pairwise comparisons by t-tests and by Wilcoxon's signed ranks test agree, in both their mean differences and their significance levels, that the most significant increase in the resemblance of observed and simulated genetic distances occurs between IBD and OAG. In the t-tests shown in Table 4, 17 systems show a significant increase of correlation, whereas in only 7 cases does the similarity decrease. These figures compare with 7 systems showing significantly improved correspondence and 10 showing decreased correspondence for OAC versus OAG, and with 6 and 19, and 5 and 19, respectively, for ATC versus OAC and GIM versus ATC. These results are reflected in the mean differences shown at the bottom of Table 4.
TABLE 4. Results of paired-comparisons t-tests for differences between 5 Indo-European simulation hypotheses: P(H1) - P(H2)
Using Wilcoxon's criterion, the results do not change much, so we do not feature them as a separate table. The numbers of systems showing significantly increased and decreased resemblance between observed and simulated data are, respectively, 17 and 7 for OAG versus IBD; 10 and 9 for OAC versus OAG; 5 and 19 for ATC versus OAC; and 5 and 18 for GIM versus ATC.
We conclude, as a result of all the tests, that similarity between observed and simulated genetic distances increases from IBD to OAG and, to a lesser extent, from OAG to OAC. Conversely, it decreases as models are tested in which the spread of farmers is constrained by archaeological time data (ATC), or in which demographic processes occurring in the late Neolithic are added (GIM). Thus, although ATC shows a significant positive correlation for the largest number of systems, on average the correlations between observed and simulated genetic distances are higher for OAC and OAG.
To test the plausibility of our simulations we also calculated FST values (Wright, 1978) for both our observed gene-frequency surfaces and the simulated surfaces. The median results over all genetic systems are 0.011780 for the observed surfaces, and 0.098273, 0.10399, 0.096562, 0.056609, and 0.003211, respectively, for the simulated surfaces of models IBD, OAG, OAC, ATC, and GIM. Although the FST values of the observed surfaces overlapped only slightly with those of the simulated surfaces, the median of the observed data falls within the range spanned by the medians of the models. The latter fall into 3 groups by magnitude of FST: 1) IBD, OAG, OAC; 2) ATC; and 3) GIM. Thus, the clear superiority of OAG and OAC over IBD cannot be shown by FST, since various patterns of locality differentiation can yield the same FST value.
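As a rough illustration of the FST comparison in the preceding paragraph, the sketch below computes, for each diallelic system, the variance of the allele frequency among localities divided by the product of the mean frequency and its complement, and then takes the median over systems. The study does not spell out its estimator, so this simple unweighted ratio (no correction for sample sizes or number of localities) is an assumption, and the function names are hypothetical.

```python
from statistics import fmean, median, pvariance


def fst_diallelic(freqs):
    """Wright-style FST for one diallelic system sampled at several localities:
    among-locality variance of p divided by pbar * (1 - pbar)."""
    pbar = fmean(freqs)
    if pbar <= 0.0 or pbar >= 1.0:
        return 0.0  # monomorphic system: no differentiation to measure
    return pvariance(freqs, mu=pbar) / (pbar * (1.0 - pbar))


def median_fst(systems):
    """Median FST over genetic systems, each a list of per-locality frequencies."""
    return median(fst_diallelic(freqs) for freqs in systems)


if __name__ == "__main__":
    # Hypothetical surfaces for three systems, not data from the study.
    systems = [[0.2, 0.4, 0.6, 0.8], [0.5, 0.5, 0.5, 0.5], [0.1, 0.9]]
    print([fst_diallelic(s) for s in systems], median_fst(systems))
```

As the text notes, very different spatial arrangements of the same frequencies give identical values here, because the ratio ignores where each locality lies; this is why FST cannot separate models that Mantel correlations do distinguish.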
DISCUSSION
Which model matches observed data best?
The IBD model assumes that, in the Neolithic, groups of hunter-gatherers and farmers were not separated. The former gradually turned to farming, so that there was genetic continuity between pre- and post-Neolithic populations in Europe, and Indo-European languages spread only by cultural transmission. Allele-frequency patterns generated under this model bear little resemblance to the patterns of genetic variation observed in contemporary populations, showing that this evolutionary hypothesis does not fit the available genetic evidence.
Resemblance between observed and simulated patterns is much greater for the other four models, in which farmers evolve separately from hunter-gatherers and processes of population expansion are important. The levels of resemblance, however, do not differ much among these four models. For instance, the GIM model, including several population processes occurring in the last 5,000 years, does not give higher correlations, or significant correlations at a higher number of loci, than the OAG model, where the demographic changes prompted by the origin of agriculture are simulated in a much rougher manner.
Indeed, various results of this simulation study indicate that the fit of simple models, such as OAG and OAC, is better than that of more complex models. Had the tests been two-tailed, the Mantel correlations between observed and simulated genetic distances would have been negative and significant in only 21 of the 2,600 cases for OAG, and in 14 cases for OAC. These figures compare with 89 and 105 negative significant correlations for ATC and GIM, respectively. It seems, therefore, that nothing is added to our understanding of the phenomena if we add archaeological time data to constrain the spread of Neolithic farmers, and even less so if we simulate population movements in the late Neolithic. The models where farmers disperse into new areas simply because of their numbers, which increase logistically, yield patterns showing better agreement with the observed data.
Similarly, the pairwise comparisons of models show a substantial increase of fit of the OAG over the IBD model (Table 4), whereas the elements included in the ATC and GIM simulations cause a slight but evident departure from the patterns observed, making them poorer fits than OAG.
A first conclusion one may draw from the results of this simulation study is that two models account best for many aspects of the contemporary genetic structure of Indo-European-speaking populations of Europe. One is the demic diffusion model, as originally put forward by Menozzi et al. (1978) and associated with linguistic evidence by Renfrew (1987). Under this model, here called OAC, the two forces driving microevolution in Europe were population growth determined by farming, and dispersal accompanied by limited population admixture between early agriculturalists (possibly proto-Indo-European speakers) and preexisting hunters and gatherers. The other model, OAG, is a simplified version of the demic diffusion model, in which dispersal of farmers does not lead to any degree of admixture with hunter-gatherers.
How plausible is the OAG model?
While the OAC model has already received support from studies focusing on its genetic (Sokal et al., 1991; Cavalli-Sforza et al., 1993), as well as linguistic and archaeological, aspects (reviewed in Renfrew, 1992), what we called OAG here has not been analyzed in detail so far. An apparent problem with it is: how can a model not involving admixture account for the continent-wide clines observed in Europe?
Inspection of gene-frequency maps generated in this study, at various moments in time, shows that founder effects are common while farmers disperse. Founder effects are due to the limited numbers of individuals who start the farming communities in new localities. In the OAG model as we simulated it, most farming communities start with 8 effective individuals; but even if this number were larger, the probability for a given allele to be lost or fixed would be substantial. Loss of genetic variation through repeated founder effects has been invoked as the likely cause of clines in several studies on natural populations of toads in Australia (Easteal, 1988) and aquatic invertebrates in Canada (Boileau et al., 1992). Theoretical work on the genetic effects of colonization of previously unoccupied localities (Wade and McCauley, 1988) agrees with this view.
An additional factor, increasing the likelihood of clines even in the absence of admixture between farmers and hunter-gatherers, is the Black Sea. Archaeological evidence (e.g., see Renfrew, 1991) indicates that two waves of early farmers dispersed westwards and northwards from the Near East, with the Black Sea separating them (this is why we did not allow movement of individuals through it, but only along its coasts). The two waves later converged in eastern Europe, after a period of independent evolution. If the same allele had been lost, or fixed, in both groups of farmers, no particular pattern would result; but if founder effects had had opposite consequences in the two groups, the subsequent admixture would initially determine a steep cline, and later gene flow would smooth it, resulting in a wide gradient (Endler, 1977).
Even under OAG, therefore, admixture plays some role. But admixture, under OAG, is between different groups of farmers, who were geographically separated for part of their evolutionary history, rather than between farmers and hunter-gatherers of the same area. This interpretation emphasizes the role both of geographical factors, such as distance between regions, and of cultural barriers between sympatric communities of farmers and hunter-gatherers. Indeed, physical barriers are often associated with genetic and linguistic change, even between Indo-European speakers (Barbujani and Sokal, 1990, 1991), although other evolutionary mechanisms may also account for that association (Barbujani, 1991).
Genetics and Kurgan waves
Introducing into the simulation the three migratory waves postulated by Gimbutas (GIM model) not only fails to increase the correlations, but somewhat reduces them. This means that the current patterns of allele frequencies among Indo-Europeans can be explained without resorting to the migrations of Kurgan people. This study cannot establish whether or not these migration events really occurred, but, if they occurred, they did not leave a significant mark on the allele frequencies of current populations.
Renfrew (1987) argued that the cultural transformations that led Gimbutas to hypothesize late-Neolithic migration waves could be due to cultural contacts instead, and equated the first Indo-Europeans with the first farmers. The extensive changes in ceramics, architecture, and metallurgy occurring in the late Neolithic are then attributed to trading and imitation; long-distance migratory movements, if any, may have been marginal. Although not proved by our simulation, this view is fully compatible with it.
This study, therefore, agrees with the main views expressed by Menozzi et al. (1978), Rendine et al. (1986), Piazza (1993), and Cavalli-Sforza et al. (1993). By contrast, the emphasis laid by the same authors on late Neolithic migrations from the Pontic steppes (Cavalli-Sforza et al., 1993) does not find support in our simulations. Among the possible causes of this discrepancy, it may be that Mantel's correlations are not sensitive enough to recognize the effects of minor processes of gene flow, such as those presumably occurring in the late Neolithic. Alternatively, or in addition, one should consider the possibility that principal components associated with low eigenvalues reflect, at least in part, artificial gradients due to data interpolation. This may be the case for areas where population samples are sparse, such as most of eastern Europe. For example, the Caucasus seems to show clinal variation in the first and third principal components of Cavalli-Sforza et al. (1993), but a detailed genetic study shows that clines are very uncommon there (Barbujani et al., 1994).
Our evaluation of the Gimbutas model should be revised if evidence could be provided that the spread of the Kurgan people was accompanied by an increase in population sizes larger than that simulated by us. A certain level of ambiguity exists about this, as the movement of people from the Pontic steppes that Gimbutas (1979) hypothesized is called a population expansion by Cavalli-Sforza et al. (1993). These authors seem to suggest that, because of the warfare technologies associated with it, larger populations could be supported in the regions affected. This aspect remains to be explored, and we do not have evidence for or against this view. However, even if this had been the case, the increases in population sizes prompted by the beginning of food production seem to have been much larger than those associated with new war technologies (see Ammerman and Cavalli-Sforza, 1984). Unless European populations increased dramatically in size between 6,000 and 5,000 years ago, as they did with the arrival of the new farming technologies, we conclude that the long-distance migrations postulated by Gimbutas remain an unnecessary element in the evolution of Indo-European-speaking populations, as reconstructed from the comparison of theoretical models and gene-frequency data.
Besides, early farmers expanded into areas of low population density, where few immigrants could substantially modify the genetic makeup of local populations; but this was not the case for late Neolithic groups, who invaded regions already occupied by large farming communities. Simulations of genetic processes based on the coalescent approach (see Hudson, 1990) show that patterns of genetic variation do not tend to change much after a demographic expansion (Harpending, 1994; Rogers and Jorde, 1995). Later population movements can smooth out the gradients and blur some patterns, but are unlikely to leave a significant mark on allele frequencies.
Are parasites responsible for clines?
Recent evolutionary models (reviewed in Ladle, 1992) indicate that new genotypes entering an area could be resistant to the parasites that are already adapted to the common resident genotypes. The new genotypes would then increase in frequency until the parasites adapt to them. A selective mechanism of this type, combined with gene flow, might have been important in determining the European clines of allele frequencies; models may be envisaged whereby a form of frequency-dependent selection, rather than limited admixture or founder effects, leads to clinal variation of gene frequencies (e.g., see Hedrick, 1986). It is intriguing to note that the hypothesis of demic diffusion from the Near East was initially developed to account for clines at the histocompatibility loci, HLA-A and HLA-B (Menozzi et al., 1978; Sokal and Menozzi, 1982), and that the most significant evidence for clines spanning Eurasia has been found at the glyoxalase locus (Barbujani, 1987), which is linked with HLA on chromosome 6, in a region of extensive linkage disequilibrium (Hedrick et al., 1986).
However, this view, although compatible with the gradients existing at the HLA and linked loci, can hardly account for the patterns of variation observed among Indo-European speakers at other, independently inherited, loci. Had resistance to parasites been the main cause of clines in Europe, one would expect isolation-by-distance patterns at most loci not involved in tissue recognition, which is not the case (Sokal et al., 1989a; and this study). On the contrary, the nearly parallel gradients observed for many independent alleles suggest that an evolutionary pressure affecting the entire genome and not merely part of it, i.e., gene flow, played a major evolutionary role (Slatkin, 1985, 1987).
Relation to other work
Diakonov (1984; cited in Renfrew, 1987) listed what he called the essential questions concerning the origins of Indo-European speakers: Who migrated? Why? How many of them were there? Was it actually a migration of people, or rather the transfer of a language from one population to another? The present study may contribute to answering some of these questions. Our results show that migrations in the late Neolithic, which have been inferred from changes in the material culture of eastern and central Europe (Gimbutas, 1979), are not reflected in the current genetic structure of Indo-European-speaking populations. Conversely, the correlations of observed and simulated data are positive and significant only if we simulate dispersal of farmers from the Levant by demic diffusion. The results of this study are, therefore, compatible with the
cussions with Professors Luca Cavalli-Sforza, ..., Italo Scardovi, and Michael Turelli.
LITERATURE CITED
Ammerman AJ, and Cavalli-Sforza LL (1971) Measuring the rate of spread of early farming in Europe. Man 6:674-688.
Ammerman AJ, and Cavalli-Sforza LL (1984) The Neolithic Transition and the Genetics of Populations in Europe. Princeton, New Jersey: Princeton University Press.
Barbujani G (1987) Diversity of some gene frequencies in European and Asian populations. III. Spatial correlogram analysis. Ann. Hum. Genet. 51:345-353.
Barbujani G (1991) What do languages tell us about human microevolution? Trends Ecol. Evol. 6:151-156.
Barbujani G, Jacquez GM, and Ligi L (1990) Diversity of some gene frequencies in European and Asian populations. V. Steep multilocus clines. Am. J. Hum. Genet. 47:867-875.
Barbujani G, Nasidze IS, and Whitehead GN (1994) Genetic diversity in the Caucasus. Hum. Biol. 66:639-668.
Barbujani G, and Pilastro A (1993) Genetic evidence on origin and dispersal of human populations speaking languages of the Nostratic macrofamily. Proc. Natl. Acad. Sci. U. S. A. 90:4670-4673.
Barbujani G, and Sokal RR (1990) Zones of sharp genetic change in Europe are also language boundaries. Proc. Natl. Acad. Sci. U. S. A. 87:1816-1819.
Barbujani G, and Sokal RR (1991) Genetic population structure of Italy. XI. Physical and cultural barriers to gene flow. Am. J. Hum. Genet. 48:398-411.
Barker G (1985) Prehistoric Farming in Europe. Cambridge: Cambridge University Press.
Barker G (1988) Comment on "Archaeology and Language," by C. Renfrew. Curr. Anthropol. 29:44H49.
Bertranpetit J, and Cavalli-Sforza LL (1991) A genetic reconstruction of the history of the population of the Iberian peninsula. Ann. Hum. Genet. 55:51-67.
Boileau MG, Hebert PDN, and Schwartz SS (1992) Non-equilibrium gene frequency divergence: Persistent founder effects in natural populations. J. Evol. Biol. 5:25-39.
Cavalli-Sforza LL (1988) The Basque population and ancient migrations in Europe. Munibe 6:129-137.
Cavalli-Sforza LL, Menozzi P, and Piazza A (1993) Demic expansions and human evolution. Science 259:639-646.
Cavalli-Sforza LL, Minch E, and Mountain JL (1992) Coevolution of genes and languages revisited. Proc. Natl. Acad. Sci. U. S. A. 89:5620-5624.
Cavalli-Sforza LL, and Piazza A (1993) Human genomic diversity in Europe: A summary of recent research and prospects for the future. Eur. J. Hum. Genet. 1:3-18.
Cavalli-Sforza LL, Piazza A, Menozzi P, and Mountain J (1988) Reconstruction of human evolution: Bringing together genetic, archaeological and linguistic data. Proc. Natl. Acad. Sci. U. S. A. 85:6002-6006.
Diakonov IM (1984) On the original home of the speakers of Indo-European. Soviet Anthropol. Archaeol. 23:5-87.
Easteal S (1988) Range expansion and its genetic consequences in populations of the giant toad, Bufo marinus. Evol. Biol. 23:49-84.
Eisen MM (1979) Mathematical Models in Cell Biology and Cancer Chemotherapy. Berlin: Springer.
Endler JA (1977) Geographic Variation, Speciation, and Clines. Princeton, New Jersey: Princeton University Press.
Feller W (1940) On the logistic law of growth and its empirical verifications in biology. Acta Biotheor. 5:51-66.
Gimbutas M (1979) The three waves of the Kurgan people into Old Europe, 4500-2500 B.C. Arch. Suisses Anthropol. Gen. 43:113-137.
Gimbutas M (1986) Remarks on the ethnogenesis of the Indo-Europeans in Europe. In Bernhard and A Kandler-Pálsson (eds.): Ethnogenese europäischer Völker. Stuttgart: Gustav Fischer Verlag, pp. 5-20.
Guglielmino CR, Piazza A, Menozzi P, and Cavalli-Sforza LL (1990) Uralic genes in Europe. Am. J. Phys. Anthropol. 83:57-68.
Harding RM, and Sokal RR (1988) Classification of the European language families by genetic distance. Proc. Natl. Acad. Sci. U. S. A. 85:9370-9372.
Harpending HC (1994) Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Hum. Biol. 66:591-600.
Hassan FA (1973) On the mechanisms of population growth during the Neolithic. Curr. Anthropol. 14:535-543.
Hassan FA (1981) Demographic Archaeology. New York: Academic Press.
Hedrick PW (1986) Genetic polymorphism in heterogeneous environments: A decade later. Annu. Rev. Ecol. Syst. 17:535-566.
Hedrick PW, Thomson G, and Klitz W (1986) Evolutionary genetics: HLA as an ... system. In S Karlin and E Nevo (eds.): Evolutionary Processes and Theory. Orlando, Florida: Academic Press, pp. 503-'06.
Hudson RR (1990) Gene genealogies and the coalescent process. In D Futuyma and J Antonovics (eds.): Oxford Surveys in Evolutionary Biology, Vol. 7. Oxford, UK: Oxford University Press, pp.
Jorde LB (1980) The genetic structure of subdivided human populations: A review. In JH Mielke and MH Crawford (eds.): Current Developments in Anthropological Genetics, Theory and Methods. New York: Plenum, pp. 135-208.
Kaiser M, and Shevoroshkin V (1988) Nostratic. Annu. Rev. Anthropol. 17:309-329.
Keyfitz N (1977) Introduction to the Mathematics of Populations with Revisions. Reading, Massachusetts: Addison-Wesley.
Kimura M, and Weiss GH (1964) The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 49:561-576.
Ladle RJ (1992) Parasites and sex: Catching the red queen. Trends Ecol. Evol. 7:405-408.
Lewontin RC (1972) The apportionment of human diversity. Evol. Biol. 6:381-398.
Manly BFJ (1991) Randomization and Monte Carlo Methods in Biology. London: Chapman and Hall.
Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Res. 27:209-220.
Menozzi P, Piazza A, and Cavalli-Sforza LL (1978) Synthetic maps of human gene frequencies in Europeans. Science 201:786-792.
Mourant AE, Kopec AC, and Domaniewska-Sobczak K (1976) The Distribution of the Human Blood Groups and Other Polymorphisms. Oxford: Oxford University Press.
Nei M (1987) Molecular Evolutionary Genetics. New York: Columbia University Press.
Piazza A (1993) Who are the Europeans? Science 260:1767-1769.
Piazza A, Cappello N, Olivetti E, and Rendine S (1988) The Basques in Europe: A genetic analysis. Munibe 6:168-176.
Press WH, Flannery BP, Teukolsky SA, and Vetterling WT (1986) Numerical Recipes. Cambridge: Cambridge University Press.
Prevosti A, Ocaña J, and Alonso G (1975) Distances between populations of Drosophila subobscura based on chromosome arrangement frequencies. Theor. Appl. Genet. 45:231-241.
Rendine S, Piazza A, and Cavalli-Sforza LL (1986) Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128:681-706.
Renfrew C (1987) Archaeology and Language. London: Jonathan Cape.
Renfrew C (1989) Models of change in language and archaeology. Trans. Philol. Soc. 87:103-155.
Renfrew C (1991) Before Babel: Speculations on the origins of linguistic diversity. Cambridge Arch. J. 1:3-23.
Renfrew C (1992) Archaeology, genetics and linguistic diversity. Man N.S. 27:445-478.
Rogers AR, and Jorde LB (1987) The effect of non-random migration on genetic differences between populations. Ann. Hum. Genet. 51:169-176.
Rogers AR, and Jorde LB (1995) Genetic evidence on modern human origins. Hum. Biol., in press.
Ruhlen M (1987) A Guide to the World's Languages. Vol. 1: Classification. London: Edward Arnold.
Sgaramella-Zonta L, and Cavalli-Sforza LL (1973) A
Sokal RR (1988) Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. U. S. A. 85:1722-1726.
Sokal RR (1991) Ancient movement patterns determine modern genetic variances in Europe. Hum. Biol. 63:589-606.
Sokal RR, Harding RM, and Oden NL (1989a) Spatial patterns of human gene frequencies in Europe. Am. J. Phys. Anthropol. 80:267-294.
Sokal RR, Jacquez GM, Oden NL, DiGiovanni D, Falsetti AB, McGee E, and Thomson BA (1993) Genetic relationships of European populations reflect their ethnohistorical affinities. Am. J. Phys. Anthropol. 91:55-70.
Sokal RR, and Menozzi P (1982) Spatial autocorrelation of HLA frequencies in Europe support demic diffusion of early farmers. Am. Nat. 119:1-17.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Thomson BA, Vaudor A, Harding RM, and Barbujani G (1990) Genetics and language in European populations. Am. Nat. 135:157-175.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, and Vaudor A (1989b) Genetic differences among language families in Europe. Am. J. Phys. Anthropol. 79:489-502.
Sokal RR, Oden NL, and Thomson BA (1988) Genetic changes across language boundaries in Europe. Am. J. Phys. Anthropol. 76:337-361.
Sokal RR, Oden NL, and Thomson BA (1992) Origins of the Indo-Europeans: Genetic evidence. Proc. Natl. Acad. Sci. U. S. A. 89:7669-7673.
Sokal RR, Oden NL, and Wilson C (1991) Genetic evidence for the spread of agriculture in Europe by demic diffusion. Nature 351:143-145.
Sokal RR, and Rohlf FJ (1995) Biometry, 3rd ed. New
Bhatia K, and Wilson AC (1990) Geographic variation in human mitochondrial DNA from Papua New Guinea. Genetics 124:717-733.
Torroni A, Schurr TG, Yang C-C, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, and Wallace DC (1992) Native American mitochondrial DNA analysis indicates that the Amerind and NaDene populations were founded by two independent migrations. Genetics 130:153-162.
Wade MJ, and McCauley DE (1988) Extinction and recolonization: Their effect on the genetic differentiation of populations. Evolution 42:995-1005.
Wallace DC, and Torroni A (1992) American Indian pre-
method for the detection of a demic cline. In NE Mor- history as written in the mitochondrial DNA A re-
ton (ed.): Genetic Structure of Populations. Honolulu: view. Hum. Biol. 64t403-416.
University of Hawaii Press, pp. 12%135. Ward RH, Frazier BL, Dew-Jager K, and Paabo S (1991)
Extensive mitochondrial diversity within a single Am-
Sherratt A (1988) Comment on -Archaeology and Lan-
erindian tribe. Proc. Natl. Acad. Sci. U. S. A. 88:872&
guage, by C. Renfrew. Curr. Anthropol. 29:45-63.
8724.
Slatkin M (1985) Gene flow in natural populations.
Ward RH, Redd A, Valencia D, Frazier BL, and Paabo S
Annu. Rev. Ecol. Syst. 16:393430. (1993) Genetic and linguistic differentiation in the
Slatkin M (1987) Gene flow and the geographic struc- Americas. Proc. Natl. Acad. Sci. U. S. A. 90: 1066%
ture of natural populations. Science 236:787-792. 10667.
Smouse PE, Long JC, and Sokal RR (1986) Multiple Wijsman EM, and Cavalli-Sforza LL (1984) Migration
regression and correlation extensions of the Mantel and genetic population structure with special refer-
test of matrix correspondence. Syst. Zool. 35:627432. ence to humans. Annu. Rev. Ecol. Syst. 15:279-301.
Wright S (1969) Evolution and the Genetics of Populations. Vol. 2. The Theory of Gene Frequencies. Chicago: University of Chicago Press.
Wright S (1978) Evolution and the Genetics of Populations. Vol. 4. Variability within and among Natural Populations. Chicago: University of Chicago Press.
Zeven AC (1980) The spread of bread wheat over the old world since the Neolithicum as indicated by its genotype for hybrid necrosis. J. d'Agric. Trad. Bota. Appl. 27:25-53.
Zvelebil M, and Zvelebil KV (1988) Agricultural transition and Indo-European dispersal. Antiquity 62:574-583.