
AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 96:109-132 (1995)

Indo-European Origins: A Computer-Simulation Test of Five Hypotheses
GUIDO BARBUJANI, ROBERT R. SOKAL, AND NEAL L. ODEN
Dipartimento di Scienze Statistiche, Università di Bologna, I-40126 Bologna, Italy (G.B.); Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11794-5245 (R.R.S., G.B.); The EMMES Corporation, Potomac, Maryland 20854 (N.L.O.)

KEY WORDS Genetic variation, Demic diffusion, Language, Computer simulation

ABSTRACT Allele frequency distributions were generated by computer simulation of five models of microevolution in European populations. Genetic distances calculated from these distributions were compared with observed genetic distances among Indo-European speakers. The simulated models differ in complexity, but all incorporate random genetic drift and short-range gene flow (isolation by distance). The best correlations between observed and simulated data were obtained for two models where dispersal of Neolithic farmers from the Near East depends only on population growth. More complex models, where the timing of the farmers' expansion is constrained by archaeological time data, fail to account for a larger fraction of the observed genetic variation; this is also the case for a model including late Neolithic migrations from the Pontic steppes. The genetic structure of current populations speaking Indo-European languages seems therefore to largely reflect a Neolithic expansion. This is consistent with the hypothesis of a parallel spread of farming technologies and a proto-Indo-European language in the Neolithic. Allele-frequency gradients among Indo-European speakers may be due either to incomplete admixture between dispersing farmers, who presumably spoke proto-Indo-European, and pre-existing hunters and gatherers (as in the traditional demic diffusion hypothesis), or to founder effects during the farmers' dispersal. By contrast, successive migrational waves from the East, if any, do not seem to have had genetic consequences detectable by the present comparison of observed and simulated allele frequencies.
© 1995 Wiley-Liss, Inc.

Received January 25, 1994; accepted August 7, 1994.
Address reprint requests to Robert R. Sokal, Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245.

Wide allele-frequency clines exist at several loci in Europe (Menozzi et al., 1978; Sokal and Menozzi, 1982; Sokal et al., 1989a). Their extent is such that simple models of isolation by distance are unlikely to explain them. It is generally agreed that they result from a population expansion starting in Anatolia approximately 10,000 years ago (Menozzi et al., 1978; Sokal and Menozzi, 1982; Renfrew, 1987, 1991, 1992; Cavalli-Sforza, 1988; Sokal et al., 1991), most likely associated with the development of technologies for food production (Hassan, 1973; Zeven, 1980). Radiocarbon chronologies of Mesolithic and Neolithic settlements indicate a westward spread of farming technologies from the Near East, starting approximately 8,000 BC (Ammerman and Cavalli-Sforza, 1984; Renfrew, 1987). The observed gene-frequency gradients can then be explained by attributing the propagation of farming to the dispersal of early farmers,


who interbred only sparingly with the bands of hunters and gatherers whom they met in the process, and who had already colonized most of Europe (Ammerman and Cavalli-Sforza, 1984). Such a combination of demographic growth, range expansion, and limited admixture has been termed demic diffusion (Menozzi et al., 1978). Its expected consequences include correlations between allele frequencies and the dates of onset of agriculture (Sgaramella-Zonta and Cavalli-Sforza, 1973), which have actually been observed (Sokal et al., 1991).

The European regions where early (Neolithic) agriculturalists expanded correspond approximately to the current western range of Indo-European languages. This raises the question whether the cultural process of Indo-European diffusion was also determined by the demographic processes accompanying the spread of farming (Renfrew, 1987). The traditional view holds, on the contrary, that proto-Indo-European entered Europe not earlier than 4500 BC, through three migrational waves also coming from the East, namely from the Pontic steppes (Gimbutas, 1979, 1986). Although directed westwards, like demic diffusion, these waves were not associated with population increases comparable to those caused by the introduction of farming and animal breeding. Therefore, diffusion of Indo-European in the late Neolithic would imply that languages spread more through cultural contacts (Zvelebil and Zvelebil, 1988) than by demographic processes (see Renfrew, 1989). As a corollary to this, association between patterns of genetic and linguistic variation should be limited and occasional among contemporary Indo-European speakers.

Support for the view linking the spread of proto-Indo-European in Europe with demic diffusion comes from studies showing that patterns of linguistic and genetic diversity correspond in many European populations (Sokal, 1988; Cavalli-Sforza et al., 1988, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990, 1991; Bertranpetit and Cavalli-Sforza, 1991). The exceptions include Basques (Piazza et al., 1988; Bertranpetit and Cavalli-Sforza, 1991), Hungarians (Barbujani et al., 1990), and Uralic-speakers (Guglielmino et al., 1990), that is to say, groups whose current languages do not belong to the Indo-European phylum.

Recent years have seen both a refinement of the hypotheses on Indo-European origins, and the emergence of contradictory data. Based on archaeological and linguistic evidence, Cavalli-Sforza (1988) and Renfrew (1991, 1992) argued that the common elements recognized within the so-called Nostratic linguistic macrofamily (Kaiser and Shevoroshkin, 1988), including Indo-European, Altaic, Afro-Asiatic, and Elamo-Dravidian, derive from a common biological origin of most of their current speakers. It is then possible to interpret the current distribution of most Nostratic, and not only Indo-European, languages as a consequence of a multidirectional spread of agriculture. Recent genetic analyses agree with this view (Cavalli-Sforza et al., 1993; Barbujani and Pilastro, 1993). On the other hand, a model incorporating the effects of the origin of agriculture and/or specifically the hypotheses of Renfrew and Gimbutas failed to explain a larger fraction of the correlations between genetic and linguistic distances than is explained by the simple effects of geographic distances (Sokal et al., 1992).

Further evidence in favor of either hypothesis may be obtained by simulating their genetic consequences, and then comparing them with the patterns of genetic variation observed in the field. In the only simulation study available so far, present-time genetic variation was demonstrated to be compatible both with a Neolithic origin of Indo-European speakers, and with later immigration from the Pontic steppes (Rendine et al., 1986). However, the two hypotheses were not contrasted in that study, but combined. A relationship between current gene-frequency gradients and dispersal from the east was evident, and seems undisputable. However, when exactly, and by what type of process, proto-Indo-Europeans spread has not been convincingly ascertained. We are particularly interested in establishing if the demic diffusion of early farmers can, at least in principle, explain a large share of current genetic diversity among Indo-European speakers, or if additional processes must be included in the model to obtain a better fit with real data. Among these additional processes, we gave a special emphasis to the migratory waves postulated by Gimbutas.

In this study, we simulated microevolutionary scenarios of increasing complexity, from an unrealistically simple one to models including several archaeologically documented migrations. We then calculated correlation coefficients between the simulated and real gene frequencies (or, more precisely, between matrices of genetic distances calculated from each of them). We expected to observe an increasing agreement between real and simulated data as the models get more and more realistic. When an increase in the complexity of the model is not matched by an increase in the correlations, we conclude that the new factors included in that model do not improve our understanding of the phenomenon, and should therefore be considered unnecessary. This does not automatically imply that these factors played no evolutionary role at all. But, if they did, evidence of their effects should be sought in data other than the currently available allele frequencies. A large database of European allele frequencies (see Sokal et al., 1989a) was analyzed for this purpose.

METHODS AND DATA

Overview of the simulations

We carried out a series of computer simulations of five microevolutionary models. All models were based on a stepping-stone population structure, consisting of a 60 × 37 regular lattice, superimposed onto the map of Europe. Each node of the lattice represents a 1-degree square quadrat. Of the 2,220 nodes of the lattice, 1,512 are land areas supporting a human population. Each node (population) is characterized by its effective size (N_e) and by the frequency of one allele (p). At each generation p undergoes random variation, representing the effects of genetic drift, which is a function of N_e. Migration is allowed only between adjacent populations. The numbers of individuals migrating at each generation depend on population sizes, and on factors of resistance to migration, which are zero across plains, but greater than zero across mountain chains and seas. The differences among the five models lie in parameters other than those described so far.

Each simulation experiment consisted of 440 iterations (representing generations) of a series of population processes. In this way, assuming a 25-year generation interval and non-overlapping generations, each experiment spans 11,000 years. The simulated allele frequencies were printed at generation 440, representing present time, for the localities corresponding to those for which allele-frequency data are available in a database of European allele frequencies (Sokal et al., 1989a), from which non-Indo-European speakers had been discarded. Matrices of Prevosti's genetic distances (Prevosti et al., 1975) were calculated on both real and simulated allele frequencies, and their degree of resemblance was evaluated by Mantel (1967) tests of matrix correlation.

Five FORTRAN programs were written, each corresponding to one of five models for the origins of Indo-Europeans described in the following section, and incorporating subroutines developed by Press et al. (1986) and Manly (1991).

An outline of the models

IBD: Isolation by distance

The first, clearly oversimplified, hypothesis, is that Indo-European-speaking populations evolved under conditions of isolation by distance (IBD model). Current patterns of genetic variation would then simply result from the interaction between random fluctuations of allele frequencies in time, i.e., genetic drift, and dispersal of individuals. Under isolation by distance, variations in population size affect only the impact of genetic drift: the larger the population, the smaller the allele-frequency fluctuations. Population growth, which occurs after generation 40, does not prompt migratory movements. Thus, the IBD model neglects all gene flow processes other than those in which movements of individuals from their birthplaces are local and random (i.e., equally likely in all directions except for migration resistance factors, see below).

Under the IBD model, the demographic increase that occurred in the Neolithic (and is detailed in the section Population Growth
Among Farmers) was simulated without separating hunting-gathering and farming populations, i.e., as if all hunter-gatherers turned to agriculture at 8,000 BC.

OAG: Isolation by distance plus effects of the origin of agriculture

Isolation by distance is the null hypothesis for human microevolution (Wijsman and Cavalli-Sforza, 1984). Therefore, all models that follow are not alternative to IBD. Rather, they incorporate it as a necessary, if not sufficient process, for determining the currently observed patterns of genetic variation. OAG, the simplest such model, is one which combines IBD with the likely effects of the demographic processes following the origin of agriculture. Under this model, populations of hunter-gatherers initially occupy Europe and evolve under isolation by distance. At a specific moment in time (8,000 BC, chosen on the basis of archaeological information), a few populations in southern Anatolia turn to farming. This starts a local process of population growth in the areas where farming is being practiced, followed by dispersal outwards when local population densities have reached a certain threshold. In this way, migratory movements between farming communities are not necessarily symmetrical, as is reasonable to assume in many evolutionary scenarios (Rogers and Jorde, 1987). The rate of spread of farmers is driven by their intrinsic growth rate; it is constrained only by geographical factors such as mountain chains or bodies of water. No cultural transmission is simulated between the hunter-gatherers and the farmers who immigrate into their regions. Only the farmers' allele frequencies are eventually compared with observed matrices of genetic distances. In this way, the genetic consequences of this model are those that would be expected if hunter-gatherers were replaced without admixture, i.e., became extinct. The only microevolutionary role they play is to serve as a source population at the beginning of the Neolithic, for that small fraction in Anatolia of the total population that develops the new farming technologies.

OAC: Isolation by distance, plus effects of the origin of agriculture, and cultural transmission

Cultural transmission from farmers to hunter-gatherers may be built into the model, yielding what we call OAC; C stands for culture. Under this model, at all localities some hunter-gatherers learn how to produce food, and therefore their alleles are transmitted across generations with greater efficiency. From the genetic standpoint, this is equivalent to a certain degree of admixture, whereby some genes of the hunter-gatherers contribute to the gene pool of the farmers. As a consequence, these genes spread at once with the genes of the farmers, and are thus carried into new localities. This is the Neolithic demic diffusion model, as originally proposed (Menozzi et al., 1978; Renfrew, 1987).

ATC: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, and archaeological time constraints

Under OAC, the spread of farmers from Anatolia into Europe is driven by their increase in numbers at each locality, which causes dispersal towards areas of lower population density. Therefore, in the OAC model, the farming technologies spread at an approximately constant rate through space (as in Ammerman and Cavalli-Sforza, 1971). This is known to be an approximation (Barker, 1985). A further refinement of OAC considers archaeological time constraints (ATC). Under ATC, we use archaeological information about the likely date at which farming reached each specific site in Europe (see Sokal et al., 1991). In this way, the arrival of farmers into a new locality reflects archaeologically documented cultural transformations; farmers spread at an irregular rate, corresponding to the actual process as inferred from archaeological evidence. Incorporation of hunter-gatherers into each farming population occurs at the same rates and through the same processes as in OAC.
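
As a rough illustration of how an archaeological arrival date could be tied to the simulation clock under ATC, the following Python sketch converts dates BC into generation numbers. It relies only on figures stated in the text (a 25-year generation interval, 440 generations in total, and farming starting in Anatolia at generation 40, dated to 8,000 BC). The function names `bc_to_generation` and `farming_allowed`, and the idea of gating colonization on a per-pixel date, are hypothetical simplifications and not the authors' FORTRAN implementation.

```python
# Minimal sketch, assuming a linear mapping between calendar dates and
# simulation generations. All names below are illustrative assumptions.

GENERATION_YEARS = 25            # generation interval stated in the text
TOTAL_GENERATIONS = 440          # generation 440 represents the present
FARMING_ONSET_GENERATION = 40    # farming begins in Anatolia at generation 40
FARMING_ONSET_BC = 8000          # that generation is dated to 8,000 BC


def bc_to_generation(year_bc: float) -> int:
    """Convert a date in years BC to the nearest simulation generation."""
    elapsed_years = FARMING_ONSET_BC - year_bc
    return FARMING_ONSET_GENERATION + round(elapsed_years / GENERATION_YEARS)


def farming_allowed(arrival_bc: float, generation: int) -> bool:
    """Hypothetical ATC-style gate: farmers may settle a pixel only once the
    simulation has reached that pixel's archaeological arrival date."""
    return generation >= bc_to_generation(arrival_bc)


if __name__ == "__main__":
    # Made-up arrival dates (years BC) for three pixels, for illustration only.
    example_dates_bc = [7000, 5500, 4000]
    for date_bc in example_dates_bc:
        print(date_bc, "BC -> generation", bc_to_generation(date_bc))
```

Under OAG and OAC no such gate would apply, since there the timing of colonization is driven by population growth alone.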
GIM: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, archaeological time constraints, and late Neolithic migrations

In the OAG, OAC, and ATC models, the first farmers are also considered the first speakers of proto-Indo-European. The alternative hypothesis considers them as the Neolithic inhabitants of areas that were later invaded by the first proto-Indo-European speakers, the Kurgan people (Gimbutas, 1979, 1986). The three migrational waves postulated by Gimbutas are added to ATC in the GIM model, by simulating long-distance migratory movements between 4,250 BC and 2,900 BC. A number of successive population movements are added as well; presumably, they were independent from the spread of Indo-European, but are considered by Gimbutas (personal communication) relevant to an accurate description of human evolution in Europe.

Details on the simulation parameters and algorithms

The data matrix

In all simulation cycles, a matrix of 60 columns by 37 rows was defined, each element in the matrix representing a square of edge length 1 degree in a Mercator projection of Europe. The data matrix covers the area between 10 degrees of longitude West and 50 degrees East, and between 72 and 35 degrees of latitude North. Iceland is not included in this simulation.

Geography

Each of the 2,220 elements (= nodes or pixels or localities) of the data matrix contained an integer value, L, which was 1 for plains, 2 for mountains, 3 for seas, and 4 for the Black Sea. A local population was assigned to each of the 1,512 land pixels.

Initial allele frequencies

Under all the models tested, for each land pixel a variable p_HG was defined, representing the frequency of an allele at a polymorphic locus in the hunting-gathering population, i.e., in the only type of population existing at the beginning of the simulation. From a mathematical standpoint, it does not make a difference if this locus is regarded as biallelic, or if it is considered multiallelic, since the fate of only one of its alleles is simulated in the following phases. Allele frequencies were drawn from a gamma distribution truncated at one (Nei, 1987), whose mean was fixed either at 0.33 or at 0.50.

Initial population sizes

Estimates of population densities among current hunting-gathering tribes suggested to Rendine et al. (1986) that the effective size N_HG of the hunting-gathering populations of Europe should be approximately 300 in each of the 840 elementary areas of their simulation. To have the same population density in the 2,220 pixels of this simulation, N_HG was fixed at 114. This corresponds to a population density of 0.04 individuals per square km, within the estimated range of population densities for hunter-gatherers in temperate climates (Hassan, 1981). In Rendine et al.'s (1986) model, the individuals were considered as haploid, whereas here they are diploid. This may have caused a certain degree of divergence between the two models, as the drift variances are affected by the levels of ploidy of a population.

Genetic drift among hunter-gatherers

Non-overlapping generations were simulated. At each generation, a new allele frequency was drawn, for each locality, from a normal distribution whose mean was the allele frequency p of the same population at the previous generation, and whose variance was p(1 - p), divided by twice the effective population size N_HG (Nei, 1987). This represented the effect of sampling of alleles from one generation to the following, i.e., random genetic drift.

Dispersal of hunter-gatherers

Symmetrical dispersal occurred between adjacent pixels, once every generation following drift; population sizes of the two localities exchanging individuals did not change as a result of dispersal. Therefore, this study assumed a stepping-stone model (Kimura and Weiss, 1964), whose properties are discussed by Jorde (1980). We chose a dispersal rate m of 0.065 (as in Rendine et al., 1986), which means that at each generation, after reproduction, 6.5% of the resident individuals could be replaced by immigrants from the adjacent pixels. However, physical obstacles in the pixels between which dispersal occurred could reduce the number of migrants. Physical obstacles to migration were expressed by a factor of resistance to migration, RTM, detailed in Table 1. The average value of RTM, calculated between all suitable pairs of pixels, was 0.300. This means that, on the average, only 70% of the potentially dispersing individuals actually moved from their birthplace to an adjacent locality. To compensate for this, the dispersal rate m was replaced by m' = 0.065/0.700 = 0.0928. In other words, 9.28% of the individuals in a locality were potentially subject to migration elsewhere, and 6.5% actually migrated, on the average. The number N(AB) of individual hunter-gatherers moving from locality A to B (and vice versa, from B to A) was

\[ N(AB) = \frac{N_{HG}\, m'\, (1 - RTM(AB))}{4} \qquad (1) \]

where the denominator refers to the number of adjacent populations in a stepping-stone model, and RTM(AB) depends on the environmental features at localities A and B (Table 1). When N_HG was fixed at 114, each pair of adjacent localities exchanged 2 effective individuals per generation. However, under the IBD model the hunting-gathering populations increase in size, starting at generation 40; in this case, the number of migrants per generation increased accordingly. When, by contrast, hunter-gatherers decreased in numbers owing to the expansion of farmers, N(AB) decreased as well until it reached zero. Because of the RTM factor, two localities in the plains exchanged freely one-fourth of the migrants allowed at each generation, whereas dispersal was reduced between populations separated by mountains or bodies of water, and for populations at the extremes of the simulated area. No dispersal was allowed across the Black Sea, to more carefully represent the population processes in the surrounding area (e.g., the migrations of Kurgan people), which occurred mostly by land movements.

TABLE 1. Factors of resistance to migration (RTM)

                               Between a pixel in the:
And an adjacent pixel in the:  Plains  Mountains  Sea   Black Sea
Plains                         0.00
Mountains                      0.25    0.50
Sea                            0.45    0.70       0.90
Black Sea                      1.00    1.00       1.00  1.00

The allele frequency after migration, p'_HG, was then calculated as

\[ p'_{HG} = p_{HG}\,[(1 - m')(1 - RTM)] + m'\, p_{HGin} \qquad (2) \]

where RTM is the average resistance to migration between the pixel of interest and the n_A adjacent pixels (1 ≤ n_A ≤ 4), and p_HGin is

[Expression for p_HGin not legible in the source.]

Inception of farming in the nuclear zone

Under all models except IBD, the spread of farming starts with the splitting of some populations into two groups, one practicing agriculture, and the other still living in a hunting-gathering economy. At generation 40 (i.e., 10,000 years ago), 20 individuals turn to farming at each of six pixels in Anatolia, around the village of Çatal Hüyük, where the oldest archaeological evidence of farming activities is situated (Renfrew, 1991). Each group of 20 individuals represents a random sample of the pre-existing hunting-gathering population at the same site; therefore, initially they have the same allele frequency: p_F = p_HG. However, from generation 41 they evolve in reproductive isolation, so that there will be two distinct populations of hunter-gatherers and farmers, HG and F, at those localities, with distinct allele frequencies. In the absence of
cultural transmission between groups, i.e., under the OAG model, the two groups coexist without any genetic exchange, at all localities where farming communities have been established.

Population growth among farmers

The farming populations have access to a wider range of resources, and tend to increase in numbers (Hassan, 1973); 50-fold increases have been estimated by Ammerman and Cavalli-Sforza (1984). We simulated a logistic increase, whose key parameter, the growth rate, generally referred to as r (Feller, 1940; Eisen, 1979; Keyfitz, 1977), was here called a, following Rendine et al. (1986). In the absence of reliable information on growth rates among early farmers, we tried a preliminary set of values, and chose a = 0.5, which gave us a rate of population increase compatible with the known rates of spread of farming in Europe, 1 km every year, on the average (Ammerman and Cavalli-Sforza, 1971). That value was employed in all the simulations presented here. The equation calculating, at generation t and for each locality, the effective size N_F of the farming population is

\[ N_F(t) = N_F(t - 1) \times \left(1 + a \left(1 - \frac{N_F(t - 1)}{7{,}560}\right)\right) \qquad (3) \]

where N_F(t - 1) is the population size at the former generation, and 7,560 is the carrying capacity of the area, chosen for reasons analogous to those that led us to choose N_HG = 114. Since the effective population size is approximately one-third of the census size (Wright, 1969), this corresponds roughly to a farming population of 23,000 dwelling on a 1-degree-square quadrat of land, and to 20,000 farmers in each elementary area of Rendine et al.'s (1986) simulation.

Dispersal of farmers and origin of farming outside the nuclear zone

Under the OAG and OAC models, the growth of farming populations prompts migratory movements into neighboring localities where farming has not yet started. Expansion towards localities at the same latitude (i.e., with the same type of climate) was presumably more common than dispersal northwards, towards more rigorous climates. This led us to subdivide the history of our simulated farming populations into four successive phases. Phase 1: Initially, the population is scarce, and simply tends to increase logistically, without sending emigrants, but receiving immigrants from other farming populations at higher density, if any. Phase 2: When N_F reaches a first threshold, T1, a few individuals begin to disperse longitudinally; this often entails colonization of a new site on the west, whereas gene flow is asymmetrical with the eastern neighbors, whose density is still higher. Phase 3: As the population size approaches the carrying capacity of 7,560, a second threshold, T2, is reached, and gene flow occurs also northwards and southwards, once again giving rise to a new population of farmers if the adjacent pixel to the north or south has not been colonized yet. The thresholds had to be fixed in a somewhat arbitrary manner. It seemed realistic to allow for the first westwards dispersal two to three generations after inception of the farming economy, so as to roughly match the archaeologically documented rates of spread. T1 was fixed at 24 (corresponding to a census size of 72 individuals), whereas for T2 we chose 50% of the carrying capacity, i.e., 3,780. Phase 4: Finally, when adjacent populations have reached their equilibrium size of 7,560 effective individuals, the migratory exchanges become symmetrical. The general equation, expressing the number of farmers moving, say, from locality A to adjacent locality B, N(AB), is

[Equation 4 not legible in the source.]

for N_F(A) greater than the appropriate threshold, i.e., T1 for latitudinal and T2 for longitudinal movements, respectively. All relevant quantities have been defined for Equation 1. Since N_F(A) is variable across generations, N(AB) varies too.

In this way, the input and output of genes, from and to the adjacent pixels of the map, could be different at different times during
116 G . BARBUJANI ET AL.

the simulations. If the number of immi- tion 200, however, the four models yield a
grants from each locality is taken into ac- similar pattern of land occupation. The only
count in the calculation of the allele fre- major exception is an area of north-western
quency of the immigrants, which we called Alps, where archaeological evidence shows a
pHGinin Equation 2 and which will be called delayed onset of farming activities (Sokal et
pFinfor farmers, a n analogous formula gives al., 1991). Of course, such a delay was not
the allele frequency after gene flow among predicted by the mechanism of farming
farmers: expansion underlying our OAG and OAC
models.
p'F = p F [ ( l - m ' ) (1 - RTM)I
+ rn'PFin (5) Pixels in the sea
Figure 1 is an example of allele frequencies A stepping-stone model does not allow for
generated under the OAG and OAC models. long-distance population movements, i.e.,
Allele frequencies of specific localities were, those associated with sailing, which are con-
of course, different in different realizations sidered important in the colonization of Eu-
of the same process, and between OAG and rope, and in the successive phases of agricul-
OAC. What was constant, however, was the tural dispersal (Renfrew, 1987). To simulate
pattern of occupation of land areas by ex- the effects of the movement of a few individ-
panding farmers, because it depended only uals across the seas, we chose to assign a
on population growth and dispersal parame- pseudopopulation to each pixel located in
ters, which were kept constant across real- the sea. Pseudopopulations did not undergo
izations. random fluctuation of allele frequencies,
For the ATC and GIM models, by contrast, and did not increase in numbers. They in-
the spread of farming followed the pattern cluded only the individuals dispersing from
that can be inferred from archaeological evi- neighboring pixels (their number was deter-
dence. Once a farming community exists at mined according to Equation 1)) whose de-
a certain locality, the exchange of genes with scendants had the same allele frequencies,
neighboring farming communities occurred and a t each generation proceeded one step
in the same manner as described for the forward in the dispersal process. In this
OAG and OAC models (Eq. 5). The difference is in the establishment of new farming populations, which under ATC and GIM was controlled by a matrix of dates of origin of agriculture at each land pixel of the map (details on how archaeological information was processed for this purpose are in Sokal et al. (1991)). Therefore, when colonization of a new locality by the first farmers had to be simulated, eight effective founding individuals were sampled with replacement from the closest suitable locality. In a few cases, this required input of immigrants from localities that were not directly adjacent to the one in which farming was starting. This was the only violation that we tolerated of the assumptions of the stepping-stone model. Figure 2 is an example of allele frequencies generated under the ATC and GIM models. It shows that the spread of farmers is initially slower than simulated under the OAG and OAC models, especially along the northern shores of the Mediterranean Sea. By genera- [...]

[...] way, the movement of a few individuals across the sea was simulated. This had little importance for the allele frequencies of

Fig. 1. Spread of farmers under the OAG and OAC models. The localities where farming is being practiced are indicated by letters representing allele frequency in farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Eight allele-frequency classes are defined, from a to h, each corresponding to an interval equal to 0.125 (a, pF < 0.125; b, 0.125 < pF < 0.250; c, 0.250 < pF < 0.375; etc.). Hyphens represent areas inhabited only by hunter-gatherers. The nuclear zone is delimited by a solid square. While the pattern of land occupation shown was constant for all the realizations of the models, the allele frequencies depicted are those of a single run of OAC.

Fig. 2. Spread of farmers under the ATC and GIM models. The localities where farming is being practiced, based on archaeological information, are indicated by figures representing allele frequency of farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Allele-frequency classes are as in Figure 1. The pattern of land occupation was again the same in all realizations of ATC and GIM, but the allele frequency shown is that of a single realization of ATC.
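The founding step described in the text, in which eight effective individuals are drawn with replacement from the closest suitable locality, can be illustrated with a minimal sketch. It assumes that each founder is diploid and that its two gene copies are independent draws at the source allele frequency; that reading, the function name `found_colony`, and the example frequency are assumptions made for illustration and are not taken from the simulation programs.

```python
import random


def found_colony(p_source, n_founders=8, rng=random):
    """Allele frequency of a newly founded farming colony.

    Assumption: every effective founder carries two gene copies, each an
    independent draw (sampling with replacement) at frequency p_source.
    """
    copies = 2 * n_founders
    carriers = sum(1 for _ in range(copies) if rng.random() < p_source)
    return carriers / copies


# Repeated foundings from the same source scatter widely around 0.5,
# which is the founder effect invoked for the OAG model.
rng = random.Random(1)
print([found_colony(0.5, rng=rng) for _ in range(5)])
```

With so few gene copies the colony frequency can only take values in steps of 1/16, so chaining several foundings along a route of dispersal makes loss or fixation of the allele a realistic outcome.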
hunter-gatherers and for those of farmers once the spread of farming through Europe was completed; however, it had an effect on the establishment of farming communities under the OAG and OAC models, as a few individuals could quickly reach distant localities by sea. The resulting pattern of occupation of coastal regions corresponds well with the archaeological evidence (see Figs. 1 and 2).

Cultural contacts and admixture

Under the OAC, ATC, and GIM models, at each generation a certain number of hunter-gatherers adopted farming, if a farming community already existed at the locality where they lived. The likelihood of this cultural shift depended on the probability of contacts between farmers and hunter-gatherers, and on a coefficient of acculturation which was called γ by Rendine et al. (1986). The number S of individuals shifting to farming at each generation is related to the probability of contacts between farmers and hunter-gatherers. If farmers are NF at a given locality and at a given moment in time, their probability of meeting one of the NHG hunter-gatherers will represent a fraction equal to 2(NF × NHG) of all the (NF + NHG)² possible contacts. The probability γ that such a contact will result in acculturation has been estimated at 0.00024 (Rendine et al., 1986). Therefore, at each generation, the NF farmers will transmit their technologies to a number S of the hunter-gatherers estimated as

[...]

where all parameters have already been defined.

It can be argued that the change in a particular quadrat of the number of hunter-gatherers per generation might better be modelled as proportional to the product of the number of hunter-gatherers and the number of farmers, that is

S = γ NHG NF    (7)

where γ = 1.56250 × [...]. In this expression, closely resembling a formula used by Rendine et al. (1986), the value of γ was adjusted so that resultant values of S had approximately the same magnitude as in Equation 6. However, we found, over a range of test conditions, that the results were only trivially affected. Therefore, we report results for Expression 6 without redoing the analysis for every test condition.

Disappearance of hunting-gathering populations

In the IBD model, hunter-gatherers adopt farming at generation 40, and thus all genes of Indo-European speakers come from the genetic pool of the hunting-gathering communities. In the OAG model, conversely, the hunter-gatherers go extinct, and thus all genes of Indo-European speakers derive from the genes of the few first farmers of Southern Anatolia. These are the two extreme models, as far as the origins of Indo-Europeans are concerned. Under the other models, a certain degree of admixture is simulated between the two communities at each locality, reflecting a widespread view of human evolution in Europe (Cavalli-Sforza and Piazza, 1993).

Under the OAC, ATC, and GIM models, the hunters and gatherers are considered to disappear from a certain locality when their number is such that S is less than 1. Because acculturation starts after a phase of population buildup for farmers, the extinction of the hunting-gathering communities also proceeds as a wave, from southeast to northwest, spreading in parallel with the farming economy, but several generations later.

Long-range migratory movements

In the GIM model, three major migratory waves of Kurgan people are supposed to have introduced Indo-European languages into Europe, around 4,250 BC, 3,400 BC, and 2,900 BC, respectively (Gimbutas, 1979, 1986).

In Gimbutas' (1979) view, the westward migrations of Kurgan people in Europe introduced a new patriarchal culture, characterized by horse-riding and new warfare techniques. These cultural changes were not associated with major innovations of the
subsistence techniques. Most of the populations of Europe by then were farmers. It is therefore highly unlikely that concomitant population growth could occur. We chose to represent these waves as a flow of genes from the purported source area, north of the Black Sea and west of the Caspian Sea. For the sake of simplicity, these movements were concentrated in one generation's time (at generations 190, 224, and 244), although each of them probably lasted two centuries (Gimbutas, 1979). For each movement, we simulated replacement of 20% of the genes in the target area, with genes coming from the source area. Probably this overemphasizes the genetic consequences of the simulated migratory movements.

The invading Kurgan people were not the entire population of the area between the Black and Caspian sea moving en masse; rather, they were groups of individuals belonging to semi-nomadic tribes (Gimbutas, 1979). Accordingly, we chose to simulate their contribution to the genetic pool of the invaded populations as if they were coming from several populations in the appropriate zone. The location of four such populations was chosen at random, and then held constant in all 2,600 simulation cycles of the GIM model. The allele frequencies of the recipient populations (Fig. 3) were then recalculated as if 20% of the pre-existing individuals had been replaced by immigrant Kurgan people.

In addition, Gimbutas (personal communication) pointed out to us 12 other directional and potentially migratory processes that may have been important in determining the current linguistic population structure of Europe. These processes are summarized in Figure 4, and were incorporated in the GIM model. Once again, for each of them we simulated replacement of 20% of the individuals of the target area by individuals whose allele frequency was the average allele frequency in the source area.

Decrease of population mobility after establishment of a farming economy

Once farming populations have reached the maximum size allowed by the programs, a reduction of mobility is simulated by simply halving the migration rate m, all other parameters being constant. Gene flow is reduced at generation 265, i.e., 3,375 years ago, by which time all suitable regions had been colonized by early agriculturalists. The parameters of the simulation are summarized in Table 2.

Gene-frequency data

The simulated sets of allele frequencies were compared with a database of allele frequencies, which had been analyzed in various studies on Europe (Sokal et al., 1988, 1989a,b, 1990, 1991, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990; Sokal, 1991), and had been continuously updated. The data corresponding to populations speaking languages other than Indo-European (Basque, Finnish, Estonian, Lapp, Hungarian, Turkish: Ruhlen, 1987) were discarded.

Twenty-six genetic systems were considered. Most of them corresponded to independent loci; exceptions are ABO, MN, and Rh, for which two (or three, for Rh) systems were independently considered, each resulting from typing of alleles by different sets of antisera. This convention has long been followed in studies on human variation (e.g., see Lewontin, 1972). Each system is indicated by a letter code, preceded by a number referring to Mourant's coding system (Mourant et al., 1976), except for 100HLA-A, 101/2HLA-B, 200GM, and 201KM, whose numerical codes were assigned in our laboratory. Overall, 3,481 records, and 93 alleles or haplotypes were considered.

The number of samples available for the 26 systems varied widely, ranging from a minimum of 27 (for 5-1 LUTHERAN), to a maximum of 762 (for 1-1 ABO). Genetic differences between localities were summarized by 26 matrices of Prevosti's distances (Prevosti et al., 1975), separately calculated for each system. We shall refer to these matrices as observed distance matrices, as opposed to the simulated ones, generated by the computer under one of the five models tested. To properly compare the two sets of data, prior to calculating genetic distances we pooled the observed frequencies of all alleles except the one whose average frequency was closest to 0.5.
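The pooling step and the distance calculation can be sketched as follows. The sketch uses the usual single-system form of Prevosti's distance, half the sum of absolute allele-frequency differences, which reduces to |p1 - p2| once alleles are pooled into two classes. The function names and the three-allele example frequencies are hypothetical and serve only as an illustration; they are not the authors' code or data.

```python
def prevosti_distance(freqs_a, freqs_b):
    """Prevosti distance for one genetic system: half the summed absolute
    differences between the allele frequencies of two samples."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both samples need the same allele classes")
    return 0.5 * sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


def pool_alleles(samples):
    """Keep the allele whose average frequency is closest to 0.5 and pool
    all remaining alleles into a single second class."""
    n_alleles = len(samples[0])
    means = [sum(s[k] for s in samples) / len(samples) for k in range(n_alleles)]
    keep = min(range(n_alleles), key=lambda k: abs(means[k] - 0.5))
    return [[s[keep], 1.0 - s[keep]] for s in samples]


def distance_matrix(samples):
    """Symmetric matrix of Prevosti distances between all pairs of samples."""
    n = len(samples)
    return [[prevosti_distance(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical three-allele frequencies at three localities (illustration only).
observed = [[0.60, 0.30, 0.10], [0.50, 0.35, 0.15], [0.40, 0.45, 0.15]]
matrix = distance_matrix(pool_alleles(observed))
```

Pooling to two classes makes the observed data comparable with the simulations, which track the frequency of a single allele per run.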
Fig. 3. Migratory movements simulated under the GIM model. The four localities whose allele frequencies are averaged, to represent the allele frequencies of the migrating population, are marked by asterisks. The regions affected by the three migratory waves proposed by Gimbutas are surrounded by solid lines; the zone where Basque is currently spoken is not supposed to have been affected by these migratory episodes. Figures refer to the generation at which each migratory wave was simulated.
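The recalculation of allele frequencies in a recipient population after a migratory wave amounts to a weighted average. The sketch below assumes, as described for the GIM model, that the migrant frequency is the mean over four source localities and that 20% of the target gene pool is replaced; the function name `apply_wave` and the example frequencies are hypothetical.

```python
def apply_wave(p_target, p_sources, replaced=0.20):
    """Allele frequency of a target locality after one migratory wave.

    p_sources holds the allele frequencies of the source localities; their
    mean stands for the migrating population. `replaced` is the fraction of
    the target gene pool substituted by immigrant genes.
    """
    p_migrants = sum(p_sources) / len(p_sources)
    return (1.0 - replaced) * p_target + replaced * p_migrants


# Hypothetical frequencies: a target at 0.50 receiving migrants averaging 0.25.
after_wave = apply_wave(0.50, [0.10, 0.20, 0.30, 0.40])
```

Because each wave moves the target frequency only one fifth of the way toward the source mean, three successive waves from the same source still leave about half (0.8 cubed, or 51%) of the original gene pool in place.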

Hypothesis testing

For every one of the 26 genetic systems, each of the five simulation programs was run 100 times, yielding 13,000 simulation cycles (or realizations) in all, 2,600 for each model. Every realization resulted from deterministic movements of individuals across the map of Europe, and random allele-frequency fluctuations occurring during initialization and from genetic drift each generation. The latter were dictated by a random number generator, and gave rise to the various replicates at generation 440. A particular realization of the OAG process adds OAG movements to those already required by the IBD model, but random allele-frequency fluctuations are the same. This is also true for the other models, each incorporating the previous ones. For all models, then, random change in allele frequencies was the same, for each pixel and each generation, under all five models. It was not the same, however, for the hunting-gathering and for the farming populations of the same pixel. Thus, the 500 cycles of simulation for each locus fall naturally into 100 groups, each with five matched runs, in ascending model order. Because of the matching, we
Fig. 4. Twelve potentially important migratory processes considered by Gimbutas (personal communication) to have been relevant in European ethnohistory, each one represented by an arrow from a source region to a target region. Figures refer to the generation at which each population movement was simulated.

could use paired statistical tests to compare models. This provided a considerable increase in statistical power over unpaired comparisons.

After 440 generations in each computer cycle, simulated allele frequencies were sampled from localities chosen so as to match the locations of the samples of the observed allele-frequency database. Matrices of Prevosti's distances were computed from the simulated data, so as to obtain 100 simulated matrices for each matrix of observed genetic distances (Prevosti et al., 1975). This measure of genetic distance was chosen for consistency with previous studies (Sokal, 1988; Sokal et al., 1993).

Simulated and observed matrices were then compared pairwise by means of Mantel's test of matrix association (Mantel, 1967; Smouse et al., 1986). This test computes the equivalent of a correlation coefficient between matrices, and evaluates its significance by constructing a null distribution of the test statistic. A Monte Carlo procedure is employed for this purpose; rows and columns of one matrix are repeatedly permuted at random, while the other matrix is kept constant, and the test statistic is recalculated each time, so as to yield the desired null distribution.

TABLE 2. Parameters defined in the simulation

L        Environment: 1 = plains; 2 = mountains; 3 = seas; 4 = Black Sea
NHG      Effective population size of hunter-gatherers (114 in all models but IBD, where it is allowed to increase up to 7560. In the OAC, ATC, and GIM models it is then reduced as an effect of the cultural transmission of farming technologies)
NF       Effective population size of farmers (Initially equal to 0, then allowed to increase up to 7560 in all models but IBD)
pHG      Frequency of one allele among hunter-gatherers (initially sampled from a gamma distribution)
pF       Frequency of one allele among farmers for all models except IBD (Initially undefined. For the six pixels of the nuclear zone, pF = pHG at generation 40. In the other pixels, the initial pF value reflects the proportion of immigrating farmers and of hunter-gatherers who were incorporated into the farming population, under the different models)
a        Intrinsic growth rate of the farming populations
γ        Acculturation rate, i.e., rate of assimilation of hunter-gatherers by farmers
m        Migration rate
RTM(AB)  Resistance to migration between localities A and B (see Table 1)
RTM      Average resistance to migration between one pixel and its adjoining pixels

Because each observed matrix was compared with 100 simulated matrices for each model tested, a procedure was needed to combine all this information. We chose to compute average Mantel correlation coefficients, and to calculate Fisher's combined probabilities (Sokal and Rohlf, 1995) from the 100 individual probabilities for each model.

Since we are looking for positive association of observed and simulated data, and a negative correlation would have no biological meaning, all tests of significance were one-tailed. However, as a further control, we also counted the number of occurrences of negative correlations that would be significant if the test had been two-tailed. This allowed us to identify the models generating allele-frequency distributions departing widely from the observed distributions.

The main purpose of this study was to compare competing hypotheses on the origin of Indo-Europeans. Because the hypotheses can be ranked, by increasing complexity, from IBD through OAG, OAC, ATC, and GIM, four pairwise tests of goodness of fit were carried out, OAG versus IBD, OAC versus OAG, ATC versus OAC, and GIM versus ATC. This was done taking advantage of the paired design based on the same random seeds in the simulations. We employed two different procedures: 1) a paired-comparisons t-test, where the test statistic was the difference between the average Mantel correlation coefficients, and only positive differences (indicating an improvement of the fit for the more complex hypothesis) were considered significant; 2) Wilcoxon's signed rank test (Sokal and Rohlf, 1995), a nonparametric paired-comparisons test, once again considering significant only the cases in which the fit improved for the more complex hypothesis.

RESULTS

The average Mantel correlations for each genetic system (Table 3) yield numerous significant agreements between observed and simulated matrices of genetic distances, for all models, although fewest with IBD. The number of significant (P < 0.05) positive average correlations is maximal for the ATC model (20/26), but it is not substantially lower for OAG, OAC, and GIM (respectively, 18, 17, and 16 systems). For IBD it is only 10/26. The numerical values of the correlations are low despite their high level of statistical significance. This is characteristic for genetic distances and is because the Mantel correlations were constrained to be linear.

Next we examine the number of cases in which individual simulation realizations gave genetic distance matrices that would appear negatively associated (P ≤ 0.05) with the observed one, if the test had been two-tailed. A substantial number of disagreements between observed and simulated data is evident for IBD (6 at system 4-13 RHESUS), for ATC (8 in the 1-1 ABO
TABLE 3. A) Average Mantel correlations of observed with simulated genetic distances and B) significance levels based on one-tailed probabilities for positive correlation combined by Fisher's method¹

System                   IBD       OAG       OAC       ATC       GIM

A. Mantel correlations
1-1-ABO              0.01294   0.14800   0.15233   0.13137   0.01887
1-2-ABO              0.05699   0.10831   0.09718   0.00411   0.03532
2-5-MN               0.00324  -0.05627  -0.05352  -0.04100  -0.03231
2-7-MN              -0.02886  -0.02236  -0.02091  -0.03127  -0.03912
3-1-P               -0.03558  -0.04556  -0.03411   0.02409   0.01112
4-1RHESU             0.01311  -0.01457  -0.01428   0.00674   0.03130
4-13RHES            -0.02547   0.06920   0.06120   0.03431   0.02375
4-19RHES            -0.04258  -0.03158  -0.02899  -0.01635  -0.08314
5-1-LUTH            -0.00590  -0.05392  -0.04768   0.00777   0.02033
6-1-KELL             0.01789   0.07960   0.08378   0.03392   0.03330
6-3-KELL            -0.04254   0.06145   0.01510  -0.01550  -0.06665
7-1ABHSE             0.01781   0.01064   0.02805   0.05623   0.05420
8-1DUFFY             0.00290   0.04017   0.05362   0.01197  -0.01077
36-1-HP              0.11991   0.15415   0.17941   0.07459   0.08002
37-1-TF              0.00592   0.08658   0.09142   0.08117   0.00461
38-1-GC             -0.01547  -0.05739  -0.05843  -0.01637   0.05178
50-1-1AP            -0.00965   0.12963   0.11903   0.10520   0.06548
52-PGD               0.04775  -0.03417  -0.03457  -0.04371   0.01603
53-PGM1             -0.00845   0.13174   0.14000   0.10789   0.06152
56-AK               -0.00696   0.06676   0.10858   0.05081  -0.01301
63-ADA               0.00667   0.27423   0.26493   0.07537   0.03099
65-TASTE             0.02802   0.17960   0.16922   0.07373   0.04040
100HLA-A             0.01417   0.13654   0.15443   0.08977   0.01311
101-102             -0.00854   0.24013   0.22784   0.12228   0.09136
200-GM               0.01455   0.34928   0.33330   0.14102   0.03480
201-KM              -0.02816   0.07813   0.09279   0.05959  -0.03732

B. Significance levels
1-1-ABO              0.00000   0.00000   0.00000   0.00000   0.00000
1-2-ABO              0.00000   0.00000   0.00000   0.00489   0.00000
2-5-MN               0.00339   1.00000   1.00000   1.00000   1.00000
2-7-MN               1.00000   1.00000   1.00000   1.00000   1.00000
3-1-P                1.00000   1.00000   1.00000   0.00000   0.00003
4-1RHESU             0.00000   1.00000   1.00000   0.00198   0.00000
4-13RHES             0.99975   0.00000   0.00000   0.00000   0.00000
4-19RHES             0.99996   0.99980   0.99913   0.00001   1.00000
5-1-LUTH             0.52240   1.00000   1.00000   0.22014   0.04162
6-1-KELL             0.47350   0.00000   0.00000   0.00001   0.00385
6-3-KELL             1.00000   0.00000   0.09492   0.94677   1.00000
7-1ABHSE             0.00942   0.03488   0.00002   0.00000   0.00000
8-1DUFFY             0.94188   0.00000   0.00000   0.01885   0.99931
36-1-HP              0.00000   0.00000   0.00000   0.00000   0.00000
37-1-TF              0.82795   0.00000   0.00000   0.00000   0.79836
38-1-GC              0.98856   1.00000   1.00000   0.51374   0.00000
50-1-1AP             0.94062   0.00000   0.00000   0.00000   0.00000
52-PGD               0.00554   1.00000   1.00000   0.99974   0.68047
53-PGM1              0.99819   0.00000   0.00000   0.00000   0.00000
56-AK                0.97077   0.00000   0.00000   0.00000   0.99442
63-ADA               0.05673   0.00000   0.00000   0.00000   0.00132
65-TASTE             0.00000   0.00000   0.00000   0.00000   0.00000
100HLA-A             0.00107   0.00000   0.00000   0.00000   0.06326
101-102              0.97945   0.00000   0.00000   0.00000   0.00000
200-GM               0.04825   0.00000   0.00000   0.00000   0.00115
201-KM               0.99886   0.00000   0.00000   0.00000   0.99918
Overall probability  0.00000   0.00000   0.00000   0.00000   0.00000

¹Values below 0.05 show significant positive correlation between simulated and observed genetic distances.
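The two quantities tabulated in Table 3 rest on a permutation-based Mantel test for each simulated matrix and on Fisher's method for combining the resulting one-tailed probabilities. The following is a minimal sketch of both steps, assuming a Pearson correlation over the upper triangles of the matrices as the Mantel statistic and a simple count-based permutation probability; these choices, the function names, and the number of permutations are assumptions for illustration, since the exact implementation used in the study is not described here.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Upper-triangle entries, optionally after permuting rows and columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Product-moment correlation (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(observed, simulated, n_perm=999, rng=None):
    """Mantel correlation and one-tailed probability of a positive association.

    Rows and columns of `simulated` are permuted together while `observed`
    is kept fixed, as in the Monte Carlo procedure described in the text.
    """
    rng = rng or random.Random()
    x = upper_triangle(observed)
    r_obs = pearson(x, upper_triangle(simulated))
    order = list(range(len(simulated)))
    hits = 1  # the unpermuted arrangement counts as one outcome
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(simulated, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)


def fisher_combined(p_values):
    """Fisher's combined probability: -2 * sum(ln p) is chi-square distributed
    with 2k degrees of freedom, whose upper tail has a closed form."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the chi-square value
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

Under this scheme each cell of part A would be the mean of 100 values of `r_obs`, and each cell of part B the result of `fisher_combined` applied to the 100 matching probabilities. Permutation probabilities are never exactly zero, so the logarithm is always defined.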

and 11 in the 2-5 MN systems), and for GIM (26 in the 1-1 ABO, and 8 in the 4-19 RHESUS systems).

The pairwise comparisons by t-tests and by Wilcoxon's signed ranks test agree in their means and significances that the most significant increase in the resemblance of observed and simulated genetic distances occurs between IBD and OAG. In the t-tests shown in Table 4, 17 systems show a significant increase of correlation, whereas in only 7 cases does the similarity decrease. These
TABLE 4. Results of paired comparisons t-tests for differences between 5 Indo-European simulation hypotheses: P(H1) - P(H2)

System             OAG - IBD     OAC - OAG     ATC - OAC     GIM - ATC

1-1-ABO           0.13506***    0.00432      -0.02095      -0.11250
1-2-ABO           0.05133***   -0.01113      -0.09307       0.03121***
2-5-MN           -0.05951       0.00275***    0.01252**     0.00869*
2-7-MN            0.00650       0.00145      -0.01037      -0.00785
3-1-P            -0.00998       0.01145***    0.05819***   -0.01296
4-1RHESU         -0.02767       0.00029       0.02102***    0.02456***
4-13RHES          0.09467***   -0.00800      -0.02689      -0.01056
4-19RHES          0.01099       0.00259       0.01265      -0.06679
5-1-LUTH         -0.04802       0.00624       0.05545***    0.01256
6-1-KELL          0.06171***    0.00419      -0.04987      -0.00061
6-3-KELL          0.10399***   -0.04634      -0.03060      -0.05115
7-1ABHSE         -0.00717       0.01741***    0.02818*     -0.00203
8-1DUFFY          0.03727***    0.01345***   -0.04164      -0.02275
36-1-HP           0.03425**     0.02525**    -0.10482       0.00543
37-1-TF           0.08065***    0.00484      -0.01026      -0.07655
38-1-GC          -0.04192      -0.00104       0.04206***    0.06816***
50-1-1AP          0.13928***   -0.01061      -0.01383      -0.03972
52-PGD           -0.08192      -0.00040      -0.00914       0.05974***
53-PGM1           0.14019***    0.00826      -0.03212      -0.04637
56-AK             0.07372***    0.04182***   -0.05777      -0.06382
63-ADA            0.26756***   -0.00930      -0.18956      -0.04438
65-TASTE          0.15158***   -0.01038      -0.09549      -0.03333
100HLA-A          0.12237***    0.01789***   -0.06466      -0.07666
101-102           0.24867***   -0.01229      -0.10556      -0.03092
200-GM            0.33473***   -0.01597      -0.19229      -0.10622
201-KM            0.10630***    0.01466      -0.03320      -0.09691
Mean difference   0.07402       0.00198      -0.03662      -0.02660

Asterisks indicate significant positive differences as follows: *0.05 ≥ P > 0.01, **0.01 ≥ P > 0.001, ***0.001 ≥ P.
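The entries of Table 4 come from comparing, for one genetic system, the 100 Mantel correlations obtained under a more complex model with the 100 matched correlations obtained under the simpler one. A minimal sketch of the two paired statistics follows. It returns the test statistics only and leaves the one-tailed probabilities to standard tables; the function names are hypothetical and the three-run example is invented for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(complex_r, simple_r):
    """Mean difference, t statistic and degrees of freedom for matched runs.

    Only positive t values are of interest, because the test is one-tailed
    in favour of the more complex model.
    """
    diffs = [a - b for a, b in zip(complex_r, simple_r)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return mean(diffs), t, n - 1


def wilcoxon_w_plus(complex_r, simple_r):
    """Sum of the ranks of positive differences (ties get average ranks,
    zero differences are dropped)."""
    diffs = sorted((a - b for a, b in zip(complex_r, simple_r) if a != b), key=abs)
    w_plus, i = 0.0, 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        w_plus += average_rank * sum(1 for v in diffs[i:j] if v > 0)
        i = j
    return w_plus


# Invented correlations for three matched runs (illustration only).
mean_diff, t_value, df = paired_t([0.3, 0.5, 0.4], [0.1, 0.2, 0.3])
```

Because matched runs share the same random fluctuations, the differences vary far less than the correlations themselves, which is the source of the gain in power over unpaired comparisons mentioned in the text.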

figures compare with 7 systems showing significantly improved correspondence and 10 showing decreased correspondence for OAC versus OAG, 6 and 19, and 5 and 19, respectively, for ATC versus OAC, and GIM versus ATC. These results are reflected in the mean differences shown at the bottom of Table 4.

Using Wilcoxon's criterion, the results do not change much, so we do not feature them as a separate table. The numbers of systems showing significantly increased and any decreased resemblance between observed and simulated data are, respectively, 17 and 7 for OAG versus IBD; 10 and 9 for OAC versus OAG; 5 and 19 for ATC versus OAC; and 5 and 18 for GIM versus ATC.

We conclude, as a result of all the tests, that similarity between observed and simulated genetic distances increases from IBD to OAG, and, to a lesser extent, from OAG to OAC. On the contrary, it decreases as models are tested in which the spread of farmers is constrained by archaeological time data (ATC), or demographic processes occurring in the late Neolithic are added (GIM). Thus although ATC shows a significant positive correlation for the largest number of systems, on the average, correlations between observed and simulated genetic distance are higher for OAC and OAG.

To test the plausibility of our simulations we also calculated FST values (Wright, 1978) for both our observed gene-frequency surfaces and the simulated surfaces. The median results over all genetic systems are 0.011780 for the observed surfaces, and 0.098273, 0.10399, 0.096562, 0.056609, and 0.003211, respectively, for the simulated surfaces of models IBD, OAG, OAC, ATC, and GIM. Although the FST values of the observed surfaces overlapped only slightly those of the simulated surfaces, the median of the observed data falls within the boundaries of the medians described by the models. The latter fall into 3 groups by magnitude of FST. These are 1) IBD, OAG, OAC; 2) ATC; and 3) GIM. Thus, the clear superiority of OAG and OAC over IBD cannot be shown by FST since various patterns of locality differentiation can yield the same FST value.
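The closing remark, that different patterns of locality differentiation can yield the same FST, can be made concrete with a small sketch. It assumes the textbook form of Wright's FST for one allele, the variance of the allele frequency among localities divided by p(1 - p) at the mean frequency; the estimator actually used in the study is not specified here, and the function name and frequencies are hypothetical.

```python
from statistics import mean, pvariance


def fst(freqs):
    """Wright's FST for one allele: among-locality variance of the allele
    frequency divided by p_bar * (1 - p_bar)."""
    p_bar = mean(freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


# The same six frequencies arranged as a smooth cline and as a patchwork
# (hypothetical values) give identical FST, although a distance-based
# comparison such as the Mantel test would tell the two maps apart.
cline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
patchwork = [0.4, 0.1, 0.6, 0.2, 0.5, 0.3]
same_value = fst(cline) == fst(patchwork)
```

FST ignores where each locality lies on the map, so it summarizes the amount of differentiation but not its geographic pattern.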
DISCUSSION

Which model matches observed data best?

The IBD model assumes that, in the Neolithic, groups of hunter-gatherers and farmers were not separated. The former gradually turned to farming, so that there was a genetic continuity between pre- and post-Neolithic populations in Europe, and Indo-European languages spread only by cultural transmission. Allele frequency patterns generated under this model resemble poorly the patterns of genetic variation observed in contemporary populations, showing that this evolutionary hypothesis does not fit with the available genetic evidence.

Resemblance between observed and simulated patterns is much greater for the other four models, in which farmers evolve separately from hunter-gatherers, and processes of population expansion are important. The levels of resemblance, however, do not differ much among these four models. For instance, the GIM model, including several population processes occurring in the last 5,000 years, does not give higher correlations, or significant correlations at a higher number of loci, than the OAG model, where the demographic changes prompted by the origin of agriculture are simulated in a much rougher manner.

Actually, various results of this simulation study indicate that the fit of simple models, such as OAG and OAC, is better than that of more complex models. The Mantel correlations between observed and simulated genetic distances would be negative and significant in only 21 of the 2,600 cases for OAG, and in 14 cases for OAC, had tests been two-tailed. These figures compare with 89 and 105 negative significant correlations for ATC and GIM, respectively. It seems, therefore, that nothing is added to our understanding of the phenomena, if we add archaeological time data to constrain the spread of Neolithic farmers, and even less so if we simulate population movements in the late Neolithic. The models where farmers disperse into new areas simply because of their numbers, which increase logistically, yield patterns showing a better agreement with the observed data.

Similarly, the pairwise comparisons of models show a substantial increase of fit of the OAG over the IBD model (Table 4), whereas the elements included in the ATC and GIM simulations cause a slight but evident departure from the patterns observed, making them poorer fits than OAG.

A first conclusion one may draw from the results of this simulation study is that two models account best for many aspects of the contemporary genetic structure of Indo-European-speaking populations of Europe. One is the demic diffusion model, as originally put forward by Menozzi et al. (1978), and associated with linguistic evidence by Renfrew (1987). Under this model, here called OAC, the two forces driving microevolution in Europe were population growth determined by farming, and dispersal accompanied by limited population admixture between early agriculturalists (possibly proto-Indo-European speakers) and preexisting hunters and gatherers. The other model, OAG, is a simplified version of the demic diffusion model, in which dispersal of farmers does not lead to any degree of admixture with hunter-gatherers.

How plausible is the OAG model?

While the OAC model has already received support from studies focussing on its genetic (Sokal et al., 1991; Cavalli-Sforza et al., 1993), as well as linguistic and archaeological, aspects (reviewed in Renfrew, 1992), what we called OAG here has not been analyzed in detail so far. An apparent problem with it is, how can a model not involving admixture account for the continent-wide clines observed in Europe?

Inspection of gene-frequency maps generated in this study, at various moments in time, shows that founder effects are common while farmers disperse. Founder effects are due to the limited numbers of individuals who start the farming communities in new localities. In the OAG model as we simulated it, most farming communities start with 8 effective individuals; but even if this number were larger, the probability for the allele in question to be lost or fixed would be substantial. Loss of genetic variation through repeated founder effects has been invoked as the likely cause of clines in several studies on natural populations of toads in Australia (Easteal, 1988) and aquatic invertebrates in Canada (Boileau et al., 1992). Theoretical work on the genetic effects of colonization of previously unoccupied localities (Wade and McCauley, 1988) agrees with this view.

An additional factor, increasing the likelihood of clines even in the absence of admixture between farmers and hunter-gatherers, is the Black Sea. Archaeological evidence (e.g., see Renfrew, 1991) indicates that two waves of early farmers dispersed westwards and northwards from the Near East, with the Black Sea separating them (this is why we did not allow movement of individuals through it, but only along its coasts). The two waves later converged in eastern Europe, after a period of independent evolution. If the same allele had been lost, or fixed, in both groups of farmers, no particular pattern would result; but if founder effects had had opposite consequences in the two groups, the successive admixture would initially determine a steep cline, and successive gene flow would smooth it, resulting in a wide gradient (Endler, 1977).

Even under OAG, therefore, a certain role of admixture is important. But admixture, under OAG, is between different groups of farmers, who were geographically separated in part of their evolutionary history, rather than between farmers and hunter-gatherers of the same area. This interpretation emphasizes the role both of geographical factors, such as distance between regions, and of cultural barriers between sympatric communities of farmers and hunters-gatherers. Indeed, physical barriers are often associated with genetic and linguistic change, even between Indo-European speakers (Barbujani and Sokal, 1990, 1991), although other evolutionary mechanisms may also account for that association (Barbujani, 1991).

Genetics and Kurgan waves

Introducing the three migratory waves postulated by Gimbutas (GIM model) into the simulation not only does not increase the correlations, but somewhat reduces them. This means that the current patterns of allele frequencies among Indo-Europeans can be explained without resorting to the migrations of Kurgan people. This study cannot establish whether or not these migration events really occurred, but, if they occurred, they did not leave a significant mark on the allele frequencies of current populations.

Renfrew (1987) argued that the cultural transformations that led Gimbutas to hypothesize late-Neolithic migration waves could be due to cultural contacts instead, and equated the first Indo-Europeans with the first farmers. The extensive changes in ceramics, architecture, and metallurgy occurring in the late Neolithic are then attributed to trading and imitation; long-distance migratory movements, if any, may have been marginal. Although not proved by our simulation, this view is fully compatible with it.

This study, therefore, agrees with the main views expressed by Menozzi et al. (1978), Rendine et al. (1986), Piazza (1993), and Cavalli-Sforza et al. (1993). By contrast, the emphasis laid by the same authors on late Neolithic migrations from the Pontic steppes (Cavalli-Sforza et al., 1993) does not find support in our simulations. Among the possible causes of this discrepancy, it may be that Mantel's correlations are not sensitive enough to recognize the effects of minor processes of gene flow, such as those presumably occurring in the late Neolithic. Alternatively, however, or in addition, one should consider the possibility that principal components associated with low eigenvalues reflect, at least in part, artificial gradients due to data interpolation. This may be the case for areas where population samples are sparse, such as most of eastern Europe. For example, the Caucasus seems to show clinal variation in the first and third principal components of Cavalli-Sforza et al. (1993), but a detailed genetic study shows that clines are very uncommon there (Barbujani et al., 1994).

Our evaluation of the Gimbutas model should be revised if evidence could be provided that the spread of the Kurgan people was accompanied by an increase in population sizes larger than that simulated by us. A certain level of ambiguity exists about this, as the movement of people from the Pontic steppes that Gimbutas (1979) hypothesized is called a population expansion by Cavalli-Sforza et al. (1993). These authors seem to suggest that, because of the warfare technologies associated with it, larger populations could be supported in the regions affected. This aspect remains to be explored, and we do not have evidence for or against this view. However, even if this had been the case, the increases in population sizes prompted by the beginning of food production seem to have been much larger than those associated with new war technologies (see Ammerman and Cavalli-Sforza, 1984). Unless European populations increased dramatically in size between 6,000 and 5,000 years ago, as they did with the arrival of the new farming technologies, we conclude that the long-distance migrations postulated by Gimbutas remain an unnecessary element in the evolution of Indo-European-speaking populations, as reconstructed from the comparison of theoretical models and gene-frequency data.

Besides, early farmers expanded into areas of low population density, where few immigrants could substantially modify the genetic build up of local populations; but this was not the case for late Neolithic groups, who invaded regions already occupied by large farming communities. Simulations of [...]

[...] rather than limited admixture or founder effects, leads to clinal variation of gene frequencies (e.g., see Hedrick, 1986). It is intriguing to note that the hypothesis of demic diffusion from the Near East was initially developed to account for clines at the histocompatibility loci, HLA-A and HLA-B (Menozzi et al., 1978; Sokal and Menozzi, 1982), and that the most significant evidence for clines spanning Eurasia has been found at the glyoxalase locus (Barbujani, 1987), which is linked with HLA on chromosome 6, in a region of extensive linkage disequilibrium (Hedrick et al., 1986).

However, this view, although compatible with the gradients existing at the HLA and linked loci, can hardly account for the patterns of variation observed among Indo-European speakers at other, independently inherited, loci. Had the resistance to parasites been the main cause of clines in Europe, one would expect isolation by distance patterns at most loci not involved in tissue recognition, which is not the case (Sokal et al., 1989a; and this study). On the contrary, the nearly parallel gradients observed for many independent alleles suggest that an evolutionary pressure affecting the entire genome and not merely part of it, i.e., gene flow, played a major evolutionary role (Slatkin,
genetic processes based on the coalescent 1985,1987).
approach (see Hudson, 1990) show that pat-
terns of genetic variation do not tend to Relation to other work
change much after a demographic expansion Diakonov (1984; cited in Redrew, 1987)
(Harpending, 1994; Rogers and Jorde, listed what he called the essential questions
1995). Successive population movements concerning the origins of Indo-European
can smooth out the gradients and blur some speakers: Who migrated? Why? How many
patterns, but are unlikely to leave a signifi- of them were there? Was it actually a migra-
cant mark on allele frequencies. tion of people, or rather the transfer of a
language from one population to another?
Are parasites responsible for clines? The present study may contribute to an-
Recent evolutionary models (reviewed in swering some of these questions. Our results
Ladle, 1992) indicate that new genotypes show that migrations in the late Neolithic,
entering an area could be resistant to the which have been inferred from changes in
parasites that are already adapted to the the material culture of eastern and central
common resident genotypes. The new geno- Europe (Gimbutas, 19791, are not reflected
types would then increase in frequency, un- in the current genetic structure of Indo-Eu-
til the parasites adapt to them. A selective ropean-speaking populations. Conversely,
mechanism of this type, combined with gene the correlations of observed and simulated
flow, might have been important in deter- data are positive and significant only if we
mining the European clines of allele fre- simulate dispersal of farmers from the Le-
quencies; models may be envisaged whereby vant by demic diffusion. The results of this
a form of frequency-dependent selection, study are, therefore, compatible with the
view of identifying proto-Indo-European speakers with the first Neolithic farmers (Renfrew, 1987).

These findings appear at first glance to contradict the results of Sokal et al. (1992), who use the same genetic dataset. These authors concluded that geographic proximity explained a substantial amount of the observed correlation between genetic and Indo-European linguistic distances. However, after allowing for geographic distances, statistically significant partial correlations remain, which are not explained by distances describing the origin of agriculture by demic diffusion, Renfrew's hypothesis (as described by his postulated transitions subsequent to the origin of agriculture), or Gimbutas' hypothesis. But note that the study by Sokal et al. (1992) was based on the spatial pattern of correlations between genetic and linguistic distances, whereas this study examines genetic variation patterns only. The simulations reported here do not consider the patterns of linguistic diversity observed in Europe today. Our findings are therefore compatible as well with a simpler model in which the observed genetic patterns reflect the process of demic diffusion accompanying the origin and spread of agriculture in Europe, but these populations are not the proto-Indo-Europeans. Note that the new findings provide further support for the demic diffusion hypothesis of Ammerman and Cavalli-Sforza (1984). Statistical tests of this hypothesis were carried out by Sokal et al. (1991) using origin-of-agriculture distances constructed from observed dates of the onset of the Neolithic. In the present study these distances were constructed from the simulation results using simple models of the spread of farming populations. It is reassuring that these models yield results in agreement with observed genetic patterns, as had already been noted by Rendine et al. (1986).

Our work also offers an answer to Diakonov's second question; presumably, proto-Indo-European speakers dispersed because their increase in numbers forced them to look for new suitable land. A study of genetic variation such as this cannot provide reliable estimates of population sizes in the remote past, and even less so of numbers of migrants. However, even a very small number of dispersing individuals may yield patterns that correlate with the observed ones, as seen for the OAG model. The results of this study are compatible both with a complete replacement of pre-existing hunter-gatherers by Near Eastern farmers (the OAG model), and with the more conventional view that this replacement was only partial, and that hunter-gatherers contributed to some extent to the genetic pool of Indo-European speaking populations (the OAC model). But the view whereby language replacement was largely independent of population movements (Zvelebil and Zvelebil, 1988) fails to account for the large-scale clinal patterns matching the direction of the spread of agriculture observed in Europe, and therefore does not seem easy to reconcile with the available genetic evidence.

In principle, one could also envisage a scenario whereby expanding farmers determined the main genetic characteristics of European populations, whereas Indo-European languages spread in a later moment, and mainly by a cultural process. Although this cannot be ruled out, it does not seem the best explanation available for the current patterns of genetic and linguistic variation.

The model of Neolithic demic diffusion proposed by Renfrew (1991) predicts the existence of clines in three linguistic groups which are supposed to have expanded together with Indo-European. Many such clines have actually been observed among speakers of Altaic and Elamo-Dravidian languages (Barbujani and Pilastro, 1993; Barbujani et al., 1994). Moreover, some of these clines disappear if different linguistic groups are jointly analyzed (Barbujani and Pilastro, 1993). Although not a proof, these findings suggest that linguistic affiliation is the key to deciphering gene-frequency patterns in much of Eurasia. In the areas where Indo-European, Elamo-Dravidian, and Altaic languages are spoken, linguistic, genetic, and archaeological evidence can jointly be accounted for by a demic expansion from the Near East. Clearly, some languages may also have changed by cultural
contact (some examples are well documented: Renfrew, 1991); however, the overlap between large clines and linguistic areas suggests that cultural transmission had a lesser impact than demographic expansions.

CONCLUSIONS AND FUTURE STUDIES
The replacement of a food-collecting economy by one of food production has been a complex process. In the Mediterranean area, for instance, farming replaced hunting-gathering very rapidly in certain regions, but gradually in other regions where the two economies coexisted for centuries (Barker, 1988). However, this does not seem to have deeply affected gene-frequency patterns, since the ATC model, where farmers spread at an irregular rate, did not result in a greater correspondence of observed and simulated genetic distances than OAG and OAC. This may mean that the variable rate of spread of Neolithic agriculture depended on factors in the physical environment. Once these factors are incorporated in the model (OAG or OAC), simulated and observed allele frequency patterns resemble each other.

At any rate, the models put forward and tested in our study should doubtless be regarded as approximate. Archaeological, linguistic, and genetic studies of individual populations will certainly add details to our reconstruction of European history; complex models of the Indo-European expansion, such as the one outlined by Sherratt (1988), may be reformulated in such a way as to become comparable with genetic data. Nevertheless, this study indicates that not all hypotheses on the origins of Indo-Europeans account equally well for the available genetic evidence.

The hypotheses whereby Indo-Europeans entered Europe as the first farmers show the best fit. There seems to be no cogent reason to think that the farmers' spread was due to factors other than their tendency to grow in numbers, thanks to the increased resources available. Both incomplete admixture with hunter-gatherers, and founder effects occurring in the expansion of farmers, seem to account satisfactorily for the observed patterns of genetic differentiation among Indo-European speakers. These models are not mutually exclusive, and it may well be that both phenomena were important, at different localities. Conversely, there is no evidence for a major evolutionary role of migratory phenomena occurring in the late Neolithic. These phenomena may have affected population sizes and allele frequencies on a local scale, but the large-scale structure of Indo-European speaking populations seems basically to reflect Neolithic demic diffusion.

New archaeological evidence will certainly be valuable for describing in detail times and modes of Neolithic expansions, on which our ideas are certainly simplistic at the moment. On the genetic side, new allele frequency data on previously neglected populations are unlikely to substantially alter the picture, since, for most loci, the data sets already include hundreds of samples. Rather, collection and analysis of mtDNA may offer a new perspective on human evolution in Europe, as it has already done for other areas, including Oceania (Stoneking et al., 1990) and the Americas (Ward et al., 1991, 1993; Torroni et al., 1992; Wallace and Torroni, 1992).

ACKNOWLEDGMENTS
This is contribution No. 913 in Ecology and Evolution from the State University of New York at Stony Brook. We thank Prof. Marija Gimbutas and Lord Renfrew for their collegial cooperation in this work. We are indebted to Barbara A. Thomson for technical assistance. Jeff Walker computed the 3-statistics and Donna DiGiovanni word-processed the manuscript. Part of the computation was carried out on the Cornell National Supercomputer Facility. This research was supported by National Science Foundation grant BNS 9117350. This paper was prepared while Robert R. Sokal was a Fellow at the Center for Advanced Study in the Behavioral Sciences at Stanford, California. He is grateful for financial support provided by the National Science Foundation grant SES-9022192, and by sabbatical funds from his home institution. Guido Barbujani wishes to acknowledge fruitful discussions with Professors Luca Cavalli-Sforza, [names illegible], Italo Scardovi, and Michael Turelli.

LITERATURE CITED
Ammerman AJ, and Cavalli-Sforza LL (1971) Measuring the rate of spread of early farming in Europe. Man 6:674-688.
Ammerman AJ, and Cavalli-Sforza LL (1984) The Neolithic Transition and the Genetics of Populations in Europe. Princeton, New Jersey: Princeton University Press.
Barbujani G (1987) Diversity of some gene frequencies in European and Asian populations. III. Spatial correlogram analysis. Ann. Hum. Genet. 51:345-353.
Barbujani G (1991) What do languages tell us about human microevolution? Trends Ecol. Evol. 6:151-156.
Barbujani G, Jacquez GM, and Ligi L (1990) Diversity of some gene frequencies in European and Asian populations. V. Steep multilocus clines. Am. J. Hum. Genet. 47:867-875.
Barbujani G, Nasidze IS, and Whitehead GN (1994) Genetic diversity in the Caucasus. Hum. Biol. 66:639-668.
Barbujani G, and Pilastro A (1993) Genetic evidence on origin and dispersal of human populations speaking languages of the Nostratic macrofamily. Proc. Natl. Acad. Sci. U. S. A. 90:4670-4673.
Barbujani G, and Sokal RR (1990) Zones of sharp genetic change in Europe are also language boundaries. Proc. Natl. Acad. Sci. U. S. A. 87:1816-1819.
Barbujani G, and Sokal RR (1991) Genetic population structure of Italy. II. Physical and cultural barriers to gene flow. Am. J. Hum. Genet. 48:398-411.
Barker G (1985) Prehistoric Farming in Europe. Cambridge: Cambridge University Press.
Barker G (1988) Comment on "Archaeology and Language," by C. Renfrew. Curr. Anthropol. 29:448-449.
Bertranpetit J, and Cavalli-Sforza LL (1991) A genetic reconstruction of the history of the population of the Iberian peninsula. Ann. Hum. Genet. 55:51-67.
Boileau MG, Hebert PDN, and Schwartz SS (1992) Non-equilibrium gene frequency divergence: Persistent founder effects in natural populations. J. Evol. Biol. 5:25-39.
Cavalli-Sforza LL (1988) The Basque population and ancient migrations in Europe. Munibe 6:129-137.
Cavalli-Sforza LL, Menozzi P, and Piazza A (1993) Demic expansions and human evolution. Science 259:639-646.
Cavalli-Sforza LL, Minch E, and Mountain JL (1992) Coevolution of genes and languages revisited. Proc. Natl. Acad. Sci. U. S. A. 89:5620-5624.
Cavalli-Sforza LL, and Piazza A (1993) Human genomic diversity in Europe: A summary of recent research and prospects for the future. Eur. J. Hum. Genet. 1:3-18.
Cavalli-Sforza LL, Piazza A, Menozzi P, and Mountain J (1988) Reconstruction of human evolution: Bringing together genetic, archaeological and linguistic data. Proc. Natl. Acad. Sci. U. S. A. 85:6002-6006.
Diakonov IM (1984) On the original home of the speakers of Indo-European. Soviet Anthropol. Archaeol. 23:5-87.
Easteal S (1988) Range expansion and its genetic consequences in populations of the giant toad, Bufo marinus. Evol. Biol. 23:49-84.
Eisen MM (1979) Mathematical Models in Cell Biology and Cancer Chemotherapy. Berlin: Springer.
Endler JA (1977) Geographic Variation, Speciation, and Clines. Princeton, New Jersey: Princeton University Press.
Feller W (1940) On the logistic law of growth and its empirical verifications in biology. Acta Biotheor. 5:51-66.
Gimbutas M (1979) The three waves of the Kurgan people into Old Europe, 4500-2500 B.C. Arch. Suisses Anthropol. Gen. 43:113-137.
Gimbutas M (1986) Remarks on the ethnogenesis of the Indo-Europeans in Europe. In W Bernhard and A Kandler-Pálsson (eds.): Ethnogenese Europäischer Völker. Stuttgart: Gustav Fischer Verlag, pp. 5-20.
Guglielmino CR, Piazza A, Menozzi P, and Cavalli-Sforza LL (1990) Uralic genes in Europe. Am. J. Phys. Anthropol. 83:57-68.
Harding RM, and Sokal RR (1988) Classification of the European language families by genetic distance. Proc. Natl. Acad. Sci. U. S. A. 85:9370-9372.
Harpending HC (1994) Signature of ancient population growth in a low resolution mitochondrial DNA mismatch distribution. Hum. Biol. 66:591-600.
Hassan FA (1973) On the mechanisms of population growth during the Neolithic. Curr. Anthropol. 14:535-543.
Hassan FA (1981) Demographic Archaeology. New York: Academic Press.
Hedrick PW (1986) Genetic polymorphism in heterogeneous environments: A decade later. Annu. Rev. Ecol. Syst. 17:535-566.
Hedrick PW, Thomson G, and Klitz W (1986) Evolutionary genetics: HLA as an exemplary system. In S Karlin and E Nevo (eds.): Evolutionary Processes and Theory. Orlando, Florida: Academic Press, pp. 583-606.
Hudson RR (1990) Gene genealogies and the coalescent process. In D Futuyma and J Antonovics (eds.): Oxford Surveys in Evolutionary Biology, Vol. 7. Oxford, UK: Oxford University Press, pp. 1-44.
Jorde LB (1980) The genetic structure of subdivided human populations. A review. In JH Mielke and MH Crawford (eds.): Current Developments in Anthropological Genetics, Theory and Methods. New York: Plenum, pp. 135-208.
Kaiser M, and Shevoroshkin V (1988) Nostratic. Annu. Rev. Anthropol. 17:309-329.
Keyfitz N (1977) Introduction to the Mathematics of Populations with Revisions. Reading, UK: Addison-Wesley.
Kimura M, and Weiss GH (1964) The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 49:561-576.
Ladle RJ (1992) Parasites and sex: Catching the red queen. Trends Ecol. Evol. 7:405-408.
Lewontin RC (1972) The apportionment of human diversity. Evol. Biol. 6:381-398.
Manly BFJ (1991) Randomization and Monte Carlo Methods in Biology. London: Chapman and Hall.
Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Res. 27:209-220.
Menozzi P, Piazza A, and Cavalli-Sforza LL (1978) Synthetic maps of human gene frequencies in Europeans. Science 201:786-792.
Mourant AE, Kopec AC, and Domaniewska-Sobczak K (1976) The Distribution of the Human Blood Groups and Other Polymorphisms. Oxford: Oxford University Press.
Nei M (1987) Molecular Evolutionary Genetics. New York: Columbia University Press.
Piazza A (1993) Who are the Europeans? Science 260:1767-1769.
Piazza A, Cappello N, Olivetti E, and Rendine S (1988) The Basques in Europe: A genetic analysis. Munibe 6:168-176.
Press WH, Flannery BP, Teukolsky SA, and Vetterling WT (1986) Numerical Recipes. Cambridge: Cambridge University Press.
Prevosti A, Ocana J, and Alonso G (1975) Distances between populations of Drosophila subobscura based on chromosome arrangement frequencies. Theor. Appl. Genet. 45:231-241.
Rendine S, Piazza A, and Cavalli-Sforza LL (1986) Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128:681-706.
Renfrew C (1987) Archaeology and Language. London: Jonathan Cape.
Renfrew C (1989) Models of change in language and archaeology. Trans. Philol. Soc. 87:103-155.
Renfrew C (1991) Before Babel: Speculations on the origins of linguistic diversity. Cambridge Arch. J. 1:3-23.
Renfrew C (1992) Archaeology, genetics and linguistic diversity. Man N.S. 27:445-478.
Rogers AR, and Jorde LB (1987) The effect of non-random migration on genetic differences between populations. Ann. Hum. Genet. 51:169-176.
Rogers AR, and Jorde LB (1995) Genetic evidence on modern human origins. Hum. Biol., in press.
Ruhlen M (1987) A Guide to the World's Languages. Vol. 1: Classification. London: Edward Arnold.
Sgaramella-Zonta L, and Cavalli-Sforza LL (1973) A method for the detection of a demic cline. In NE Morton (ed.): Genetic Structure of Populations. Honolulu: University of Hawaii Press, pp. 128-135.
Sherratt A (1988) Comment on "Archaeology and Language," by C. Renfrew. Curr. Anthropol. 29:45-63.
Slatkin M (1985) Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16:393-430.
Slatkin M (1987) Gene flow and the geographic structure of natural populations. Science 236:787-792.
Smouse PE, Long JC, and Sokal RR (1986) Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35:627-632.
Sokal RR (1988) Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. U. S. A. 85:1722-1726.
Sokal RR (1991) Ancient movement patterns determine modern genetic variances in Europe. Hum. Biol. 63:589-606.
Sokal RR, Harding RM, and Oden NL (1989a) Spatial patterns of human gene frequencies in Europe. Am. J. Phys. Anthropol. 80:267-294.
Sokal RR, Jacquez GM, Oden NL, DiGiovanni D, Falsetti AB, McGee E, and Thomson BA (1993) Genetic relationships of European populations reflect their ethnohistorical affinities. Am. J. Phys. Anthropol. 91:55-70.
Sokal RR, and Menozzi P (1982) Spatial autocorrelation of HLA frequencies in Europe support demic diffusion of early farmers. Am. Nat. 119:1-17.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Thomson BA, Vaudor A, Harding RM, and Barbujani G (1990) Genetics and language in European populations. Am. Nat. 135:157-175.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, and Vaudor A (1989b) Genetic differences among language families in Europe. Am. J. Phys. Anthropol. 79:489-502.
Sokal RR, Oden NL, and Thomson BA (1988) Genetic changes across language boundaries in Europe. Am. J. Phys. Anthropol. 76:337-361.
Sokal RR, Oden NL, and Thomson BA (1992) Origins of the Indo-Europeans: Genetic evidence. Proc. Natl. Acad. Sci. U. S. A. 89:7669-7673.
Sokal RR, Oden NL, and Wilson C (1991) Genetic evidence for the spread of agriculture in Europe by demic diffusion. Nature 351:143-145.
Sokal RR, and Rohlf FJ (1995) Biometry, 3rd ed. New York: W.H. Freeman.
Stoneking M, Jorde LB, Bhatia K, and Wilson AC (1990) Geographic variation in human mitochondrial DNA from Papua New Guinea. Genetics 124:717-733.
Torroni A, Schurr TG, Yang C-C, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, and Wallace DC (1992) Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics 130:153-162.
Wade MJ, and McCauley DE (1988) Extinction and recolonization: Their effect on the genetic differentiation of populations. Evolution 42:995-1005.
Wallace DC, and Torroni A (1992) American Indian prehistory as written in the mitochondrial DNA: A review. Hum. Biol. 64:403-416.
Ward RH, Frazier BL, Dew-Jager K, and Paabo S (1991) Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. U. S. A. 88:8720-8724.
Ward RH, Redd A, Valencia D, Frazier BL, and Paabo S (1993) Genetic and linguistic differentiation in the Americas. Proc. Natl. Acad. Sci. U. S. A. 90:10663-10667.
Wijsman EM, and Cavalli-Sforza LL (1984) Migration and genetic population structure with special reference to humans. Annu. Rev. Ecol. Syst. 15:279-301.
Wright S (1969) Evolution and the Genetics of Populations. Vol. 2. The Theory of Gene Frequencies. Chicago: University of Chicago Press.
Wright S (1978) Evolution and the Genetics of Populations. Vol. 4. Variability within and among Natural Populations. Chicago: University of Chicago Press.
Zeven AC (1980) The spread of bread wheat over the old world since the Neolithicum as indicated by its genotype for hybrid necrosis. J. d'Agric. Trad. Bota. Appl. 27:25-53.
Zvelebil M, and Zvelebil KV (1988) Agricultural transition and Indo-European dispersal. Antiquity 62:574-583.
