Ancestral Origins
New Software Looks Even Further Into the Past
Many preconceptions about individuals start with the assumption that a person’s ethnicity can be judged by simply looking at his or her physical traits. Eyes, skin, even hair all yield clues about the origin of a person’s family. But in the lab of Dr. Serafim Batzoglou, assistant professor in the Department of Computer Science, ancestral analysis goes much deeper. Dr. Batzoglou and graduate students Eugene Fratkin, Andreas Sundquist and Chuong Do have developed software that can pinpoint the ethnicity of ancestors from ten or even twenty generations ago by looking at the historical information buried in their descendants’ DNA. How was this level of precision achieved, and what can it tell us about our ancestry?

Clues Hidden In the DNA
The idea of developing software capable of identifying the contribution of different ethnicities in a multi-ethnic, or “admixed,” individual was conceived as part of Dr. Batzoglou’s consulting work with 23andMe, a Bay Area genetic analysis company. Dr. Batzoglou explains: “The problem of determining the ancestral population for each location of an individual’s chromosomes appealed to us as scientifically interesting, and one in which we could contribute computational methods that are substantial improvements over the state of the art.”

The new software was named “HAPAA,” a play on the term “Hapa” used in Hawaii to describe a person of mixed ethnic origin. Race and ethnicity have been used throughout history as defining, if not divisive, characteristics; conflicts between ethnic groups have triggered many of the worst battles in human history. But the concept of racial identity is far from simple.

One of the most visible clues to racial identity, skin color, is more closely related to geographic distance from the equator than to shared ancestry. Racial groups are, in truth, extremely similar at the genetic level. Research has shown that, on average, only 0.1% of DNA will differ between any two individuals. Of this 0.1%, only 10-15% varies between groups of different ethnic origin; most of the remaining genetic diversity contributes to variety within a population of shared ethnic origin. Included in that 10-15% are genes containing small mutations that accumulate over generations.

A Haploblock Mosaic
HAPAA was designed to analyze patterns of change in sequences of DNA from different populations over long periods of time. Random single nucleotide mutations, called “SNPs,” appear frequently, scattered throughout human chromosomes. If the mutations are not harmful, they may escape the ravages of natural selection and be [...]
volume VII 39
ENG + TECH
written by NIKKI BREAUX
stanford scientific
layout by RORY SAYRES
Credit: sxc.com

[...] “linkage disequilibrium” between the SNPs in each haploblock. Linkage disequilibrium (LD) refers to the likelihood of certain genetic combinations being found in the same individual. For example, consider two genes A and B, which exist as versions A and A’, as well as B and B’. If the two genes sorted independently during mating, then an individual with A or A’ would have the same probability of having B or B’; genes A and B would therefore be considered to be in “linkage equilibrium.” However, genes often sort in linkage disequilibrium, with a disproportionate number of individuals inheriting a specific combination, for example A with B, or A’ with B’.

[Figure: Triangular representation of the average West African, European and Indigenous American admixture proportions of different populations.]

Other genome software takes advantage of linkage disequilibrium, but HAPAA takes the analysis several steps further. According to Dr. Batzoglou, “previous methods could only take into account LD between neighboring [...]

[...] his lab had to train HAPAA to identify the fragments of the ancestral haploblock and retrace the lineage that produced the current admixture.

As a starting point for the development of the HAPAA program, Dr. Batzoglou drew upon the DNA sequences found in HapMap, a database of genetic variants found in different individuals. 210 DNA sequences, representing nearly unrelated individuals of North Western European, West African and East Asian ancestry, were drawn from this database and used as the “training set” for HAPAA. A smaller group of these individuals was withheld for testing of the finalized program.

In order to build greater predictive power into HAPAA, graduate students simulated successive mating over 20 generations of individuals drawn at random from the training set. The model assumed that, as in real life, there was a small degree of recombination and genetic drift (a random skew in gene frequencies that happens over time). By knowing the ancestries of every fragment in the mosaic pattern of the resultant haploblocks in each generation, HAPAA could be trained to infer the patterns of inheritance across multiple generations.

After many rounds of training, HAPAA was ready for its final testing. Dr. Batzoglou and his students created simulated individuals from the test set withheld from training. These individuals were the result of multiple rounds of simulated mating, and each had only a single ancestor from one ethnic group, while all other ancestors were drawn from a different ethnic group. The challenge was for HAPAA to identify the location of the contributions of the two different ethnicities. The results of these tests indicated that HAPAA could determine with high accuracy the contribution of a single ancestor within the previous ten generations. The prediction was more accurate when the two populations sampled were assumed to be highly divergent. The more genetically similar the two populations were, the more difficult it became to detect the contribution of the single ancestor. However, in populations separated by a large amount of time, the contribution of a 20th-generation ancestor could be detected.

Mapping Human Migrations
The Batzoglou lab is currently working to make HAPAA available to the general public. Although HAPAA can currently distinguish contributions from an ancestor 20 generations prior to his or her descendant, its predictive power is limited to the ethnic groups used in the training set. Dr. Batzoglou would like to see further improvements, “making it more effective on fine-grained distinctions between highly similar populations” such that the program could distinguish between as many as the 50 different populations on Earth. He can also foresee wide-reaching applications of the software. As he explains, “imagine several thousand individuals from different neighboring populations having their ancestry painted in their chromosomes using HAPAA. The statistics of the ancestral blocks could tell us about the history, migration and admixture patterns of the nearby populations in the past several hundred years.”

Admixed America
How closely does the American population resemble these ancestral populations? This question has been posed and answered through several lines of research. In one study led by Dr. Shriver, Associate Professor of Anthropology and Genetics at Pennsylvania State University, a sample population in Chicago was analyzed using an older genetic analysis method. Dr. Shriver’s results determined that admixture was very common, especially among the African-American and Hispanic populations. The average subject who self-described as “white” had 0.7% African ancestry, while the average African American had 15-18% European genetic ancestry. A different team of scientists, led by Cerda-Flores of the Universidad Nacional Autónoma de México, found that the average Hispanic American had 3-8% African ancestry. These analyses, however, were only able to detect the ethnicity of the previous seven generations, which strongly reflected the forced migrations of slave populations to the Americas. HAPAA has the power to look at trace ancestral history over 20 generations, before America was colonized by Europeans.

Nikki Breaux is a PhD student researching cancer biology. In her free time, she enjoys reading, cycling and enjoying California’s weather.

To Learn More
For more information, visit Dr. Batzoglou’s departmental website at http://ai.stanford.edu/~serafim/. Additional literature and discussion about race and genetics can be found at http://raceandgenomics.ssrc.org/.
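
A Worked Example: Measuring Linkage Disequilibrium
The article's two-gene example (genes A and B, with alternate versions A’ and B’) can be made concrete with a few lines of Python. The sketch below is only an illustration and is not taken from HAPAA: it uses the standard coefficient D = P(AB) - P(A) x P(B), which is zero when the two genes sort independently and moves away from zero when certain combinations travel together. The function name, the two-letter encoding ("a" stands in for A’, "b" for B’) and the population counts are all made-up assumptions chosen for readability.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return D = P(AB) - P(A) * P(B) for two-gene haplotypes.

    Each haplotype is a 2-character string. The first character is the
    version of gene A ("A", or "a" standing in for A'), the second is the
    version of gene B ("B", or "b" standing in for B').
    This encoding is an assumption made for this illustration only.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n  # frequency of the A-with-B combination
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n
    return p_ab - p_a * p_b


# Hypothetical population 1: every combination is equally common,
# so knowing the version of gene A says nothing about gene B.
equilibrium = ["AB"] * 25 + ["Ab"] * 25 + ["aB"] * 25 + ["ab"] * 25

# Hypothetical population 2: A usually travels with B, and A' with B'.
disequilibrium = ["AB"] * 45 + ["Ab"] * 5 + ["aB"] * 5 + ["ab"] * 45

d_eq = linkage_disequilibrium(equilibrium)
d_ld = linkage_disequilibrium(disequilibrium)
print("D in equilibrium population:   ", round(d_eq, 3))
print("D in disequilibrium population:", round(d_ld, 3))
```

In the first made-up population D comes out at zero, which corresponds to the “linkage equilibrium” case described in the article. In the second, D is about 0.2, reflecting the disproportionate number of individuals carrying A with B or A’ with B’.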