
Molecular Evolution: Tests of Neutrality

Guest lecture: Dr Dan Neafsey (Group leader, Malaria Genome Sequencing and Analysis,
Broad Institute)
Scribe notes: Shannon Tunney '15, Lloyd Mccarthy '15, Meng Lai Nicole Wong '14,
Devanshi Patel '13
Neutrality: the difference between populations is selectively neutral (i.e. no selection)
Topics:
1. Understanding neutral variation in populations
2. Model-based tests of selection
3. Empirical tests of selection
Understanding neutral variation in populations
Population Genetics: the study of the frequency and interaction of alleles and
genes in populations
Population Genomics: large-scale comparison of DNA sequences of populations
o In the 1960s the field was largely theoretical; there was very little data
o Today sequencing is easy and affordable, producing huge data sets and making it
possible to use genetic variation clinically, which is useful for health
services and bioengineering
o There are more than 1200 human genome-wide association studies on over
200 traits and diseases involving 100,000s of subjects; the data have made a
big difference
Mutation + Selection = Evolution
o Objective of tests: what is the relative contribution of each to maintaining
variation in a population?
H0: Variation profile is neutral
H1: Variation profile is not neutral, possibly caused by selection
Early critic of Darwin: Fleeming Jenkin
o Blending inheritance: a widespread hypothetical model (at the time) which
theorized that the traits of the two parents would blend, yielding offspring
with traits intermediate between the parents'
Problem with the theory: eventually everything would be blended and
there would be no variation among offspring. Natural selection does not
work under this model.
o Gemmules: packets of substance in an individual that can blend with other
packets in offspring

Mendelian inheritance
o Law of Segregation: states the following ideas: allelic variation exists; offspring
receive 1 allele from each parent; alleles can be dominant or recessive; parental alleles
segregate when gametes form

o Law of Independent Assortment (also known as the "Inheritance Law"): states that
separate genes for separate traits are passed independently of one another
from parents to offspring. (Not universally true for real genes because of linkage)
o Garden peas exhibit clean, discrete phenotypes. Conversely, evening primrose is a
poor model to study because its traits are highly variable (none of its traits show
clean Mendelian inheritance).
Hardy-Weinberg Principle
o Simple case, no selection
o Definition: allele and genotype frequencies in a population will remain
constant from generation to generation, provided the following
assumptions hold:
infinite population size
random mating
non-overlapping generations
no selection, mutation, or migration
o Hardy-Weinberg Law:
For a single locus with two alleles denoted A and a with frequencies
f(A) = p and f(a) = q, the expected genotype frequencies are f(AA) =
p^2 for the AA homozygotes, f(aa) = q^2 for the aa homozygotes,
and f(Aa) = 2pq for the heterozygotes. Assuming random mating, we can
compute the new genotype frequencies for each successive
generation.
p + q = 1
p^2 + 2pq + q^2 = 1
The whole system is characterized by just one parameter, p.
Over time, allele frequencies don't change; genotype frequencies reach these
proportions after one generation of random mating and then stay constant
Deviation from expectations indicates that 1 or more assumptions has failed,
e.g. possible selection
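A minimal Python sketch of the Hardy-Weinberg calculations above: expected genotype frequencies from p, one round of random mating, and a chi-square goodness-of-fit statistic for spotting deviations. The function names and the genotype counts are hypothetical, chosen only for illustration.

```python
def hw_expected(p):
    """Expected (AA, Aa, aa) frequencies for allele frequency p = f(A)."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def allele_freq(n_AA, n_Aa, n_aa):
    """Frequency of A from genotype counts (each diploid individual carries 2 copies)."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating."""
    p = f_AA + f_Aa / 2  # allele frequency carried into the gamete pool
    return hw_expected(p)


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic comparing observed counts with Hardy-Weinberg expectations (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = allele_freq(n_AA, n_Aa, n_aa)
    expected = [f * n for f in hw_expected(p)]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Starting far from equilibrium, one generation restores HW proportions; p is unchanged.
print(next_generation(0.5, 0.0, 0.5))  # (0.25, 0.5, 0.25)
# Hypothetical sample with a heterozygote excess
print(round(hw_chi_square(20, 70, 10), 2))
```

Under these assumptions, a chi-square value above about 3.84 (1 degree of freedom, 5% level) would flag a deviation worth investigating, such as the heterozygote excess in the sickle-cell example.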
Hardy-Weinberg Example
o Sickle cell anemia (a blood disease); the sickle-cell allele can protect against malaria
o SS = normal red blood cells
o ss = sickle cell disease, a deadly genotype
o There are more heterozygotes (Ss) than we expect to see. Why? Selection pressure
from malaria: heterozygotes are protected against malaria
This is an example of how we use deviations to see a pattern
(hence the importance of a neutral expectation)
Alternatively, deviations from the Hardy-Weinberg Law could indicate violated
assumptions or problems in the data

Model-based tests of selection


Approach: detect selection by starting with the assumption that variation is
neutral and comparing the data to the neutral expectation
Neutral theory of molecular evolution: Kimura (1924-1994) changed the model to
deal with finite populations, where stochastic processes such as genetic
drift matter.
o Applied diffusion approximations to alleles within populations

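Under the diffusion approximation, the expected number of alleles whose frequency
lies between x and x + dx is

$$\phi(x)\,dx = \theta\,x^{-1}(1-x)^{\theta-1}\,dx, \qquad \theta = 4N\mu$$

where the 4 reflects the number of chromosome copies in a diploid population
(it would be 2 for haploids), N = population size, μ = mutation rate,
x = frequency of allele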
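A short sketch, assuming the infinite-alleles form of the neutral frequency density, φ(x) = θ x^(-1) (1 - x)^(θ - 1) with θ = 4Nμ. It integrates φ numerically (midpoint rule) to get the expected number of alleles in a frequency band, which makes the point that a higher mutation rate yields more variation. Function names and the θ values are illustrative assumptions.

```python
import math


def neutral_density(x, theta):
    """Infinite-alleles neutral density: phi(x) = theta * x**-1 * (1 - x)**(theta - 1)."""
    return theta / x * (1 - x) ** (theta - 1)


def expected_alleles_between(a, b, theta, steps=100_000):
    """Expected number of alleles with population frequency in [a, b] (midpoint-rule integral of phi)."""
    width = (b - a) / steps
    return sum(neutral_density(a + (i + 0.5) * width, theta) for i in range(steps)) * width


for theta in (0.5, 2.0):
    # higher theta (higher mutation rate or larger N) -> more alleles expected in this band
    print(theta, round(expected_alleles_between(0.05, 0.95, theta), 3))
```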

Under Hardy-Weinberg (infinite population size) there is no genetic drift between
generations. In finite populations there is drift between generations, which is
more realistic.
o Probability of seeing an allele in a given frequency interval
If the mutation rate is high, we expect to see more variation
o How is drift affected by N (population size)? They are inversely related.
When N is large, the population behaves much like Hardy-Weinberg
When N is small, allele frequencies fluctuate more from generation to generation
Effective human population size = Effective genetic population size
Our population size is growing faster than mutations can
accumulate, so drift is important for us!
o Connecting diversity and malaria transmission
Reduced transmission yields reduced parasite population size
Reduced parasite population size yields enhanced genetic drift
Enhanced genetic drift reduces diversity
Smaller parasite population size → increased genetic drift →
decreased diversity
o SNP polymorphism data indicate a shrinking parasite effective population size
in Senegal; there's higher variation
o Parasite population shrinking after introduction of a new drug
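The inverse relation between population size and drift can be seen in a minimal neutral Wright-Fisher simulation: each generation, 2N gene copies are drawn at random from the previous generation's allele frequency. Under this model the expected heterozygosity 2pq decays as H_t = H_0 (1 - 1/(2N))^t. The function names, seed, and parameter values below are illustrative assumptions.

```python
import random


def wright_fisher(p0, n_copies, generations, rng):
    """Neutral Wright-Fisher drift: each generation draws n_copies gene copies at random
    (binomial sampling) from the previous allele frequency. Returns the frequency trajectory."""
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


def expected_heterozygosity(h0, n_diploid, t):
    """Expected heterozygosity after t generations with 2N gene copies: H_t = H_0 (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * n_diploid)) ** t


rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (10, 200):
    finals = [wright_fisher(0.5, 2 * n, 50, rng)[-1] for _ in range(100)]
    mean_h = sum(2 * p * (1 - p) for p in finals) / len(finals)
    print(f"N={n}: simulated H={mean_h:.3f}, expected H={expected_heterozygosity(0.5, n, 50):.3f}")
```

The small population loses most of its heterozygosity within 50 generations, while the larger one keeps most of it, matching the "smaller population → more drift → less diversity" chain above.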
Ewens sampling formula (1972)
o Ewens used diffusion theory to find the expected allele frequency distribution in a
sample from a population. The formula also extended the idea of identity by descent (IBD),
is sample-based, and shifted the focus to inferential methods. It makes a concrete
prediction of polymorphism in a population.
Population size & drift are inversely related
Genetic drift causes a loss of variation (alleles are lost from generation to
generation). Mutations restore/inject variation.
Probability that a sample of n gene copies contains k alleles and that there
are a1, a2, …, an alleles represented 1, 2, …, n times in the sample:

$$P(a_1,\dots,a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}\prod_{j=1}^{n}\frac{(\theta/j)^{a_j}}{a_j!}$$

where θ = 4Nμ, k = a_1 + … + a_n, and a_j is the number of alleles
found in j copies
This is the probability of observing a given profile of polymorphism, given θ
(the level of variability)
o Under neutrality, the most common class of genetic variation is singletons
(variants seen only once in the sample)
o Example: the malaria parasite genome is AT-rich (81%)
Categorize mutations into AT→CG or CG→AT
Which category is driven to fixation more quickly? AT→CG
Selection is pushing back against the amount of AT in the genome
CG→AT mutations (which add even more AT) are unlikely to be fixed
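The Ewens sampling formula can be evaluated directly. The sketch below (function names hypothetical) computes the probability of an allelic configuration a_1, …, a_n for a given θ = 4Nμ using P = n! / (θ(θ+1)⋯(θ+n-1)) × ∏_j (θ/j)^(a_j) / a_j!, and checks that the probabilities of all possible configurations of a small sample sum to 1. The values n = 5 and θ = 1.5 are arbitrary.

```python
from math import factorial, prod


def ewens_probability(counts, theta):
    """Ewens sampling formula.

    counts[j - 1] is a_j, the number of alleles seen in exactly j copies;
    the sample size is n = sum(j * a_j) and theta = 4 * N * mu.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    rising = prod(theta + i for i in range(n))  # theta (theta + 1) ... (theta + n - 1)
    p = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        p *= (theta / j) ** a / factorial(a)
    return p


def partitions(n, largest=None):
    """Yield every partition of n as a non-increasing list of allele copy numbers."""
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest


def to_counts(parts, n):
    """Convert a partition such as [2, 1, 1] into counts [a_1, ..., a_n]."""
    counts = [0] * n
    for part in parts:
        counts[part - 1] += 1
    return counts


n, theta = 5, 1.5
total = sum(ewens_probability(to_counts(parts, n), theta) for parts in partitions(n))
print(round(total, 10))  # probabilities over all configurations sum to 1
print(ewens_probability(to_counts([1] * n, n), theta))  # every copy a distinct allele
print(ewens_probability(to_counts([n], n), theta))      # a single allele in all copies
```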
Coalescence
o An alternative, backward-in-time approach to generating expected allele frequency
distributions
o Traces all alleles of a gene shared by all members of a population back
to a single ancestral copy, known as the most recent common ancestor (MRCA)
o Infers the tree structure (genealogy), because the tree structure dictates the pattern of
polymorphism in the data. The predicted time to the common ancestor is 4N generations
(i.e., the time to coalescence of a population is 4N generations)
Studies how far back in time a sample shared a common ancestor
Coalescent inference parametrizes the distribution of coalescence times:
P(coalescence) = 1/(2N) per generation for a pair of lineages
θ = population mutation rate (4Nμ)
G is a genealogy
The probability of seeing only 3 SNP differences is very small when we
expect to see more than 3
The coalescent lets you make inferences about selection
Two lineages can coalesce only once, and that happens after k mutations have occurred
Bigger population → lineages coalesce later, so there is a greater chance to diverge from each other
Small population → not many opportunities for divergence; common ancestors are
recent
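Two standard coalescent results behind these notes can be sketched in a few lines, assuming a diploid population of N individuals (2N gene copies) and the infinite-sites model. While k lineages remain, the expected wait until the next coalescence is 2N / C(k, 2) generations, which sums to 4N(1 - 1/n) for a sample of n copies (about 4N for a large sample). For two sequences, each event going back in time is a mutation with probability θ/(θ+1) or their single coalescence with probability 1/(θ+1), so the number of differences is geometric. Function names and θ = 10 are illustrative assumptions.

```python
def expected_tmrca(n_diploid, sample_size):
    """Expected generations back to the MRCA of `sample_size` gene copies (2N copies in total).
    While k lineages remain, the expected waiting time is 2N / C(k, 2)."""
    return sum(2 * n_diploid / (k * (k - 1) / 2) for k in range(2, sample_size + 1))


def prob_pairwise_differences(k, theta):
    """P(two sequences differ at exactly k sites) under the infinite-sites coalescent, theta = 4 N mu:
    k mutation events, each with probability theta / (theta + 1), then one coalescence."""
    return (theta / (theta + 1)) ** k / (theta + 1)


print(expected_tmrca(1000, 2))     # 2000.0 generations for a pair (2N)
print(expected_tmrca(1000, 1000))  # close to 4N = 4000 for a large sample
theta = 10.0  # hypothetical value
print(sum(prob_pairwise_differences(k, theta) for k in range(4)))  # P(at most 3 differences)
```

The last line is the kind of calculation behind "seeing only 3 SNP differences is very unlikely": with θ = 10 we expect about 10 differences, so 3 or fewer is a low-probability outcome.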
Turning neutral models into tests of neutrality
3 polymorphism summary statistics
S = number of segregating sites in the sample
π = average number of pairwise differences
ξ_i = number of sites that divide the sample into i and n − i
sequences
θ = 4Nμ: use this to come up with expectations for the other
variables
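A minimal sketch of the three summary statistics, assuming aligned haplotypes coded as '0' (ancestral) and '1' (derived) so that ξ_i can be counted by derived-allele frequency. The neutral expectations used are the standard ones for θ = 4Nμ: E[S] = θ a_n with a_n = Σ_{i=1}^{n-1} 1/i, E[π] = θ, and E[ξ_i] = θ/i. The sample and the function names are hypothetical.

```python
from itertools import combinations


def summary_statistics(haplotypes):
    """Return (S, pi, [xi_1, ..., xi_{n-1}]) for equal-length '0'/'1' haplotype strings."""
    n = len(haplotypes)
    derived = [sum(h[site] == "1" for h in haplotypes) for site in range(len(haplotypes[0]))]
    segregating = [c for c in derived if 0 < c < n]
    S = len(segregating)
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    xi = [segregating.count(i) for i in range(1, n)]  # xi[i - 1]: sites with derived allele in i copies
    return S, pi, xi


def neutral_expectations(theta, n):
    """E[S] = theta * a_n, E[pi] = theta, E[xi_i] = theta / i under the standard neutral model."""
    a_n = sum(1 / i for i in range(1, n))
    return theta * a_n, theta, [theta / i for i in range(1, n)]


haplotypes = ["0000", "1000", "1100", "1110"]  # hypothetical sample, n = 4
S, pi, xi = summary_statistics(haplotypes)
print(S, round(pi, 3), xi)  # 3 1.667 [1, 1, 1]
print(neutral_expectations(2.0, len(haplotypes)))
```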
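Frequency-based neutrality tests (Tajima)

$$D = \frac{\hat\theta_\pi - \hat\theta_W}{\sqrt{\widehat{\mathrm{Var}}(\hat\theta_\pi - \hat\theta_W)}}, \qquad \hat\theta_\pi = \pi, \quad \hat\theta_W = \frac{S}{a_1}, \quad a_1 = \sum_{i=1}^{n-1}\frac{1}{i}$$

D is the difference between two estimates of θ, normalized by the standard deviation
of that difference
Negative D → big excess of singletons → suggests directional selection in
the population (e.g., a selective sweep)
but a negative D does NOT by itself prove a selective sweep
e.g., population expansion gives the same signal
Positive D → suggests balancing selection: many mutations at intermediate
frequency (around 50% of the population)
but a positive D does NOT by itself prove balancing selection!
e.g., population structure or subdivision gives the same signal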
Evolution within a population should show divergence
Neutral Expectation
D = 0: no selection, no migration, constant population size; mostly
singletons, with some doubletons, tripletons, and quadrupletons.
Positive Selection (Sweep)
Negative D. Ex.: going back in time, one lineage acquires a malaria drug-resistance
mutation with a huge fitness advantage and sweeps through the population; new
mutations arising afterwards push up the frequency of singletons. An example of
positive selection.
Balancing Selection
Positive D. Allelic lineages go way back in time before they share a common
ancestor: when one allele becomes more common/fit, it soon becomes less fit, giving
2 clear, distinct branches. Caused by balancing selection (e.g., malarial proteins
varying to avoid antigen recognition)
Population Structure/Subdivision
Positive D. Ex.: mosquitoes on 2 sides of a river; we inadvertently sampled 2
populations that diverged long ago, so there are actually 2
populations, not one
Population Expansion
Negative D. Ex.: a huge bottleneck followed by expansion, or a huge expansion of
the population
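Tajima's D can be computed from just n, S, and π. The sketch below uses the standard variance constants for D (a1, a2, b1, b2, c1, c2, e1, e2); it returns None when D is undefined (no segregating sites, or n < 4, where the variance term is zero). The function name and the example numbers are hypothetical.

```python
import math


def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences pi."""
    if S == 0 or n < 4:
        return None  # D is undefined here (zero variance)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = S / a1  # Watterson's estimate, driven by the number of segregating sites
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical sample of 10 sequences with 20 segregating sites:
print(tajimas_d(10, 20, 3.0))   # few pairwise differences relative to S -> negative D (excess rare variants)
print(tajimas_d(10, 20, 12.0))  # many intermediate-frequency variants -> positive D
```

As the notes stress, the sign of D alone cannot separate selection from demography: a sweep and a population expansion both give negative D.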

Polymorphism vs. Divergence: under neutrality, divergence between species should be
proportional to variation within species
o Polymorphism: variability within a species
o Divergence: differences between species A and species B
HKA Test (Hudson-Kreitman-Aguadé): one of the first tests based on sequencing data. It
looked at the profile of polymorphism in Drosophila, within and between species. Under
neutrality you would expect similar within/between ratios across loci. Instead, there was
more or less polymorphism depending on location (the Adh gene vs. its flanking region):
because the ADH enzyme exists in differently optimized forms, the Adh region shows
elevated diversity (excess polymorphism)
Site classes
o Synonymous mutations: don't change the amino acid
o Non-synonymous mutations: change the amino acid
MK Test: another selection test, again applied to ADH. It compares synonymous and
non-synonymous mutations among fixed differences (between species) and polymorphisms
(within species). Again, ADH shows an excess of non-synonymous fixed differences
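A McDonald-Kreitman-style comparison can be sketched as a 2×2 table of fixed differences (Dn, Ds) versus polymorphisms (Pn, Ps). The sketch computes the two ratios, the commonly used neutrality index NI = (Pn/Ps)/(Dn/Ds) with α = 1 - NI, and a two-sided Fisher's exact test written with the standard library only. The counts are hypothetical (not the ADH data) and the function names are illustrative; an excess of non-synonymous fixed differences, as reported for ADH, shows up as NI < 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]: sums the hypergeometric
    probabilities of all tables with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)))


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman summary: rows = fixed / polymorphic, columns = nonsynonymous / synonymous."""
    neutrality_index = (pn / ps) / (dn / ds)
    return {
        "Dn/Ds": dn / ds,
        "Pn/Ps": pn / ps,
        "NI": neutrality_index,
        "alpha": 1 - neutrality_index,  # estimated fraction of nonsynonymous fixations driven by selection
        "p_value": fisher_exact_two_sided(dn, ds, pn, ps),
    }


print(mk_test(dn=6, ds=12, pn=2, ps=16))  # hypothetical counts
```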

Rate-based selection metric:
o dN = no. nonsynonymous changes / no. nonsynonymous sites
o dS = no. synonymous changes / no. synonymous sites
o dN/dS < 1
purifying selection: nonsynonymous changes are fixed less often than expected,
because selection removes them
o dN/dS = 1
neutral expectation: selection is not pushing in either direction
o dN/dS > 1
positive selection: nonsynonymous changes are fixed faster than expected under neutrality
o Correlations with dN/dS (or just dN)
Positive: dispensability, gene length
Negative: expression level, protein abundance, codon bias, number of protein-protein
interactions, centrality in interaction network
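To show where "no. synonymous sites" comes from, here is a simplified, Nei-Gojobori-style sketch using the standard genetic code: for each codon position, the fraction of the 3 possible point mutations that leave the amino acid unchanged counts as synonymous (changes to a stop codon are simply treated as nonsynonymous, a simplification). The dN/dS ratio then follows the definitions above with no multiple-hit correction. The sequence, change counts, function names, and the `tol` threshold in `interpret` are hypothetical; `tol` is an arbitrary illustration, not a statistical test.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    aa = CODE[codon]
    synonymous_changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    synonymous_changes += 1
    return synonymous_changes / 3  # each of the 3 alternatives at a position counts 1/3 of a site


def site_counts(seq):
    """(synonymous sites, nonsynonymous sites) for an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    syn = sum(synonymous_sites(c) for c in codons)
    return syn, 3 * len(codons) - syn


def dn_ds(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """dN / dS with dN and dS as per-site proportions (no multiple-hit correction)."""
    return (nonsyn_changes / nonsyn_sites) / (syn_changes / syn_sites)


def interpret(ratio, tol=0.05):
    if ratio < 1 - tol:
        return "purifying selection"
    if ratio > 1 + tol:
        return "positive selection"
    return "consistent with neutrality"


syn, nonsyn = site_counts("ATGCTGTTTGGC")  # hypothetical sequence
ratio = dn_ds(nonsyn_changes=2, nonsyn_sites=nonsyn, syn_changes=3, syn_sites=syn)  # hypothetical counts
print(round(syn, 3), round(nonsyn, 3), round(ratio, 3), interpret(ratio))
```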

Empirical Tests of Selection


We now have a lot of sequencing data to use
o Look for patterns based on what the genes are doing instead of building
models with particular assumptions
o Used to define test expectations
Selective Sweep
o After a selective sweep, recombination has not had time to break up the swept
haplotype, so the sweep wipes out diversity and boosts linkage disequilibrium.
o Ratio of haplotype frequencies
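One simple way to see the haplotype signature of a sweep is haplotype homozygosity, the sum of squared haplotype frequencies (the chance that two haplotypes drawn with replacement are identical). This is a generic illustration, not the specific haplotype-ratio statistic from the lecture; the samples and function names are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype string."""
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def haplotype_homozygosity(haplotypes):
    """Sum of squared haplotype frequencies; high values mean one haplotype dominates."""
    return sum(f * f for f in haplotype_frequencies(haplotypes).values())


swept = ["AAGT"] * 9 + ["AAGC"]  # hypothetical region after a sweep: one haplotype dominates
neutral = ["AAGT", "ACGT", "TAGC", "AAGC", "TCGT", "ACGC", "AAGT", "TAGT", "ACGT", "TCGC"]
print(round(haplotype_homozygosity(swept), 3), round(haplotype_homozygosity(neutral), 3))
```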
Future/Longitudinal data: earlier selection tests were based on one set, one slice in
time: data from one population or sequence. But now, with cheap sequencing and samples
collected over many years, we can cover more evolutionary time. If we collect
100 samples every year for 10 years, we can sequence through time and watch natural
selection as it happens.
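A sketch of how longitudinal samples could be used, assuming a deterministic haploid selection model (no drift or sampling noise): the odds p/(1 - p) of the favored allele are multiplied by (1 + s) each time step, so logit(p_t) is linear in t with slope log(1 + s). A least-squares fit of logit frequency against time then gives s per sampling interval (here per year, not per parasite generation). The frequencies and function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def estimate_selection(times, freqs):
    """Least-squares slope of logit(frequency) against time, converted to s = exp(slope) - 1."""
    ys = [logit(p) for p in freqs]
    t_mean = sum(times) / len(times)
    y_mean = sum(ys) / len(ys)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    return math.exp(slope) - 1


# hypothetical yearly frequencies of a resistance allele (one time step per sampling year)
years = list(range(10))
freqs = [0.05, 0.06, 0.08, 0.10, 0.13, 0.16, 0.20, 0.25, 0.30, 0.36]
print(round(estimate_selection(years, freqs), 3))
```

Real data would need a model that accounts for drift and finite samples, and for samples that span many generations, which is the issue raised next.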
Issues with Modern Data Sets
Most selection tests assume contemporaneous sampling (all samples from the same
generation)
Large malaria data sets may span many generations
