Guide to Molecular Genetics Course ZOO3649

Course Guide Molecular Genetics (Evolutionary Genetics) ZOO3649
Compiled by course coordinator:
Professor Yoshan Moodley
Department of Zoology
Email: yoshan.moodley@univen.ac.at
Office FF-047, Life Sciences and Chemistry Building.
Course assistant: Khomotso Nkadimeng*
Email: khomotsonkadimeng38@gmail.com
Office FF-047, Life Sciences and Chemistry Building.
*Please direct correspondence for this module to the course assistant.
1 Course Guide – Molecular Genetics – ZOO3649 – 2015 – University of Venda

COURSE PREREQUISITES
The prerequisite to attend this course is a pass in ZOO2544.
ZOO2648 is not a prerequisite, but if you attended it you will find it very useful.
READING
1. Bergstrom and Dugatkin – Evolution (1st Edition)

2. Hartl and Jones Genetics – Principles and Analysis (4th Edition)
*PDF versions of both books are available from the course assistant.
ASSESSMENT
This module is worth 28 credits and will be assessed as follows:
Semester Mark (60%)
2x Assignments – to be given during the practical sessions for completion during the
week. Each assignment must be handed in exactly one week after it is given. Each assignment is
worth 10% of your final mark, so please take them seriously.
1x Literature review and presentation. Each student will be given a different scientific
paper, and they will have to present the paper using a PowerPoint presentation the
following week. This will count for 10% of your final mark.
3x Tests – These will also be given during the practical sessions. Each one is worth 10%
of your final mark.
Exam Mark: 40%
This exam will cover material from the entire module.
TOTAL FINAL MARK: 100% = Semester Mark + Exam Mark
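The weighting above can be written as a short calculation. This is an illustrative sketch only: the component names, the function `final_mark` and the example scores are hypothetical, and it assumes each component is marked out of 100 and contributes exactly its stated percentage.

```python
# Share of the final mark for each assessed component (from the scheme above).
WEIGHTS = {
    "assignment_1": 0.10,
    "assignment_2": 0.10,
    "literature_review": 0.10,
    "test_1": 0.10,
    "test_2": 0.10,
    "test_3": 0.10,
    "exam": 0.40,
}


def final_mark(scores: dict[str, float]) -> float:
    """Return the final mark out of 100 from component scores out of 100.

    Hypothetical helper: assumes every component is marked out of 100.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the assessed components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# The six semester components together make up the semester mark.
semester_weight = sum(w for name, w in WEIGHTS.items() if name != "exam")
print(f"Semester share: {semester_weight:.0%}")

example = {name: 60.0 for name in WEIGHTS}  # hypothetical: 60% in every component
print(round(final_mark(example), 1))
```

Because the weights add up to 1, a student who scores the same percentage in every component ends with that percentage as a final mark.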

ATTENDANCE
Students are required to attend all lectures and practical sessions. Students must have a
minimum of 70% attendance to be able to write the final exam. You will be required to sign a
class attendance sheet at each practical session.
TABLETS
You MUST bring your university tablets, fully charged, to all lectures and practical sessions.
Software to be used in the practical sessions (see practical timetable) must be obtained from
the course assistant and installed on your tablets before the relevant practical session.

CENTRAL THEME
In 1973, Theodosius Dobzhansky wrote:
“Nothing in biology makes sense except in the light of evolution.”
Evolution, therefore, will be the central theme of this module and understanding it will be
essential to achieving the module outcomes below.
Module Structure
First you will be given a brief introductory lecture to evolutionary genetics and evolutionary
thinking, which will help you to deal with the ideas and concepts that you will encounter
throughout the module. The first section “The Central Dogma” will teach you about the basic
molecular genetics of DNA and RNA, the material upon which all life is based. After this, in the
section “Recombinant DNA Technology”, we will explore the key developments arising from
this molecular foundation that have revolutionised our understanding of genetics and allowed
us to apply genetic tools in research. Then you will discover how molecular mechanisms can
generate the variation of life we see around us in the section “Genetic Variation”. You will then
learn about the “Forces of Evolution” that act upon genetic variation, and the main historical
foundations of modern genetics. In the section “Genetic Structure”, you will learn how these
forces of evolution have shaped and partitioned genetic variation into populations and species.
Lastly, in the section “Molecular Ecology”, you will find out how all the information you have
gained thus far is put into practice in the real world of research.
Study section | Title | Lectures | Practicals
1 | The Central Dogma | 7 | 3.5
2 | Recombinant DNA technology | 6 | 4.5
3 | Genetic Variation | 9 | 6
4 | The Forces of Evolution | 13 | 5
5 | Genetic Structure | 9 | 3
6 | Molecular Ecology | 7 | 4
There will be approximately 51 lectures and 27 practical sessions. Tests and presentations will
be held during the practical sessions. That makes a total of approximately 132 teaching
hours.
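The totals in the paragraph above can be checked against the table. In this sketch each 50-minute lecture is counted as one teaching hour (an assumption, but it is the reading that reproduces the stated 132 hours) and each practical as three hours. Note that the practical column of the table sums to 26, slightly below the approximately 27 sessions quoted in the text.

```python
# (section title, lectures, practical sessions) as listed in the table
SECTIONS = [
    ("The Central Dogma", 7, 3.5),
    ("Recombinant DNA technology", 6, 4.5),
    ("Genetic Variation", 9, 6),
    ("The Forces of Evolution", 13, 5),
    ("Genetic Structure", 9, 3),
    ("Molecular Ecology", 7, 4),
]

LECTURE_HOURS = 1    # assumption: each 50-minute lecture counted as one teaching hour
PRACTICAL_HOURS = 3  # practical sessions are three hours long

total_lectures = sum(lectures for _, lectures, _ in SECTIONS)
total_practicals = sum(pracs for _, _, pracs in SECTIONS)
print(total_lectures, total_practicals)  # 51 26.0

# The text quotes approximately 27 practical sessions.
stated_practicals = 27
teaching_hours = total_lectures * LECTURE_HOURS + stated_practicals * PRACTICAL_HOURS
print(teaching_hours)  # 132
```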

MODULE OUTCOMES
After this module, students should be able to:
1. Synthesize and integrate the fundamentals of molecular genetics with the variation
of the natural world around us.
2. Define and describe the theoretical and historical foundations of evolutionary genetics.
3. Identify, describe, distinguish, compare and analyze the fundamental mechanisms and
factors (mutation, genetic drift, selection, migration), and their interactions, that create
diversification within and between populations and lead to evolutionary change.
4. Use empirical methods and tools to describe levels and patterns of genetic diversity and
differentiation in populations, and to infer and assess population and evolutionary
genetic structure.
5. Reflect, before and after the course, on how it affects their future practice in terms of
their ability to carry out postgraduate research.
TIMETABLE
Lectures will be 50 minutes long and will be held four times a week at 11.00 on Mondays,
Tuesdays and Thursdays in FF-017 (Life Sciences and Chemistry Building). On Wednesdays the
lecture will be at 13.00 in the same venue.
Practical sessions will be three hours long and will be held twice a week from 14.00-17.00 on
Tuesdays and Thursdays. Venue to be announced.
Assignments will be handed out and collected during the practical sessions.
Tests will be given during the practical sessions.
It is really in your best interest to come to all the practicals and lectures.

Semester timetable (activities listed in the order they fall across the week, Monday to Thursday)
Week 2 (19 July – 25 July): NO LECTURE (Monday to Wednesday); INTRODUCTION (Thursday)
Week 3 (26 July – 1 August): Start Section 1; Practical: No Practical; Practical: Mol Biol Prac
Week 4 (2 August – 8 August): Practical: Transcription; Start Section 2; Practical: Protein Synthesis
Week 5 (9 August – 15 August): NO LECTURE; Practical: Cloning, Library making; Practical: DNA sequencing
Week 6 (16 August – 22 August): Practical: DNA sequencing; Start Section 3; Practical: TEST
Week 7 (23 August – 29 August): Assignment Due; Practical: PCR Primer design; Practical: Hands on Mutation
Week 8 (30 August – 5 September): Practical: Mutation Simulation; Start Section 4; Practical: Haploid Variation
Week 9 (6 September – 12 September): Practical: Diploid Variation; Practical: TEST
Week 10 (13 September – 19 September): Assignment Due; Practical: Paternity; Practical: Mendelian Genetics
Week 11 (20 September – 26 September): NO LECTURE (Monday to Thursday)
Week 12 (27 September – 3 October): Practical: Hardy Weinberg, Drift, Selection; Start Section 5; Practical: Migration
Week 13 (4 October – 10 October): Practical: TEST; NO LECTURE; Practical: Reading and Making trees
Week 14 (11 October – 17 October): Practical: Presentations; Practical: Presentations
Week 15 (18 October – 24 October): NO LECTURE (Monday to Thursday); Practical: Revision; Practical: Revision
Week 16 (25 October – 31 October): Practical: Making networks; Practical: AMOVA

Contents
Section 1: The Central Dogma
  The Discovery of DNA
  Genome Organisation
  DNA replication
  Transcription
  RNA Processing
  Translation (Protein synthesis)
  DNA repair
Section 2: Recombinant DNA Technology
  Recombinant DNA
  Making a DNA Library
  Identification of DNA
  Visualising Recombinant DNA
  PCR
  Next generation sequencing
Section 3: Variation
  Mutation I: Point mutations
  Mutation II: Rearrangement mutations
  The effects of mutations
  Recombination
  Other mechanisms that generate Variation
  Genetic Markers I
  Genetic Markers II
  Genetic Markers III
  Measuring Genetic Variation
Section 4: The Forces of Evolution
  An evolutionary way of thinking
  Natural Selection
  Evidence for Natural Selection
  Sexual selection
  Inheritance
  Mendelian Genetics
  Deviations from Mendelism
  The Modern Synthesis: the birth of Population Genetics
  The Modern Synthesis: Selection, Genetic Drift and Migration
  The Modern Synthesis: from genes to species
  Kimura’s Neutral theory
  Predictions of the Neutral Theory
Section 5: Genetic Structure
  Making sense of the Variation of life
  Homology and Analogy
  Phylogenetic reconstruction I
  Phylogenetic reconstruction II
  Other ways of measuring Genetic Structure
  Speciation: the classical view
  Speciation: the modern view
  Coalescence
Molecular Ecology
  Phylogeography
  Conservation genetics
  Losing genetic diversity
  Conservation Units
  Taxonomic issues
  Conservation Genomics

Section 1: The Central Dogma
This first section of the course will focus on the basic molecule of inheritance – deoxyribonucleic acid
or DNA. You will learn how it was discovered, how it is organised in the genome and how it replicates.
Francis Crick called the transfer of this DNA-encoded genetic information between the three basic
molecules of life (DNA, RNA and proteins) the central dogma of molecular biology.
For additional reading, see Chapters 1, 5 and 6 of Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Section Outcomes
By the end of this section you should be able to:
1. Explain what the central dogma is.
2. Describe the structure of DNA and RNA, their composition, and how they differ.
3. Describe the molecular structure of eukaryotic chromosomes.
4. Explain the chromosomal theory of inheritance.
5. Explain the theory of endosymbiosis for the existence of organelle DNA.
6. Describe how DNA is replicated.
7. Contrast transcription in prokaryotes and eukaryotes.
8. Explain why and how RNA processing occurs.
9. Use the genetic code to translate a nucleic acid sequence into an amino acid sequence.
10. Describe the process of translation.
11. Explain why DNA repair is necessary and describe the different mechanisms of DNA repair.

The Discovery of DNA
What is Genetics?
Genetics is the study of the biological process by which parents pass genes on to their offspring. In other
words, genetics is simply the study of inheritance or heredity. Every child inherits genes from both of
their biological parents, and these genes in turn are expressed as specific traits.
DNA (or deoxyribonucleic acid) is the molecule that carries the genetic information in all cellular forms
of life and some viruses. It belongs to a class of molecules called the nucleic acids, which are
polynucleotides - that is, long chains of nucleotides.
Each nucleotide consists of three components:
• a nitrogenous base: cytosine (C), guanine (G), adenine (A) or thymine (T)
• a five-carbon sugar molecule (deoxyribose in the case of DNA)
• a phosphate molecule
The backbone of the polynucleotide is a chain of sugar and phosphate molecules. Each of the sugar
groups in this sugar-phosphate backbone is linked to one of the four nitrogenous bases.
Strand of polynucleotides
DNA's ability to store - and transmit - information lies in the fact that it consists of two polynucleotide
strands that twist around each other to form a double-stranded helix. The bases link across the two
strands in a specific manner using hydrogen bonds: cytosine (C) pairs with guanine (G), and adenine (A)
pairs with thymine (T).

Double strand of polynucleotides
The double helix of the complete DNA molecule resembles a spiral staircase, with two sugar-phosphate
backbones and the paired bases in the centre of the helix. This structure explains two of the most
important properties of the molecule. First, it can be copied or 'replicated', as each strand can act as a
template for the generation of the complementary strand. Second, it can store information in the linear
sequence of the nucleotides along each strand.
DNA helix showing nitrogenous bases
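The base-pairing rule, and the way each strand acts as a template for its partner, can be illustrated with a short sketch. It assumes a DNA sequence made only of the letters A, C, G and T (in either case; any other character raises a KeyError), and the example sequence is hypothetical. One detail not stated in the text above is that the two strands run in opposite (antiparallel) directions, so the partner strand read in its own 5'→3' direction is the reverse of the base-by-base complement.

```python
# Watson-Crick base pairs: C with G, A with T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (same orientation)."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in its own 5'->3' direction.

    The strands of the helix are antiparallel, so reading the partner strand
    5'->3' means reversing the base-by-base complement.
    """
    return complement(strand)[::-1]


template = "ATGCGTAC"  # hypothetical example sequence
print(complement(template))          # TACGCATG
print(reverse_complement(template))  # GTACGCAT
```

Applying `reverse_complement` twice returns the original strand, which mirrors the way replication of a copy regenerates the original sequence.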

How was DNA first discovered and who discovered it?
It is a common misconception that James Watson and Francis Crick discovered DNA in the 1950s. In
reality, DNA was discovered decades earlier. It was by following the work of the pioneers before them
that Watson and Crick were able to come to their ground-breaking conclusion about the structure of
DNA in 1953.
The story of the discovery of DNA begins in the 1800s…
The molecule of life
The molecule now known as DNA was first identified in the 1860s by a Swiss chemist called Johann
Friedrich Miescher. Johann set out to research the key components of white blood cells, part of our
body’s immune system. The main source of these cells was pus-coated bandages collected from a
nearby medical clinic.
Johann carried out experiments using salt solutions to understand more about what makes up white
blood cells. He noticed that, when he added acid to a solution of the cells, a substance separated from
the solution. This substance then dissolved again when an alkali was added. When investigating this
substance he realised that it had unexpected properties, different from those of the proteins he was
familiar with. Johann called this mysterious substance ‘nuclein’, because he believed it had come from
the cell nucleus. Unbeknown to him, Johann had discovered the molecular basis of all life – DNA. He
then set about finding ways to extract it in its pure form.
Johann was convinced of the importance of nuclein and came very close to uncovering its elusive role,
despite the simple tools and methods available to him. However, he lacked the skills to communicate
and promote what he had found to the wider scientific community. Ever the perfectionist, he hesitated
for long periods of time between experiments before he published his results in 1874. Before then he
primarily discussed his findings in private letters to his friends. As a result, it was many decades before
Johann Friedrich Miescher’s discovery was fully appreciated by the scientific community.
For many years, scientists continued to believe that proteins were the molecules that held all of our
genetic material. They believed that nuclein simply wasn’t complex enough to contain all of the
information needed to make up a genome. Surely, one type of molecule could not account for all the
variation seen within species.

The four building blocks of DNA
Albrecht Kossel was a German biochemist who made great progress in understanding the basic building
blocks of nuclein.
In 1881 Albrecht identified nuclein as a nucleic acid and provided its present chemical name,
deoxyribonucleic acid (DNA). He also isolated the five nucleotide bases that are the building blocks of
DNA and RNA: adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U).
This work was rewarded in 1910 when he received the Nobel Prize in
Physiology or Medicine.
The Central Dogma
The central dogma of molecular biology is an explanation of the flow of genetic information within a
biological system. It was first stated by Francis Crick in 1956. It “deals with the detailed
residue-by-residue transfer of sequential information. It states that such information cannot be
transferred back from protein to either protein or nucleic acid.”
Transfers of information were arranged into two categories: general transfers (in orange) and special
transfers, which occur only in rare situations (in green).
Crick's use of the word dogma was unconventional, and has been controversial.
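The rule that information never flows out of protein can be expressed as a small lookup. The classification below (general transfers DNA→DNA, DNA→RNA and RNA→protein; special transfers RNA→RNA, RNA→DNA and DNA→protein) follows Crick's later, 1970 restatement of the dogma; the names `GENERAL`, `SPECIAL` and `classify_transfer` are illustrative and not part of any library.

```python
MOLECULES = {"DNA", "RNA", "protein"}

# General transfers: occur in all cells.
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
# Special transfers: occur only in particular circumstances, e.g. RNA -> RNA in
# the replication of RNA viruses and RNA -> DNA (reverse transcription) in retroviruses.
SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}


def classify_transfer(source: str, target: str) -> str:
    """Return 'general', 'special' or 'forbidden' for a source -> target transfer."""
    if source not in MOLECULES or target not in MOLECULES:
        raise ValueError(f"unknown molecule in {source} -> {target}")
    if source == "protein":
        # The dogma itself: sequence information never flows back out of protein.
        return "forbidden"
    if (source, target) in GENERAL:
        return "general"
    # Every remaining DNA- or RNA-sourced transfer is one of the special ones.
    assert (source, target) in SPECIAL
    return "special"


for src in ("DNA", "RNA", "protein"):
    for tgt in ("DNA", "RNA", "protein"):
        print(f"{src:>7} -> {tgt:<7} {classify_transfer(src, tgt)}")
```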

Genome Organisation
The genome is all the DNA in the cell. But how is it organised? After all, given the number of genes and
the amount of genetic material in each cell, it would have to be well organised to even work properly.
Genes, alleles and loci
gene = a length of DNA coding for a functional RNA product. Note that this definition includes both DNA
coding for proteins (mRNA is the functional RNA product) and DNA coding for rRNAs and tRNAs.
locus (pl. loci) = the position of a gene on a chromosome; often used almost synonymously with gene.
alleles = versions of a gene that differ in their base sequences.
The gene is the basic physical and functional unit of heredity. It consists of a specific sequence of
nucleotides at a given position on a given chromosome that codes for a specific protein (or, in some
cases, an RNA molecule).
Genes consist of three types of nucleotide sequence:
• coding regions, called exons, which specify a sequence of amino acids
• non-coding regions, called introns, which do not specify amino acids
• regulatory sequences, which play a role in determining when and where the protein is made (and how
much is made)
The structural components of a gene
A human being has 20,000 to 25,000 genes located on 46 chromosomes (23 pairs).
Unique or single-copy genes are transcribed into mRNA, which is translated into polypeptides.
Highly repetitive sequences, also known as satellite DNA, constitute 5-45% of the genome. Their repeat
units are 5-300 base pairs long and may be repeated up to 10,000 times per genome. The function of
repetitive DNA is not well understood. Because repetitive sequences (in particular, the number of
repeats) vary from person to person, they are useful in DNA profiling, which uses DNA fingerprinting to
identify the individual a sample came from.
The genes, plus all the other DNA in the cell, are known collectively as the genome.
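The idea that repeat copy number differs between individuals can be made concrete with a short sketch. It assumes a known repeat unit (the five-base motif AAGTC, chosen purely for illustration) and hypothetical sequences. In practice, profiling compares repeat numbers at several loci at once, usually by measuring DNA fragment lengths rather than reading the sequence base by base.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        copies, pos = 0, start
        # Step along one motif-length at a time while the motif keeps matching.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


# Hypothetical alleles at one repeat locus: same flanking DNA, different repeat numbers.
person_1 = "GGC" + "AAGTC" * 7 + "TTG"
person_2 = "GGC" + "AAGTC" * 11 + "TTG"
print(count_tandem_repeats(person_1, "AAGTC"))  # 7
print(count_tandem_repeats(person_2, "AAGTC"))  # 11
```

The two hypothetical individuals share the same flanking sequence but differ in repeat number, which is exactly the kind of difference profiling exploits.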
Chromosomes
DNA base pairs are organised into genes, and genes along with other DNA are organised into
chromosomes. Chromosomes contain genes much as a book contains pages. Some chromosomes may
carry thousands of important genes while some may carry only a few. The physical position of the gene
on the chromosome is called its locus. This is an important word, so remember it. The chromosomes are
very long thin strands of DNA, coiled up tightly.
Eukaryotic chromosomes
The label eukaryote is taken from the Greek for 'true nucleus', and eukaryotes (all organisms except
viruses, Eubacteria and Archaea) are defined by the possession of a nucleus and other membrane-
bound cell organelles.
The haploid human genome contains approximately 3 billion base pairs of DNA packaged into 23
chromosomes. Of course, most cells in the body (except for female ova and male sperm) are diploid,
with 23 pairs of chromosomes. That makes a total of 6 billion base pairs of DNA per cell. Because each
base pair is around 0.34 nanometers long (a nanometer is one-billionth of a meter), each diploid cell
therefore contains about 2 meters of DNA [(0.34 × 10⁻⁹ m) × (6 × 10⁹) ≈ 2 m]. Moreover, it is estimated
that the human body contains about 50 trillion cells, which works out to 100 trillion meters of DNA per
human. Now, consider the fact that the Sun is 150 billion meters from Earth. This means that each of us
has enough DNA to go from here to the Sun and back more than 300 times, or around Earth's equator
2.5 million times! How is this possible?
This is because the DNA is tightly packed into structures called chromosomes, which consist of long
chains of DNA and associated proteins known as chromatin. In eukaryotes, DNA molecules are tightly
wound around proteins - called histone proteins - which provide structural support and play a role in
controlling the activities of the genes. A strand 150 to 200 nucleotides long is wrapped twice around a
core of eight histone proteins to form a structure called a nucleosome. The histone octamer at the
centre of the nucleosome is formed from two units each of histones H2A, H2B, H3 and H4. The chains
of nucleosomes are coiled in turn to form a solenoid, which is stabilised by histone H1. Further coiling of
the solenoids forms the structure of the chromosome proper.
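The length figures quoted earlier in this section (about 2 m of DNA per diploid cell, more than 300 round trips to the Sun, about 2.5 million laps of the equator) can be reproduced with a few lines of arithmetic. This is a sketch: Earth's equatorial circumference (about 40,075 km) is a value supplied here because the text does not state it, and using 0.34 nm exactly gives 2.04 m per cell rather than the rounded 2 m, so the totals come out slightly above the text's round numbers.

```python
BP_LENGTH_M = 0.34e-9    # length of one base pair, in metres
DIPLOID_BP = 6e9         # base pairs in a diploid human cell (2 x 3 billion)
CELLS_PER_BODY = 50e12   # estimated number of cells in a human body
EARTH_TO_SUN_M = 150e9   # Earth-Sun distance, in metres
EQUATOR_M = 40_075e3     # Earth's equatorial circumference in metres (not given in the text)

dna_per_cell = BP_LENGTH_M * DIPLOID_BP
dna_per_body = dna_per_cell * CELLS_PER_BODY
round_trips_to_sun = dna_per_body / (2 * EARTH_TO_SUN_M)
equator_laps = dna_per_body / EQUATOR_M

print(f"DNA per diploid cell: {dna_per_cell:.2f} m")             # about 2 m
print(f"DNA per body: {dna_per_body:.2e} m")                     # about 100 trillion m
print(f"Round trips to the Sun: {round_trips_to_sun:.0f}")       # more than 300
print(f"Laps of the equator: {equator_laps / 1e6:.1f} million")  # about 2.5 million
```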
Each chromosome has a p arm and a q arm. The p arm (from the French word 'petit', meaning small) is
the short arm, and the q arm (the next letter in the alphabet) is the long arm. In their replicated form,
each chromosome consists of two chromatids.

The chromosomes - and the DNA they contain - are copied as part of the cell cycle, and passed to
daughter cells through mitosis.
Human beings have 46 chromosomes, consisting of 22 pairs of autosomes and a pair of sex
chromosomes: two X sex chromosomes for females (XX) and an X and Y sex chromosome for males (XY).
One member of each pair of chromosomes comes from the mother (through the egg cell); one member
of each pair comes from the father (through the sperm cell).
A photograph of the chromosomes in a cell is known as a karyotype. The autosomes are numbered 1-22
roughly in order of decreasing size.
Karyotype of a human male
Prokaryotic chromosomes
The prokaryotes (Greek for 'before nucleus' - including Eubacteria and Archaea) lack a discrete nucleus,
and the chromosomes of prokaryotic cells are not enclosed by a separate membrane.
Most bacteria contain a single, circular chromosome. (There are exceptions: some bacteria - for
example, the genus Streptomyces - possess linear chromosomes, and Vibrio cholerae, the causative
agent of cholera, has two circular chromosomes.) The chromosome - together with ribosomes and
proteins associated with gene expression - is located in a region of the cell cytoplasm known as the
nucleoid.
The genomes of prokaryotes are compact compared with those of eukaryotes, as they lack introns, and
the genes tend to be expressed in groups known as operons. The circular chromosome of the bacterium
Escherichia coli consists of a DNA molecule approximately 4.6 million base pairs long.
In addition to the main chromosome, many bacteria also carry extrachromosomal genetic elements
called plasmids. These relatively small circular DNA molecules usually
contain genes that are not essential to growth or reproduction.
The chromosome theory of inheritance
In the early 1900s, the work of Gregor Mendel was rediscovered and his ideas about inheritance began
to be properly appreciated. As a result, a flood of research began to try and prove or disprove his
theories of how physical characteristics are inherited from one generation to the next.
In the late nineteenth century, Walther Flemming, an anatomist from Germany, discovered a
fibrous structure within the nucleus of cells. He named this structure ‘chromatin’, but what he had
actually discovered is what we now know as chromosomes. By observing this chromatin, Walther
correctly worked out how chromosomes separate during cell division, also known as mitosis.
The chromosome theory of inheritance was developed primarily by Walter Sutton and Theodor Boveri.
They first presented the idea that the genetic material passed down from parent to child is within the
chromosomes. Their work helped explain the inheritance patterns that Gregor Mendel had observed
several decades earlier.
Walter Sutton (left) and Theodor Boveri (right) worked independently to come up with the chromosome
theory of inheritance.
Image credit: Wikimedia Commons.
Building on Walther Flemming’s findings with chromatin, German embryologist Theodor Boveri provided
the first evidence that the chromosomes within egg and sperm cells are linked to inherited
characteristics. From his studies of the roundworm embryo he also worked out that the number of
chromosomes is lower in egg and sperm cells than in other body cells.
The American graduate student Walter Sutton expanded on Theodor’s observation through his work with
the grasshopper. He found it was possible to distinguish individual chromosomes undergoing meiosis in
the testes of the grasshopper and, through this, he correctly identified the sex chromosome. In the
closing statement of his 1902 paper he summed up the chromosomal theory of inheritance based
around these principles:
1. Chromosomes contain the genetic material.
2. Chromosomes are passed along from parent to offspring.
3. Chromosomes are found in pairs in the nucleus of most cells (during meiosis these pairs
separate to form daughter cells).
4. During the formation of sperm and egg cells in males and females, respectively, the
chromosome pairs separate.
5. Each parent contributes one set of chromosomes to its offspring.
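Principles 3 to 5 can be illustrated with a toy simulation of gamete formation and fertilisation. It is a sketch under simplifying assumptions: three chromosome pairs instead of 23, hypothetical chromosome labels that record which parent (M or F) each came from, and no crossing over, so whole chromosomes are passed on intact. The function names are illustrative.

```python
import random


def make_gamete(parent: list[tuple[str, str]], rng: random.Random) -> list[str]:
    """Pick one chromosome from each homologous pair (the pairs separate).

    Simplification: ignores crossing over, so each chromosome is passed on whole.
    """
    return [rng.choice(pair) for pair in parent]


def fertilise(egg: list[str], sperm: list[str]) -> list[tuple[str, str]]:
    """Pair the maternal and paternal sets back up into a diploid offspring."""
    return list(zip(egg, sperm))


rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical parents with three chromosome pairs each; labels record origin.
mother = [("M1a", "M1b"), ("M2a", "M2b"), ("M3a", "M3b")]
father = [("F1a", "F1b"), ("F2a", "F2b"), ("F3a", "F3b")]

child = fertilise(make_gamete(mother, rng), make_gamete(father, rng))
print(child)  # each pair holds one chromosome from the mother and one from the father
```

Whatever the random choices, every pair in the offspring contains exactly one maternal and one paternal chromosome, which is the pattern Sutton's principles predict.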
Extranuclear DNA (also called organelle DNA)
The genome also includes DNA that is not contained in the chromosomes. This DNA is typically found in
cellular organelles such as mitochondria and chloroplasts (in plants). Why would some organelles have
their own DNA? In the late 1960s Lynn Margulis suggested that mitochondria looked remarkably like
bacteria and may have evolved by endosymbiosis, whereby an aerobic bacterium began living inside an
early, possibly anaerobic, eukaryotic cell, providing the cell with energy through the metabolism of
oxygen. Chloroplasts may similarly have evolved through the endosymbiosis of cyanobacteria within an
early eukaryotic ancestor of plants.
Extranuclear DNA is much smaller than nuclear DNA, is circular and is usually inherited uniparentally,
from the parent that produces the cell that is fertilised. In the case of mitochondria, inheritance is via
the female parent. This is known as maternal inheritance.
The mitochondrial genome does not recombine. It carries several genes involved in electron transport.
However, many, if not all, of these genes have also been copied into the nuclear genome.
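Maternal inheritance means that a mitochondrial genome can be traced back through mothers alone. A minimal sketch follows; the pedigree and the function `maternal_line` are hypothetical, and the code assumes strict maternal inheritance with no paternal contribution, which the text describes as the usual rather than the universal case.

```python
from typing import Optional

# Hypothetical pedigree: individual -> (mother, father); None marks an unknown parent.
PEDIGREE: dict[str, tuple[Optional[str], Optional[str]]] = {
    "grandmother": (None, None),
    "grandfather": (None, None),
    "mother": ("grandmother", "grandfather"),
    "father": (None, None),
    "child": ("mother", "father"),
}


def maternal_line(individual: str) -> list[str]:
    """Follow mothers back as far as the pedigree goes.

    Under strict maternal inheritance, everyone on this list carries the same
    mitochondrial genome (barring new mutations).
    """
    line = [individual]
    mother = PEDIGREE[individual][0]
    while mother is not None:
        line.append(mother)
        mother = PEDIGREE[mother][0]
    return line


print(maternal_line("child"))  # ['child', 'mother', 'grandmother']
```

Neither the father nor the grandfather appears in the child's maternal line, so under this assumption neither contributes to the child's mitochondrial DNA.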

DNA replication
For prokaryotic and eukaryotic cells to divide and proliferate, they must make copies of themselves,
and the genetic information in the cell must also be duplicated. This process is called DNA replication.
It is a complex process involving many enzymes.
General Principles for DNA replication
The DNA being replicated must be in a ready state for the start of replication, and there also has to be a clear start point (the origin of replication) from which replication proceeds. As each piece of DNA must only be copied once, there also has to be an end point to replication.
DNA replication must be carried out accurately, with an efficient proofreading and repair mechanism in case of any mismatches or errors. Finally, the replication system must also be able to distinguish between the original DNA template and the newly copied DNA.
In order to be able to put these principles into context, it is helpful to look at the eukaryotic cell cycle to
see where the main checkpoints are in the process.
The Cell Cycle
Actively dividing eukaryotic cells pass through a series of stages known collectively as the cell cycle: two
gap phases (G1 and G2); an S (for synthesis) phase, in which the genetic material is duplicated; and an M
phase, in which mitosis partitions the genetic material and the cell divides.

G1 phase. Metabolic changes prepare the cell for division. At a certain point - the restriction point - the
cell is committed to division and moves into the S phase.
S phase. DNA synthesis replicates the genetic material. Each chromosome now consists of two sister
chromatids.
G2 phase. Metabolic changes assemble the cytoplasmic materials necessary for mitosis and cytokinesis.
M phase. A nuclear division (mitosis) followed by a cell division (cytokinesis).
The period between mitotic divisions - that is, G1, S and G2 - is known as interphase.
The main checkpoints of the cell cycle occur between G1 phase and S phase, at the start of mitosis (M phase), and finally between M phase and G1 phase, when the decision is made whether or not the cell becomes quiescent.
Replication in terms of the cell:
• Must be ready: G1
• Must all start at the same time: G1 – S
• Must know where to start: Origin of Replication
• Must all finish: Complete S
• Must ensure that each piece of DNA is only replicated once, so need to know where to end: Replicon
• Proofreading and repair: G2
• Able to distinguish between original and copy: Epigenetics
DNA Replication
During DNA replication, each DNA strand is used as a template to synthesize a new, complementary strand. As a result, the whole DNA molecule is duplicated.
A new DNA strand is ALWAYS synthesized in the 5' to 3' direction.
Chemistry of DNA replication
DNA replication is semi-conservative: in each "next generation" molecule one strand is "old" and the other is "new".
DNA replication starts within a special region of DNA called the REPLICATION ORIGIN, which is defined by a specific nucleotide sequence.
Replication of DNA is bidirectional: two Y-shaped replication forks move in opposite directions away from the origin. It is not trivial to replicate both DNA strands in the 5' to 3' direction! Why?
DNA synthesis on one of the template DNA strands (the lagging strand) is discontinuous: new DNA is synthesized in short pieces (Okazaki fragments), which are then joined together.
DNA synthesis always starts with an RNA primer.
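The semi-conservative pattern can be made concrete with a small Python sketch that labels each strand as "old" or "new". This is a toy model under stated assumptions: strands are written aligned base-by-base (strand direction and primers are not modelled), and the names `complementary_strand` and `replicate` are hypothetical.

```python
# Watson-Crick base pairing used to build a complementary strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Complementary sequence, aligned base-by-base under the given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(duplex):
    """Semi-conservative replication of a duplex of two (sequence, label) strands.

    Each parental strand keeps its label and is paired with a newly
    synthesized complementary strand labelled "new".
    """
    return [
        ((sequence, label), (complementary_strand(sequence), "new"))
        for sequence, label in duplex
    ]


parent = (("ATGCGT", "old"), ("TACGCA", "old"))
generation_1 = replicate(parent)
for daughter in generation_1:
    print(daughter)
# (('ATGCGT', 'old'), ('TACGCA', 'new'))
# (('TACGCA', 'old'), ('ATGCGT', 'new'))

# A second round: only half of the molecules still contain an "old" strand.
generation_2 = [d for duplex in generation_1 for d in replicate(duplex)]
hybrids = sum(1 for top, bottom in generation_2 if "old" in (top[1], bottom[1]))
print(len(generation_2), hybrids)  # 4 2
```

After one round every daughter molecule is half old and half new; after two rounds, half of the molecules are hybrid and half are entirely new.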

Many enzymes are involved in DNA replication:
DNA polymerase: Matches the correct nucleotide to the template and then joins adjacent nucleotides together
Primase: Provides an RNA primer to start polymerisation
Ligase: Joins adjacent DNA fragments together by sealing the nicks between them
Helicase: Unwinds the double helix and separates ("melts") the two strands
Single-strand binding proteins: Keep the DNA single stranded after it has been melted by helicase
Gyrase: A topoisomerase that relieves torsional strain in the DNA molecule
Telomerase: Extends the ends (telomeres) of linear chromosomes, which would otherwise shorten with each round of replication
Replication in Eukaryotes vs Prokaryotes
There is much conservation between the two systems: the enzymology, the replication fork geometry, the basic features and the use of multi-protein machinery are all very similar in both. However, the eukaryotic replication machinery has more protein components. In prokaryotes, the replication fork moves about 10x faster than in eukaryotes.
Prokaryotic replication | Eukaryotic replication
semiconservative replication | semiconservative replication
single origin of replication (oriC) | multiple origins of replication (ARS - Autonomously Replicating Sequence, in yeast)
primer synthesized by primase | primer synthesized by the primase subunits of DNA polymerase α
main replicative enzyme: DNA polymerase III | main replicative enzymes: DNA polymerases α and δ
removal of primer: DNA polymerase I | removal of primer: RNase H and FEN1 (flap endonuclease)
DNA free in cytoplasm as a nucleoid | chromatin structure: chromosomes, histones
circular DNA | linear DNA: problem of replicating chromosome ends → telomerase

Transcription
Transcription literally means the act or process of making a copy of something. Legal secretaries, for
example, transcribe the taped conversations between lawyers and clients by typing them into a word-
processing program.
In genetics, “transcription” refers to the copying of a DNA sequence into an RNA sequence.
• The structure of DNA is not altered as a result of this process, and it continues to store information.
At the molecular level, a gene is a transcriptional unit.
• Genes are defined as DNA sequences that are transcribed into RNA.
RNA (ribonucleic acid)
Why is DNA double stranded and RNA single stranded?
The 'information' part of DNA is the nitrogenous base, as opposed to the pentose sugar or the phosphate residues. In a single-stranded molecule, this important part would be exposed to the cellular environment, providing more opportunity for it to be mutated by the various chemicals there. In a double-stranded configuration, however, the nitrogenous bases of the two strands pair with each other and are locked within the centre of the molecule. This organisation helps to safeguard the blueprint DNA from local mutagens. RNA, by contrast, is a copy of the DNA blueprint that only exists temporarily in the cell; it therefore does not require this protection and can be single stranded.

The figure below shows a common organization of sequences within a bacterial polypeptide-coding gene and its mRNA.
• Each gene has a promoter and one or more regulatory sequences.
  o The promoter “promotes” transcription.
  o The regulatory sequences control when and where (in what cell type) the gene will be expressed.
• The promoter and the regulatory sequences are DNA sequences that are not part of the transcript. They are recognized and bound by DNA-binding proteins.
• Start codon: specifies the first amino acid in a protein sequence, usually a formylmethionine (in bacteria) or a methionine (in eukaryotes).
• Stop codon: signals the end of protein synthesis.
The Template and Coding Strands
RNA is fundamentally single-stranded, and therefore only one strand of the DNA is actually copied into RNA during transcription.
The strand that is actually being copied is termed the template strand or antisense strand.
• The RNA transcript will have the opposite polarity and the complementary sequence to this strand.
The opposite strand is called the coding strand or sense strand.
• The base sequence of this strand is identical in polarity and sequence to the RNA transcript,
  o except for the substitution of uracil in RNA for thymine in DNA.
• Because the coding strand has the same sequence and polarity as the RNA, it is said to “carry the gene.”
(Figure: the gene is located on the coding strand; the template strand is the strand that is copied into RNA.)
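The relationship between the two strands and the transcript can be checked with a short Python sketch. It assumes the template strand is written 3' to 5' directly beneath the coding strand (5' to 3'), and the example sequence and function names are hypothetical. Reading the template base-by-base and pairing each base gives the same RNA as simply taking the coding strand and swapping T for U.

```python
# RNA base that pairs with each DNA template base (A in the template pairs with U).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the RNA (written 5'->3') from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5)


def coding_to_rna(coding_5_to_3):
    """The RNA has the coding-strand sequence, with U in place of T."""
    return coding_5_to_3.replace("T", "U")


coding = "ATGGCCTTTTAA"    # 5'->3' (hypothetical)
template = "TACCGGAAAATT"  # 3'->5', aligned base-by-base under the coding strand
print(transcribe(template))    # AUGGCCUUUUAA
print(coding_to_rna(coding))   # AUGGCCUUUUAA
```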
The Many Roles of RNA Transcripts
Once they are made, RNA transcripts play many different functional roles in the cell.
About 90% of the genes in most organisms code for mRNAs that are ultimately translated into polypeptides. However, tRNAs, rRNAs and other RNAs that do not code for polypeptides also have very important roles in cellular processes.
Gene Transcription in Bacteria
Since our molecular understanding of gene transcription came from studies involving bacteria (mostly E.
coli) and the viruses that infect them (bacteriophages), we will start there.
Promoters are DNA sequences that “promote” gene expression.
• More precisely, they direct the exact location for the initiation of transcription.
Promoters are typically located just “upstream” (5’) of the site where transcription of a gene actually begins.
• The bases in a promoter sequence are numbered in relation to the transcription start site, which is labeled “+1”.
The promoter attracts RNA polymerase, the enzyme responsible for transcribing RNA, to the gene. Without a promoter, a gene sequence would not be transcribed.
Stages of Transcription
A. Initiation
• In E. coli, the RNA polymerase holoenzyme is composed of:
  o Core enzyme: four subunits = α2ββ′
  o Sigma factor: one subunit = σ
• These subunits play distinct functional roles.
• At the start of initiation, the RNA polymerase holoenzyme binds loosely to the DNA.
• It then scans along the DNA until it encounters a promoter.
• When it does, the sigma factor recognizes both the –35 and –10 regions.
• A region within the sigma factor that contains a helix-turn-helix structure then interacts strongly with the promoter, causing RNA polymerase to “tighten its grip” on the DNA.
• The tight binding of the RNA polymerase to the promoter forms what is called the closed complex.
• Then, the open complex is formed when RNA polymerase denatures the double-stranded DNA in the AT-rich –10 region (the Pribnow box).
• Next, the RNA polymerase makes a short RNA copy of the template strand within the denatured region.
  o The sigma factor is released at this point.
  o This marks the end of initiation.
  o Note that RNA polymerase, unlike DNA polymerase, is a “smart enzyme”! It does not need a primer and can start an RNA strand all on its own.
• The core enzyme now slides along the DNA to synthesize the transcript.
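The promoter numbering and the –35/–10 boxes can be illustrated with a small Python sketch. It assumes the commonly cited E. coli consensus sequences (TTGACA at –35 and TATAAT at –10) and uses a made-up promoter; the helper names are hypothetical. Note that promoter numbering has no position 0: the base just upstream of +1 is –1. Because real promoters rarely match the consensus exactly, the sketch scores windows by number of matching bases rather than requiring an exact match.

```python
MINUS_35_CONSENSUS = "TTGACA"  # E. coli -35 region consensus
MINUS_10_CONSENSUS = "TATAAT"  # E. coli -10 region (Pribnow box) consensus


def promoter_coordinate(index, tss_index):
    """Convert a 0-based string index into promoter numbering (+1 = start site, no 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def best_match(seq, consensus, start, end):
    """Return (index, matches) for the window in seq[start:end] most similar to consensus."""
    best_index, best_score = start, -1
    for i in range(start, end - len(consensus) + 1):
        window = seq[i:i + len(consensus)]
        score = sum(a == b for a, b in zip(window, consensus))
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical coding-strand sequence: filler, a -35 box, a -10 box, then the +1 site.
tss = 40
bases = list("GC" * 20 + "ACGTACGTAC")
bases[5:11] = "TTGACA"   # starts at -35
bases[28:34] = "TATAAT"  # starts at -12
seq = "".join(bases)

i10, m10 = best_match(seq, MINUS_10_CONSENSUS, tss - 20, tss)
i35, m35 = best_match(seq, MINUS_35_CONSENSUS, tss - 40, tss - 25)
print(promoter_coordinate(i10, tss), m10)  # -12 6
print(promoter_coordinate(i35, tss), m35)  # -35 6
```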

B. Elongation
The RNA transcript is synthesized during the elongation step.
• The open complex formed by the action of RNA polymerase is about 17 bases long and remains that size as the polymerase moves along the DNA.
  o Behind the open complex, the DNA rewinds back into the double helix.
• On average, the rate of RNA synthesis is about 43 nucleotides per second.
• As in the synthesis of DNA by DNA polymerase, nucleotides are added to the 3' end of the growing strand, so the RNA is made in the 5' to 3' direction.
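Using the average rate quoted above, the time needed to transcribe a gene is roughly its length divided by the elongation rate. As a worked example for a hypothetical transcript of 1,500 nucleotides (ignoring initiation and any pausing):

```latex
t = \frac{L}{r} = \frac{1500\ \text{nt}}{43\ \text{nt s}^{-1}} \approx 35\ \text{s}
```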

C. Termination
Termination is the end of RNA synthesis.
• It occurs when the short RNA-DNA hybrid of the open complex is forced to separate.
  o This releases the newly made RNA as well as the RNA polymerase.
Gene Transcription in Eukaryotes
Many of the basic features of gene transcription are very similar in bacteria and eukaryotes, but the process is more complex in eukaryotes. Why?
Specifically, in eukaryotes, transcription is carried out by three different types of RNA polymerase (RNA pol I-III). These polymerases differ in the number and type of subunits they contain, as well as in the class of RNAs they transcribe: RNA pol I transcribes ribosomal RNAs (rRNAs), RNA pol II transcribes RNAs that will become messenger RNAs (mRNAs) as well as small regulatory RNAs, and RNA pol III transcribes small RNAs such as transfer RNAs (tRNAs).
Because RNA pol II transcribes protein-encoding genes, it has been of particular importance to scientists who study the regulation of eukaryotic gene expression, and its function is well understood. For example, researchers know that RNA pol II is recruited to a DNA sequence within the promoter of many genes, known as the TATA box, to initiate transcription; the TATA box is recognised by the TATA-binding protein, one of the general transcription factors that position the polymerase. Together with other common motifs (short recognition sequences in the DNA), these elements constitute the core promoter. In addition, RNA pol II activity, and therefore gene expression, can be influenced by surrounding DNA sequences (enhancers), which are bound by regulatory transcription factors. Also important is a large protein complex called the mediator, which mediates interactions between RNA pol II and various regulatory transcription factors. While these properties of transcription regulation are very important, they remain an area of active research.

RNA Processing
Why is there a need for RNA processing?
Splicing
Analysis of bacterial genes in the 1960s and 1970s revealed the following:
• The sequence of DNA in the coding strand corresponds to the sequence of nucleotides in the mRNA.
• This in turn corresponds to the sequence of amino acids in the polypeptide.
• This is termed the colinearity of gene expression.
However, analysis of eukaryotic genes in the late 1970s revealed that they are not always colinear with their functional mRNAs.
• Instead, coding sequences, called exons, are interrupted by intervening sequences, or introns.
• Transcription copies the entire gene, including the introns.
  o Introns are later removed (excised).
  o Exons are then joined (spliced) together.
This phenomenon is termed RNA splicing.
• It is a common genetic phenomenon in eukaryotes.
• It occurs occasionally in bacteria as well.
The initial transcription produces a long transcript known as a pre-mRNA.
• Splicing requires the aid of a multi-component structure known as the spliceosome.
So why splice?
One benefit of genes with introns is a phenomenon called alternative splicing.
• A pre-mRNA with multiple introns can be spliced in different ways.
  o This will generate mature mRNAs with different combinations of exons.
• This variation in splicing can occur in different cell types or during different stages of development.
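Splicing and one simple form of alternative splicing (exon skipping) can be sketched in Python. The sketch assumes the exon boundaries are already known and that the first and last exons are always kept while any internal exon may be skipped; real splicing patterns are more varied. The pre-mRNA below is hypothetical, with exons in upper case and introns in lower case (each intron starts with GU and ends with AG) for readability.

```python
from itertools import combinations


def splice(pre_mrna, exons):
    """Join the exon segments (start, end) of a pre-mRNA, discarding the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def alternative_isoforms(pre_mrna, exons):
    """Enumerate exon-skipping isoforms (needs at least two exons).

    The first and last exons are always kept; any subset of the internal
    exons may be included, always in their original order.
    """
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for chosen in combinations(internal, k):
            isoforms.append(splice(pre_mrna, [first, *chosen, last]))
    return isoforms


# Hypothetical pre-mRNA: exon1, intron1, exon2, intron2, exon3, intron3, exon4.
pre_mrna = "AUGGCC" + "guaaguag" + "UUUGGG" + "gugcag" + "CCCAAA" + "guuuag" + "UAA"
exons = [(0, 6), (14, 20), (26, 32), (38, 41)]

print(splice(pre_mrna, exons))  # AUGGCCUUUGGGCCCAAAUAA
print(alternative_isoforms(pre_mrna, exons))
# ['AUGGCCUAA', 'AUGGCCUUUGGGUAA', 'AUGGCCCCCAAAUAA', 'AUGGCCUUUGGGCCCAAAUAA']
```

With only two internal exons, one pre-mRNA already yields four different mature mRNAs.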

5’ Capping
Most mature mRNAs have a 7-methylguanosine covalently attached at their 5’ end.
• This event is known as capping.
Capping occurs while the pre-mRNA is being synthesized by RNA pol II.
• Usually when the transcript is only 20 to 25 bases long.
The “cap” consists of a methylated guanine attached “backwards” (5’ to 5’) through a triphosphate linkage to the 5’ nucleotide of the mRNA. It is added on and is not encoded in the gene. It is bound by cap-binding proteins, which, in turn, are recognized and bound by the ribosome during translation initiation. Thus, the cap “marks” the RNA as an mRNA and aids in its recognition by the ribosome for translation.
3’ Polyadenylation
Most mature mRNAs have a string of adenine nucleotides at their 3’ ends.
• This is termed the polyA tail.
The polyA tail, like the 5’ cap, is not encoded in the gene sequence.
• It is added enzymatically after the gene is completely transcribed.
Example of pre-mRNA Processing
The beta-globin gene has 3 exons and 2 introns.
• The pre-mRNA (also called hnRNA) contains the entire gene sequence, starting at the +1 site and ending past the polyadenylation signal.
• During processing, the introns are spliced out, the 5’ end of the mRNA is capped, and the transcript is cleaved downstream of the polyadenylation signal, after which the polyA tail is added.

The relationship between one gene and one polypeptide:
• When the relationship between genes and proteins was first discovered, it was initially thought to be one-to-one: one gene coding for one polypeptide.
  o A gene:
    - information coded in DNA nucleotide sequences
    - transcribed into mRNA
    - mRNA translated into a sequence of amino acids joined by peptide bonds to produce a polypeptide
  o A polypeptide:
    - a polymer of amino acids joined by peptide bonds
    - each polypeptide’s function depends on its precise sequence of amino acids
• Exceptions:
  o Some genes produce more than one polypeptide.
    - There are only about 21,000 human genes, but over 120,000 human proteins.
    - Therefore, many genes produce more than one protein.
    - This is possible because of post-transcriptional modification, combining exons in various combinations.
    - Example: lymphocyte production of antibodies. Millions of different antibody proteins are produced from just a few genes, because different lymphocytes join together segments of these genes in different ways (strictly, by rearranging the DNA itself, known as V(D)J recombination, rather than by RNA splicing).
  o Some genes do not code for protein.
    - Some genes code for tRNA, which is not translated into protein but transports amino acids to ribosomes.
    - Some genes code for rRNA, which is not translated into protein but is a component of ribosome structure and function.
    - Some DNA sequences act as regulators of gene expression: regulatory DNA is transcribed into regulatory RNA, which then binds to other nucleic acid sequences (DNA or mRNA), influencing whether those genes are expressed.

Translation (Protein synthesis)
The mRNA contains sequences that are recognized by the translation machinery (the ribosome). The
start and stop codons are not important during transcription but are crucial signals during translation.
Translation is the final mechanism in Francis Crick’s central dogma of molecular biology, turning nucleic
acid bases (genotype) into protein structures (phenotype).
The Genetic Code
It is the order of the bases along a single strand that constitutes the genetic code. The four-letter 'alphabet' of A, T, G and C forms 'words' of three letters called codons, similar to the words on a page. Strung together, these words act as the blueprints that tell the cells of the body when and how to grow, mature and perform various functions. Individual codons code for specific amino acids. A gene is a sequence of nucleotides along a DNA strand, with 'start' and 'stop' codons and other regulatory elements, that specifies a sequence of amino acids that are linked together to form a protein.
So, for example, the codon AGC codes for the amino acid serine, and the codon ACC codes for the amino acid threonine.
There are two points to note about the genetic code:
It is universal. All life on Earth uses the same code (with a few minor exceptions).
It is degenerate. Most amino acids can be coded for by more than one codon. For example, AGC and AGT both code for the amino acid serine. This is also known as codon redundancy.
A codon table sets out how the triplet codons code for specific amino acids.
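A codon table can be built compactly in Python. The sketch below encodes the standard genetic code in the DNA alphabet used above (one-letter amino acid codes, `*` for stop), with codons ordered TTT, TTC, TTA, ... GGG so that the first base varies slowest; the names `CODON_TABLE` and `synonymous_codons` are illustrative only. It shows both the serine/threonine example and the degeneracy of the code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """All codons (DNA alphabet) that specify the given one-letter amino acid."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


print(CODON_TABLE["AGC"], CODON_TABLE["ACC"])  # S T  (serine, threonine)
print(synonymous_codons("S"))  # ['AGC', 'AGT', 'TCA', 'TCC', 'TCG', 'TCT']
print(synonymous_codons("M"))  # ['ATG']
print(synonymous_codons("*"))  # ['TAA', 'TAG', 'TGA']
```

Serine is specified by six codons, whereas methionine has only one, which is why "most" (not all) amino acids are encoded redundantly.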

Translation
Translation takes place on ribosomes, either free in the cytoplasm or bound to the endoplasmic reticulum.
It can be divided into 3 stages: initiation, elongation and termination.
Three bases (one codon) on the mRNA code for 1 amino acid in the protein. Transfer RNAs (tRNAs) carry anticodons and are bound to the amino acids that match them, according to the genetic code.
• Polysomes: several to many ribosomes translating the same mRNA into protein, each moving in the 5’ to 3’ direction.
• Start codon: the mRNA triplet codon AUG is almost universally the start codon used to mark the beginning of the coding sequence of a gene; thus, the tRNA with the anticodon UAC, carrying the amino acid methionine (formylmethionine in bacteria), is the first tRNA to enter the P-site during translation.
• Stop codon: there are three stop codons in the genetic code, and none of these has a corresponding tRNA; instead, when a ribosome encounters a stop codon, a release factor binds to the stop codon, which terminates translation and allows all of the components to separate.

Initiation of Translation: The small ribosomal subunit searches for the AUG (methionine) start site on the mRNA.
Elongation of Translation:
• A tRNA with an anticodon complementary to the second mRNA codon binds to the ribosomal A-site, with the appropriate amino acid attached to the tRNA.
• Enzymes in the ribosome catalyze formation of a peptide bond between the 1st (P-site) and 2nd (A-site) amino acids.
• The P-site tRNA, now separated from its amino acid, exits the ribosome.
• The ribosome moves one codon (3 nucleotides) along the mRNA, thus shifting the previous A-site tRNA to the P-site and opening the A-site.
• A tRNA with an anticodon complementary to the A-site mRNA codon binds to the ribosomal A-site, with the appropriate amino acid attached to the end of the tRNA.
• Enzymes in the ribosome catalyze formation of a peptide bond between the 2nd and 3rd amino acids.
• The P-site tRNA, now separated from its amino acid, exits the ribosome.
• The ribosome moves one codon (3 nucleotides) along the mRNA, thus shifting the previous A-site tRNA to the P-site and opening the A-site.
• The process repeats until a stop codon (see the genetic code) is reached.

Termination of Translation:
• When the ribosomal A-site reaches a stop codon, no tRNA has a complementary anticodon.
• A release factor protein binds to the stop codon in the ribosomal A-site.
• The polypeptide and mRNA are released.
• The large and small ribosomal subunits separate.
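The three stages of translation can be mimicked with a short Python sketch: find the first AUG (initiation), read one codon at a time in the 5' to 3' direction (elongation), and stop at the first in-frame stop codon (termination). This is a simplification under stated assumptions: it takes the first AUG as the start site and ignores ribosome-binding signals; the function name `translate` and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code in the RNA alphabet (U instead of T); * marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate an mRNA (5'->3') from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    protein = []
    # elongation: move along the mRNA one codon (3 nucleotides) at a time
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination: no tRNA matches a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGCAUGGCCUUUUAAGCA"))  # MAF (Met-Ala-Phe, then UAA stops translation)
```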

DNA repair
Maintenance of DNA Sequences
The integrity of DNA sequences can be compromised during DNA replication, when the genetic material is being duplicated, or by agents that cause damage to DNA.
DNA repair during or after DNA replication
DNA Polymerase as a Self-Correcting Enzyme
The correct nucleotide has a greater affinity for the moving polymerase than an incorrect nucleotide has.
Exonucleolytic proofreading by DNA polymerase occurs as follows:
• DNA molecules with mismatched 3’-OH ends are not effective primers, because the polymerase cannot extend a 3’-OH end that is not base paired.
• DNA polymerase has a separate catalytic site that removes unpaired residues at the terminus.
Source: Molecular Biology of the Cell, 4th Edition
The diagram shows the 2 catalytic sites: P, where polymerisation takes place, and E, where editing takes place.
Strand Directed Mismatch Repair System
DNA mismatch repair is a system that recognises and repairs erroneous insertions, deletions and misincorporations of bases that can arise during DNA replication. It also repairs some forms of DNA damage.
Mismatch repair is strand-specific. During DNA synthesis the newly synthesised (daughter) strand will often include errors. In order to carry out the repairs, the mismatch repair machinery distinguishes the newly synthesised strand from the template (parental) strand.
The mismatch repair system carries out the following functions:
• Removes replication errors that escape the proofreading activity of the replication machinery
• Detects distortions in the DNA helix
• Distinguishes the newly replicated strand from the parental strand by means of methylation of A residues in GATC sequences in bacteria
  o Methylation occurs shortly after replication, so for a short time the new strand is still unmethylated while the parental strand is methylated
• Reduces the error rate about 100-fold
• Works as a 3-step process: recognition of the mismatch; excision of the segment of DNA containing the mismatch; resynthesis of the excised fragment
In mammals the newly synthesised strand is preferentially nicked and can be distinguished from the parental strand in this manner.
A defective copy of a mismatch repair gene results in a predisposition to cancer.
Source: Molecular Biology of the Cell, 4th Edition
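The combined effect of the safeguards against replication errors can be estimated by dividing an initial error rate by the improvement factor of each correction stage in turn. The Python sketch below assumes illustrative values: about 1 error per 10^5 nucleotides from base selection by the polymerase, a roughly 100-fold improvement from proofreading, and the roughly 100-fold improvement from mismatch repair quoted above. The first two numbers, the genome size and the function name are assumptions for illustration only.

```python
def overall_error_rate(base_selection_error, *improvement_factors):
    """Errors per nucleotide after each correction stage divides the rate by its factor."""
    rate = base_selection_error
    for factor in improvement_factors:
        rate /= factor
    return rate


# Assumed, illustrative values: base selection, then proofreading, then mismatch repair.
rate = overall_error_rate(1e-5, 100, 100)
print(f"{rate:.0e} errors per nucleotide")  # 1e-09 errors per nucleotide

# Expected new errors when copying a hypothetical 3 x 10^9 bp genome once.
print(round(3e9 * rate, 1))  # 3.0
```

Because the stages act one after another, their effects multiply: each independent 100-fold improvement removes two orders of magnitude from the final error rate.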

Causes of DNA damage
DNA can be damaged by:
• Chemical mutagens
• Radiation
• Free radicals
UV-B light causes crosslinking between adjacent pyrimidine bases (cytosine or thymine), creating pyrimidine dimers. This is called direct DNA damage.
UV-A light creates mostly free radicals. The damage caused by free radicals is called indirect DNA damage.
Ionizing radiation, such as that created by radioactive decay or in cosmic rays, causes breaks in DNA strands.
Thermal disruption at elevated temperature increases the rate of depurination (loss of purine bases from the DNA backbone) and single-strand breaks. For example, hydrolytic depurination is seen in thermophilic bacteria, which grow in hot springs at 40–80 °C. The rate of depurination (300 purine residues per genome per generation) is too high in these species to be repaired by normal repair machinery, so the possibility of an adaptive response cannot be ruled out.
Industrial chemicals such as vinyl chloride and hydrogen peroxide, and environmental chemicals such as the polycyclic aromatic hydrocarbons found in smoke, soot and tar, create a huge diversity of DNA adducts: ethenobases, oxidized bases, alkylated phosphotriesters and DNA crosslinks, to name just a few.
The natural ageing process and respiration also cause DNA damage, at a rate of around 10,000 lesions per cell per day.
The main types of DNA damage that occur are base loss and base modification.
* The thickness of the arrows in the figure corresponds to the relative sensitivity to alkylation.
Source: openlearn.ac.uk (Creative Commons)
DNA repair after DNA damage
Despite the thousands of alterations that occur in our DNA each day, very few are actually retained as mutations, owing to highly efficient DNA repair mechanisms. The importance of DNA repair is highlighted by the large number of genes that are devoted to it. Also, inactivation or loss of function of DNA repair genes results in increased mutation rates.

Defects in the DNA repair mechanisms are associated with several disease states, as can be seen in the following table:
Disorder | Frequency | Defect | Hereditary/non-hereditary
Fanconi’s anaemia | 1/22,000 in some populations | Deficient excision repair | Hereditary
Hereditary nonpolyposis colon cancer | 1/200 | Deficient mismatch repair | Hereditary
Werner’s syndrome | 3/1,000,000 | Deficient helicase | Hereditary
Xeroderma pigmentosum | 1/250,000 | Deficient excision repair | Hereditary
DNA damage can activate the expression of whole sets of genes, including:
• the Heat Shock Response
• the SOS response
The SOS response is a bacterial post-replication DNA repair system that allows DNA replication to bypass lesions or errors in the DNA. The SOS response depends on the RecA protein: RecA, activated by binding single-stranded DNA, promotes the inactivation (self-cleavage) of the LexA repressor, thereby inducing the response. It is an error-prone repair system.
Base Excision Repair (BER)
BER is a cellular mechanism that repairs damaged DNA throughout the cell cycle. It is primarily
responsible for removing small, non-helix distorting base lesions from the genome. The related
nucleotide excision repair pathway repairs bulky helix-distorting lesions. BER is important for removing
damaged bases that could otherwise cause mutations by mispairing or lead to breaks in DNA during
replication. BER is initiated by DNA glycosylases, which recognize and remove specific damaged or
inappropriate bases, forming AP sites. These are then cleaved by an AP endonuclease. The resulting
single-strand break can then be processed by either short-patch (where a single nucleotide is replaced)
or long-patch BER (where 2-10 new nucleotides are synthesized).

A. DNA glycosylase recognises the damaged base
B. It removes the base, leaving the deoxyribose sugar (an AP site)
C. AP endonuclease cuts the phosphodiester backbone
D. DNA polymerase replaces the missing nucleotide
E. DNA ligase seals the nick
Source: Friedberg, E.C., Walker, G.C. and Siede, W. (1995). DNA Repair and Mutagenesis. American Society for Microbiology, Washington DC, USA, pp. 91-225.
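Steps A to E of short-patch BER can be walked through with a toy Python model. It assumes a single damaged base type, using uracil (which can arise in DNA when cytosine is deaminated) as the example, and represents the intact template strand written 3' to 5' directly under the damaged strand. The function name and sequences are hypothetical, and real BER involves many more proteins than this sketch suggests.

```python
# Base paired with each template base (template written 3'->5', aligned under the strand).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(strand, template, damaged="U"):
    """Toy short-patch BER: replace each damaged base using the intact template.

    Returns the repaired strand and a log of steps A-E.
    """
    bases = list(strand)
    log = []
    for i, base in enumerate(bases):
        if base != damaged:
            continue
        bases[i] = "-"  # A-B: glycosylase removes the base, leaving an AP site
        log.append(f"glycosylase: AP site at position {i}")
        log.append(f"AP endonuclease: backbone cut at position {i}")  # C
        bases[i] = COMPLEMENT[template[i]]  # D: polymerase inserts the correct nucleotide
        log.append(f"polymerase: inserted {bases[i]} at position {i}")
        log.append(f"ligase: nick sealed at position {i}")  # E
    return "".join(bases), log


repaired, log = base_excision_repair("ATGUCA", "TACGGT")
print(repaired)  # ATGCCA
print(*log, sep="\n")
```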

If there are double-strand breaks in DNA, there are 2 methods by which the DNA can be repaired:
• Non-homologous end joining: the broken ends are joined directly, and the original DNA sequence is often altered during repair (by means of small deletions or insertions).
• Homologous recombination (homology-directed repair): a general recombination mechanism in which the sequence information is copied from an intact homologous DNA molecule, such as the sister chromatid.
Source: Molecular Biology of the Cell.
Failure of DNA repair
When DNA repair fails, fewer mutations are corrected, and this leads to an increase in the number of mutations in the genome.
In most cases, the protein p53 monitors the repair of damaged DNA; however, if the damage is too severe, p53 promotes programmed cell death (apoptosis).
Mutations in genes that encode DNA repair proteins can also be inherited, and this leads to an overall increase in the number of mutations, as errors or damage in the DNA are no longer repaired efficiently.

Section 2: Recombinant DNA Technology
As soon as we were able to understand how DNA replicates itself and how it is transcribed, we were
able to use the enzymes involved in these processes to create our own combinations of DNA fragments.
We call this newfound knowledge recombinant DNA technology. In this section you will learn how we
can make and identify recombinant DNA. You will learn about the different ways of visualising recombinant DNA and the two great revolutions in recombinant DNA technology that have changed the rate at which we are able to generate DNA data, namely the polymerase chain reaction and next generation sequencing.
For additional reading see:
Chapter 9, Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Section Outcomes
1. Describe the methods used to make recombinant DNA
2. Explain what vectors are and why they are useful
3. Explain what a DNA library is and what it is used for
4. Explain the concept of a cDNA library
5. Compare cloning and PCR as methods for amplifying DNA
6. Evaluate the difference between gel electrophoresis and Southern blotting
7. Compare first generation and next generation sequencing
8. Evaluate the potential applications of next generation sequencing

Recombinant DNA
Recombinant DNA refers to the creation of new combinations of DNA segments that are not found
together in nature. The isolation and manipulation of genes allows for more precise genetic analysis as
well as practical applications in medicine, agriculture, industry and conservation.
Fundamental changes in our society are occurring as a result of genetic engineering.
Making recombinant DNA
Overview: isolate DNA - cut with restriction enzymes - ligate into a cloning vector - transform the recombinant DNA molecule into a host cell - each transformed cell divides many, many times to form a colony of millions of cells, each of which carries the recombinant DNA molecule (a DNA clone).
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W. H. Freeman and Company).
A. Isolating DNA
1. DNA is isolated by collecting cells, disrupting their lipid membranes with detergents, destroying proteins with phenol or proteases (e.g. proteinase K), degrading RNAs with RNase, precipitating the remaining DNA out of solution with alcohol, and then redissolving the DNA in water.

B. Cutting DNA
1. DNA can be cut into large fragments by mechanical shearing.
2. Restriction enzymes are the scissors of molecular genetics. Restriction enzymes (REs) are endonucleases that recognize specific nucleotide sequences in the DNA and break the DNA chain at those points. A variety of REs have been isolated and are commercially available. Most cut at specific palindromic sites in the DNA (a sequence that reads the same 5' to 3' on both antiparallel DNA strands). These cuts can be staggered, generating “sticky” (overhanging) ends, or blunt, generating flush ends.
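Palindromic recognition sites and sticky versus blunt ends can be explored with a small Python sketch. It uses EcoRI (site GAATTC, cut between G and A, written G^AATTC) and SmaI (CCC^GGG) as examples, and PstI (CTGCA^G) in the test; the helper names and the example DNA are hypothetical. The sketch only models the top strand and assumes a palindromic site.

```python
# Watson-Crick complements for reverse-complementing a sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """The other strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def digest(seq, site, cut_offset):
    """Top-strand fragments after cutting at every occurrence of site.

    cut_offset is where the top strand is cut within the site,
    e.g. 1 for EcoRI (G^AATTC).
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def overhang(site, cut_offset):
    """Single-stranded end left by a palindromic site ('' means blunt ends)."""
    other_cut = len(site) - cut_offset  # cut position on the bottom strand
    return site[min(cut_offset, other_cut):max(cut_offset, other_cut)]


print(is_palindromic("GAATTC"))        # True
print(overhang("GAATTC", 1))           # AATT (EcoRI leaves sticky ends)
print(repr(overhang("CCCGGG", 3)))     # '' (SmaI leaves blunt ends)
print(digest("AAGAATTCTTGAATTCAA", "GAATTC", 1))  # ['AAG', 'AATTCTTG', 'AATTCAA']
```

Every EcoRI fragment starts with the same AATT overhang, which is why any two DNAs cut with EcoRI can base pair at their ends before ligation.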
C. Joining DNA
Once you have isolated and cut the donor and vector DNAs, they must be joined together. The DNAs are mixed together in a tube. If both have been cut with the same RE, the ends will match up, because their single-stranded sticky ends are complementary. DNA ligase is the glue of molecular genetics that holds the ends of the DNAs together: it creates a phosphodiester bond between two DNA ends.
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W.

D. Amplifying the recombinant DNA
To recover large amounts of the recombinant DNA molecule, it must be amplified. This is accomplished by transforming the recombinant DNA into a bacterial host strain. (The cells are treated with CaCl2 - DNA is added - cells are heat shocked at 42 °C - DNA enters the cells by a mechanism that is not fully understood.) Once in a cell, the recombinant DNA is replicated. When the cell divides, the replicated recombinant molecules go to both daughter cells, which themselves will divide later. Thus, the DNA is amplified.
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W. H. Freeman and Company.)
DNA clone = A section of DNA that has been inserted into a vector molecule and then replicated in
a host cell to form many copies.
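Because each division doubles the number of cells carrying the recombinant molecule, the number of divisions needed to grow a colony is a base-2 logarithm. The Python sketch below counts cells (not plasmid copies, which may be several per cell) and ignores cell death; the 20-minute doubling time is an assumed value for well-growing E. coli, and the function names are hypothetical.

```python
import math


def generations_needed(target_cells, starting_cells=1):
    """Number of doublings for a clone to reach at least target_cells."""
    return math.ceil(math.log2(target_cells / starting_cells))


def colony_time_minutes(target_cells, doubling_minutes):
    """Time to grow a colony from a single transformed cell."""
    return generations_needed(target_cells) * doubling_minutes


n = generations_needed(1_000_000)
print(n, 2 ** n)  # 20 1048576
# Assumed doubling time of 20 minutes (hypothetical, for illustration).
print(colony_time_minutes(1_000_000, 20))  # 400
```

Only about 20 divisions turn one transformed cell into a colony of a million cells carrying the same DNA clone.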
E. Vectors
1. Requirements for a cloning vector
a) Should be capable of replicating in host cell
b) Should have convenient RE sites for inserting DNA of interest
c) Should have a selectable marker to indicate which host cells received
recombinant DNA molecule
d) Should be small and easy to isolate

2. Bacterial plasmids are small, circular DNA molecules that are separate from the bacterial chromosome. They replicate independently of the bacterial chromosome. They are useful for cloning DNA inserts of less than 20 kb (kilobase pairs); inserts larger than 20 kb are easily lost in the bacterial cell.
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W. H. Freeman and Company).

3. Bacteriophage lambda (about 48.5 kb) contains a central region of about 15 kb that is not required for replication or for the formation of progeny phage in E. coli. Thus, lambda can be used as a cloning vector by replacing the central 15 kb with 10-15 kb of foreign DNA. This is done as follows: mix RE-cut donor DNA and lambda DNA in a test tube - ligate - use an in vitro packaging mix that will assemble progeny phage carrying the foreign DNA - infect E. coli with the phage to amplify.

4. Cosmids are hybrids of phages and plasmids that can carry DNA fragments up to 45 kb. They can
replicate like plasmids but can be packaged like phage lambda.
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W. H. Freeman and Company.)
5. Expression vectors are vectors that carry host signals that facilitate the transcription
and translation of an inserted gene. They are very useful for expressing eukaryotic genes in
bacteria.
6. Yeast artificial chromosomes (YACs) are yeast vectors that have been engineered to
contain a centromere, telomeres, an origin of replication, and a selectable marker. They can carry up
to 1,000 kb of DNA. Because they are maintained in yeast (a eukaryote), they are useful for cloning
eukaryotic genes that contain introns. Also, eukaryotic genes are more easily expressed in a
eukaryotic host such as yeast.
7. Bacterial artificial chromosomes (BACs) are bacterial plasmids derived from the F plasmid.
They are capable of carrying up to 300 kb of DNA.
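As a rough aid for comparing the vectors above, the Python sketch below lists which vectors could hold an insert of a given size. It only encodes the capacities quoted in these notes (plasmid under 20 kb, lambda 10-15 kb, cosmid up to 45 kb, BAC up to 300 kb, YAC up to 1,000 kb). The lower limit of 0 kb for every vector except lambda is a simplifying assumption, and in practice the choice also depends on the host and the application.

```python
# Insert-size ranges (kb) as quoted in these notes; lower limits other than
# lambda's are a simplifying assumption.
VECTOR_RANGE_KB = {
    "plasmid": (0, 20),
    "lambda phage": (10, 15),
    "cosmid": (0, 45),
    "BAC": (0, 300),
    "YAC": (0, 1000),
}


def suitable_vectors(insert_kb):
    """Vectors whose quoted range can hold the insert, smallest capacity first."""
    fits = [(high, name) for name, (low, high) in VECTOR_RANGE_KB.items()
            if low <= insert_kb <= high]
    return [name for _, name in sorted(fits)]


print(suitable_vectors(12))   # ['lambda phage', 'plasmid', 'cosmid', 'BAC', 'YAC']
print(suitable_vectors(40))   # ['cosmid', 'BAC', 'YAC']
print(suitable_vectors(500))  # ['YAC']
```

Listing the smallest adequate vector first reflects the requirement above that a vector should be small and easy to isolate.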

Making a DNA Library
What is a DNA library?
A DNA library is a collection of DNA clones, each stored in a particular vector. The goal is to have
each gene represented in the library at least once. This is like a real library: a collection of
books stored in a building, where the goal is to have at least one copy of every book.
DNA libraries are the basis of many important techniques in molecular genetics, including ALL next
generation sequencing applications (we will come to these in a few lectures' time). So you will need
to understand what DNA libraries are and how we make them.
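How many clones does a genomic library need before every gene is likely to be represented at least once? One standard textbook estimate, the Clarke-Carbon formula N = ln(1 - P) / ln(1 - f), is not given in these notes. Here P is the desired probability and f is the fraction of the genome carried by one clone. It can be sketched in Python as below. The genome size (about 3 x 10^6 kb for the human genome) is an approximation, and the formula assumes that fragments are randomly distributed across the genome.

```python
import math


def clones_needed(insert_kb, genome_kb, probability=0.99):
    """Clarke-Carbon estimate of how many random clones are needed so that any given
    sequence is in the library with the stated probability."""
    fraction = insert_kb / genome_kb  # fraction of the genome carried by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


HUMAN_GENOME_KB = 3_000_000  # approximate haploid human genome (about 3 x 10^9 bp)

for vector, insert_kb in [("plasmid", 20), ("cosmid", 45), ("YAC", 1000)]:
    n = clones_needed(insert_kb, HUMAN_GENOME_KB)
    print(f"{vector}: {insert_kb} kb inserts -> about {n:,} clones for 99% probability")
```

Larger inserts need far fewer clones, which is one reason why large-capacity vectors such as YACs and BACs are attractive for genomic libraries.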
There are different types of libraries, depending on what you are interested in.
1. You can name a library on the basis of the vector, e.g. a plasmid library, a phage
library or a YAC library. See the last lecture for what these vectors are.
2. Or you can categorise the library by the source of the donor DNA.
a. Genomic – made from fragments of total genomic DNA. Genomic DNA is usually cut into
smaller pieces by a restriction enzyme and then the small pieces are inserted into a vector. These
random genomic fragments result in a random shotgun library containing multiple overlapping
clones that cover the complete genome sequence.
b. Chromosome – made from restriction fragments of a single chromosome isolated by flow
cytometry or pulsed field gel electrophoresis.
c. cDNA (complementary DNA) – made from DNA synthesized from mRNA. These libraries are
extremely useful because they contain ONLY copies of the transcribed RNA of the cell. So they do not
contain non-transcribed genomic DNA, and they do not even contain the introns of the gene, just the
expressed exons. That is why cDNA libraries are also called expression libraries. They contain copies
of all the mRNA that was transcribed in the cell, in other words the transcriptome. As a result, they
are much smaller than other kinds of libraries.
While the heritable information of a cell resides in the genome, the portion of this information that
functionally determines the cell's phenotype is expressed in the cell's mRNA population. In order to
obtain access to this information, we must convert the unstable mRNA population into a DNA copy, or
cDNA.
The enzyme reverse transcriptase allows us to use an RNA template to produce a double-stranded cDNA
copy. Reverse transcriptase was discovered by H. Temin and D. Baltimore while studying retroviruses.
Retroviruses have an RNA genome, which is converted to a DNA copy and integrated into the host genome
during their replicative cycle. This is an interesting group of viruses that includes many tumor
viruses and the AIDS virus, HIV.
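The base-pairing logic of making a cDNA copy can be sketched in Python. This is a simplified model that assumes a spliced mRNA written 5'→3' and ignores primers and enzyme details. The first cDNA strand is the reverse complement of the mRNA (with T in place of U), so the second strand has the same sequence as the mRNA. The mRNA used here is hypothetical.

```python
# Pairing used when copying RNA into DNA (U in RNA pairs with A in DNA).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First-strand cDNA (5'->3') copied from an mRNA given 5'->3'.
    The copy is antiparallel, so it is the reverse complement of the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand):
    """Second strand (5'->3'): complementary to the first strand, so it has the
    same sequence as the mRNA, with T in place of U."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand))


mrna = "AUGGCCUUUAAA"  # hypothetical spliced mRNA (exons only, no introns)
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # TTTAAAGGCCAT
print(second)  # ATGGCCTTTAAA
```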

Identification of DNA
We are able to identify DNA of interest to our particular study in several ways. These include:
A. Probing for the gene

1. DNA probe
a) DNA probes are based on the fact that a denatured (heated or
chemically treated to become single stranded) DNA molecule will hybridize (bind) to
sequences that match or are similar to it.
b) Where does the probe DNA come from?
(1) cDNA from highly expressed mRNA from a tissue
(2) homologous gene from a related organism
(3) DNA obtained from “reverse genetics” (from protein to DNA): if you
have the protein product of the gene in which you are interested, sequence part of the protein,
synthesize a short (>20 nucleotides) DNA probe based on the protein sequence using the genetic code,
and use it as your probe

2. Protein probe – if you have the protein product of the gene of interest, make an antibody against
it and use the antibody to screen an expression library for the clone that is expressing the gene
that codes for the protein.
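Because of codon redundancy, a probe designed from a protein sequence by "reverse genetics" is really a mixture of oligonucleotides covering every possible codon. The Python sketch below uses the standard genetic code to count how many DNA sequences could encode a peptide. It then picks the stretch of 7 amino acids (21 nucleotides, just over the >20-nucleotide probe length mentioned above) with the fewest possibilities. The function names and the protein sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code for codons TTT, TTC, TTA, TTG, TCT, ... GGG (* = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
CODONS_PER_AA = Counter(CODON_TABLE.values())  # e.g. L -> 6, W -> 1


def probe_degeneracy(peptide):
    """Number of different DNA sequences that could encode this peptide."""
    count = 1
    for aa in peptide:
        count *= CODONS_PER_AA[aa]
    return count


def best_probe_window(protein, length=7):
    """Stretch of `length` amino acids (7 aa = 21 nt) with the lowest degeneracy."""
    windows = [protein[i:i + length] for i in range(len(protein) - length + 1)]
    return min(windows, key=probe_degeneracy)


protein = "LSRMWCKDEHLSR"  # hypothetical partial protein sequence
window = best_probe_window(protein)
print(window, probe_degeneracy(window))  # MWCKDEH 32
```

Stretches rich in amino acids with a single codon (Met, Trp) and poor in six-codon amino acids (Leu, Ser, Arg) give the least ambiguous probes.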
B. Complementation
Clones can be detected based on their ability to confer a
missing function on a mutant.

C. Positional cloning is any method of cloning that makes use of information about a gene’s
chromosomal location in order to clone it.
You know that your gene of interest (gene X) is linked to gene A, for which you have a probe. Using
a library of overlapping restriction fragments, isolate a clone (clone 1) containing A and carry out
restriction analysis of clone 1. Use the end of clone 1 as a probe to isolate a new, overlapping clone
(clone 2), carry out restriction analysis of clone 2, use the end of clone 2 as a probe to isolate the
next clone, and so on until you reach gene X. This stepwise approach is known as chromosome walking.
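The logic of this stepwise walk can be sketched in Python by treating each clone as an interval of the chromosome. This is a toy model with stated assumptions. The clone positions are hypothetical and known in advance, "probing with the end of a clone" is modeled as finding clones that overlap that end, and gene X is assumed to lie to the right of gene A. In a real walk the positions are unknown, and restriction analysis is used to orient each new clone.

```python
def chromosome_walk(clones, start_gene_pos, target_pos):
    """Walk from a clone containing the start gene (gene A) to one containing
    the target gene (gene X).

    clones: list of (start, end) positions in kb of overlapping library clones.
    Each step 'probes' with the right-hand end of the current clone and keeps
    the hybridizing clone that reaches furthest towards the target.
    """
    containing = [c for c in clones if c[0] <= start_gene_pos <= c[1]]
    if not containing:
        raise ValueError("no clone contains the start gene")
    current = max(containing, key=lambda c: c[1])
    path = [current]
    while not current[0] <= target_pos <= current[1]:
        probe = current[1]  # end of the current clone, used as the next probe
        hits = [c for c in clones if c[0] <= probe <= c[1] and c[1] > current[1]]
        if not hits:
            raise ValueError(f"walk stalled at {probe} kb: no clone extends further")
        current = max(hits, key=lambda c: c[1])
        path.append(current)
    return path


# Hypothetical library of overlapping clones (positions in kb).
library = [(0, 40), (30, 75), (35, 60), (70, 110), (100, 150), (140, 185)]
print(chromosome_walk(library, start_gene_pos=10, target_pos=160))
# [(0, 40), (30, 75), (70, 110), (100, 150), (140, 185)]
```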
D. Tagging
Use a gene (tag) for which you have a probe to mark your gene of interest, by inserting the tag into
your gene of interest.
For example, suppose you are interested in cloning genes that are important for iron transport. Use a
transposon (jumping gene) to hop randomly into the chromosome, then screen for organisms that are
affected in iron transport. Cross the putative tagged iron-transport mutants with a tester strain to
verify that the mutant phenotype segregates with the tag. Finally, make a library of DNA from the
tagged mutant and select or probe for the tag (and therefore your gene).

Visualising Recombinant DNA
A. Gel electrophoresis – DNA fragments of different sizes can be separated by an electrical
field applied to a “gel”. The negatively charged DNA migrates away from the negative electrode and
towards the positive electrode. The smaller the fragment, the faster it migrates.
B. Restriction enzyme mapping – Frequently it is important to have a restriction enzyme site
map of a cloned gene for further manipulations of the gene. This is accomplished by digestion of the
gene singly with several enzymes and then in combinations. The fragments are subjected to gel
electrophoresis to separate the fragments by size and the sites are deduced based on the sizes of the
fragments.
In this example, digestion with Enzyme 1 shows that there are two restriction sites for this enzyme,
giving fragments of 6, 3 and 8 kb, but it does not reveal whether the 3 kb segment is in the middle or
at the end of the digested sequence, which is 17 kb long. Combined digestion by both Enzyme 1 and
Enzyme 2 leaves the 6 and 8 kb segments intact but cleaves the 3 kb segment, showing that Enzyme 2
cuts within this Enzyme 1 fragment. If the 3 kb segment were at one end of the sequence being studied,
digestion by Enzyme 2 alone would yield a 1 or 2 kb fragment. Since this is not the case, the 3 kb
fragment must lie in the middle of the three fragments produced by Enzyme 1. That the Enzyme 2 site
lies closer to the 6 kb segment can be inferred from the 7 and 10 kb fragments produced by Enzyme 2
alone (6 + 1 = 7 kb on one side of the site and 2 + 8 = 10 kb on the other).
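The reasoning in this example can be automated. The Python sketch below tries every possible order of fragments and keeps the maps whose predicted double digest matches the observed one. The double-digest sizes of 6, 1, 2 and 8 kb are inferred from the description above (the 3 kb fragment split into 1 and 2 kb); they are not taken from the original figure.

```python
from itertools import permutations


def cut_positions(ordered_fragments):
    """Convert an ordered list of fragment lengths into internal cut positions."""
    positions, total = [], 0
    for length in ordered_fragments[:-1]:
        total += length
        positions.append(total)
    return positions


def fragments_from_cuts(cuts, total_length):
    """Fragment lengths produced by cutting a linear molecule at the given positions."""
    edges = [0] + sorted(set(cuts)) + [total_length]
    return [b - a for a, b in zip(edges, edges[1:])]


def solve_map(single1, single2, double):
    """All fragment orders consistent with two single digests and the double digest."""
    total = sum(single1)
    if not total == sum(single2) == sum(double):
        raise ValueError("all digests must add up to the same length")
    solutions = set()
    for order1 in set(permutations(single1)):
        for order2 in set(permutations(single2)):
            cuts = cut_positions(order1) + cut_positions(order2)
            if sorted(fragments_from_cuts(cuts, total)) == sorted(double):
                solutions.add((order1, order2))
    return solutions


for order1, order2 in sorted(solve_map([6, 3, 8], [7, 10], [6, 1, 2, 8])):
    print("Enzyme 1 order:", order1, "| Enzyme 2 order:", order2)
# Enzyme 1 order: (6, 3, 8) | Enzyme 2 order: (7, 10)
# Enzyme 1 order: (8, 3, 6) | Enzyme 2 order: (10, 7)
```

The two solutions are the same map read from opposite ends. In both, the 3 kb fragment is in the middle and the Enzyme 2 site is 1 kb from the 6 kb fragment, as deduced in the text.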

C. Southern Blot
1. A Southern blot allows the detection of a gene of interest by probing DNA fragments that
have been separated by electrophoresis with a “labeled” probe.
2. Northern Blot (probe RNA on a gel with a DNA probe)

3. Western Blot (probe proteins on a gel with an antibody)
D. DNA sequencing of a gene (First generation sequencing)
1. Maxam-Gilbert base destruction method – bases of a DNA molecule are selectively
destroyed. This method is rarely used now because the reagents are highly toxic and very dangerous.
2. Sanger dideoxy method – the gene to be sequenced is used as a template for the synthesis of new
DNA strands in 4 different reaction tubes, each containing a different chain-terminating
dideoxynucleotide (ddATP, ddCTP, ddGTP or ddTTP). Each new strand terminates at random wherever a
dideoxynucleotide is incorporated, producing a population of molecules that terminate at different
sites. Running the products of each tube on a gel allows the determination of where each
chain-terminating dideoxynucleotide was incorporated. The DNA is visualized because either the
primer used to start the reaction or some of the dNTPs are radioactive.

(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by
W. H. Freeman and Company.)
This procedure is now automated so that a computer reads the sequence. Instead of radioactive
primers, the primers are labeled with a different color fluorescent dye for each reaction.
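The logic of reading a Sanger gel can be simulated in a few lines of Python. This is a simplified model with stated assumptions: product lengths are counted from the end of the primer, every possible termination product is present, and the template sequence is hypothetical. Sorting all bands by length (the shortest run furthest down the gel) recovers the sequence of the newly synthesized strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """New strand made on a template (both written 5'->3'): its reverse complement."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def sanger_tubes(template):
    """Lengths of the terminated products in each ddNTP tube (A, C, G, T)."""
    tubes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized_strand(template), start=1):
        # A chain stops at this length if the matching ddNTP is incorporated here.
        tubes[base].append(length)
    return tubes


def read_gel(tubes):
    """Read bands from shortest to longest to recover the new strand 5'->3'."""
    bands = sorted((length, base) for base, lengths in tubes.items() for length in lengths)
    return "".join(base for _, base in bands)


tubes = sanger_tubes("ATGCGT")  # hypothetical template
print(tubes)            # {'A': [1, 5], 'C': [2, 4], 'G': [3], 'T': [6]}
print(read_gel(tubes))  # ACGCAT
```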

PCR
The polymerase chain reaction, otherwise known as PCR, allows the isolation of a specific segment of
DNA from a small DNA (or cell) sample, using DNA primers at the ends of the segment of interest.
The principle of the polymerase chain reaction (PCR) was first conceived in the mid-1980s by Kary
Mullis. By 1993, the impact of the technique had already been felt and Mullis was awarded the Nobel
Prize in Chemistry. Indeed, in its modern-day form PCR has become an indispensable and wide-ranging
tool for the molecular geneticist.
The principle of PCR:
PCR is a rapid in vitro method of creating copies of, or amplifying, a defined target from a source of
DNA. Just as replication in living cells is a semi-conservative process using each strand of the DNA
double helix as a template for a new strand, so too is PCR. However, we must have some prior
knowledge of the DNA sequence of the target so that we can selectively amplify it from the otherwise
heterogeneous collection of DNA molecules present in a typical sample such as total genomic DNA.
For any given target it is necessary to design two short oligonucleotides (each ~20 nucleotides long)
that flank the region of interest. These so-called primers should bind specifically to the
complementary DNA sequences and in the presence of a heat-stable DNA polymerase and the
building blocks of DNA (dNTPs), they initiate the synthesis of new DNA strands.
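The requirement that the two primers flank the target can be illustrated with a small "in silico PCR" sketch in Python. It assumes exact primer matches, a forward primer identical to the top strand, and a reverse primer that pairs with the top strand. The 6-nucleotide primers and the template are hypothetical and much shorter than the ~20-nucleotide primers used in practice.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def predict_amplicon(template, forward, reverse):
    """Product a primer pair would amplify from a template (top strand, 5'->3').

    forward: same sequence as the top strand at the left end of the target.
    reverse: written 5'->3'; it pairs with the top strand, so its reverse
    complement appears in the top strand at the right end of the target.
    Only exact matches are found; real primers can tolerate mismatches.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse)]


template = "TTTTACGTACGGGCCCAAATTTGATTCATTTT"  # hypothetical
print(predict_amplicon(template, "ACGTAC", "TGAATC"))  # ACGTACGGGCCCAAATTTGATTCA
```

The product always begins with the forward primer and ends with the complement of the reverse primer, which is why the primers define the ends of the amplified segment.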
PCR consists of sequential cycles of synthesis and each cycle can be subdivided into three steps carried
out at different temperatures:
1. denaturation of the two strands of DNA (95°C)

2. primer annealing (usually 50-70°C, depending on primer pair)
3. primer extension by the polymerase & synthesis of new strands (72°C).
In practice, the whole cycling procedure is performed by a machine known as a thermocycler that can
rapidly and accurately switch between the desired temperatures.
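Each cycle can at most double the number of target molecules, so the expected yield after n cycles is N0 x (1 + E)^n, where E is the fraction of templates copied per cycle (E = 1 for perfect doubling). The efficiency term is an assumption added here. Real reactions eventually plateau as reagents run out, and this sketch ignores that.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected target copies after `cycles` cycles; efficiency=1.0 means perfect doubling."""
    return initial_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0, i.e. 2**30 (about a billion copies)
print(pcr_copies(100, 30) > pcr_copies(100, 30, efficiency=0.9))  # True
```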

Visualising PCR Products
PCR products can typically be viewed (or checked) by gel electrophoresis. In this method, a current
is applied across the gel and DNA, because it is negatively charged, moves through the gel. The gel
acts as a sieve, so longer DNA fragments have a slower mobility than shorter fragments. It is thus
possible to separate fragments of different sizes, to estimate the length of a PCR product by
comparison against a size standard, and to assess the quality of the PCR.
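Fragment length is usually estimated against a size standard by plotting the logarithm of fragment size against migration distance. This relationship is roughly linear over a gel's working range. The Python sketch below fits that line by least squares and reads off the size of an unknown band. The ladder sizes and distances are hypothetical and chosen to be perfectly log-linear; real ladders only approximate this.

```python
import math


def fit_ladder(ladder):
    """Least-squares line log10(size) = slope * distance + intercept through the ladder bands."""
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def estimate_size(distance, slope, intercept):
    """Fragment size (bp) predicted for a band that migrated `distance` mm."""
    return 10 ** (slope * distance + intercept)


# Hypothetical size standard: (migration distance in mm, fragment size in bp).
ladder = [(10, 10000), (20, 1000), (30, 100)]
slope, intercept = fit_ladder(ladder)
print(round(estimate_size(25, slope, intercept)))  # 316
```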
Quantitative PCR or real-time PCR
Reverse transcription (RT) followed by polymerase chain reaction (PCR) is a powerful tool for
the detection and quantification of mRNA. Real-time RT-PCR is widely used because of its high
sensitivity, good reproducibility, and wide dynamic quantification range. It can determine differences
in gene expression.
How does it work?
Extract RNA, then use reverse transcriptase to convert the RNA to DNA. Then measure the amount of DNA
amplified relative to that of a gene with a known copy number. In this way you can quantify the amount
of RNA that has been transcribed for a certain gene in the cell. It is useful for determining
differential gene expression.
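One common way to express the result is the 2^-ΔΔCt method, sketched below in Python. It is not described in these notes and rests on several assumptions. Ct (threshold cycle) is taken to be the cycle at which a sample's fluorescence first crosses a set threshold, the reference gene is assumed to have a stable copy number, and both assays are assumed to amplify with close to 100% efficiency. The Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Relative expression (test vs control) of a target gene, normalised to a
    reference gene, by the 2^-ddCt method (assumes ~100% amplification efficiency)."""
    delta_test = ct_target_test - ct_ref_test  # normalise each sample to the reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_test - delta_control)  # one cycle earlier = twice as much starting RNA


# Hypothetical Ct values: relative to the reference gene, the target crosses the
# threshold 2 cycles earlier in the test sample than in the control.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```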

Next generation sequencing
The sequencing revolution began in 2005 with the introduction of the sequencing-by-synthesis
technology developed by 454 Life Sciences and the multiplex polony sequencing protocol developed
by George Church’s lab. The 454 technology is based on pyrosequencing, which involves constructing a
complementary DNA strand one base at a time and detecting the actual base that was incorporated. The
first 454 instrument was about 300 times cheaper than previous Sanger technology. A single
next-generation sequencing instrument can generate as much data as several hundred Sanger-type
sequencers, yet can be operated by a single person.
Since then several other NGS platforms have been developed, such as Illumina and SOLiD. While each
NGS platform is unique, the overall steps and the underlying concepts are similar.
First, DNA is extracted from the sample and fragmented into a library of small segments that are
amplified and subsequently sequenced in millions of parallel reactions. The sequencing step is
similar to previous methods: the bases of each DNA fragment are sequentially identified from light
signals emitted as the complement to each fragment strand is re-synthesized. The net result is a set
of newly identified strings of nucleotides called “reads” that represent all the DNA in the sample.
NGS generates an enormous amount of genetic data, more than has ever been possible before: a single
run on an Illumina HiSeq machine produces 900-1800 Gb of data.

Applications of NGS

Section 3: Variation
Variation is a central theme in biology. Without variation, there can be no basis for natural selection
to work, and therefore there can be no evolution, and none of us would be here.
A casual look at the world around you, at the plants, trees, insects, people, etc, shows you that the
world is made up of an incredible amount of variation. Where did this variation come from? How can
we detect it and measure it? These questions will be answered in this section.
Chapter 6, Bergstrom and Dugatkin Evolution (1st Edition)
Section Outcomes.
1. Compare and contrast the various mechanisms that generate genetic variation
2. Describe the different ways in which DNA can mutate.
3. Contrast the effects of these different mutations
4. Explain recombination and how it is advantageous
5. Describe what a transposon is
6. Explain gene conversion
7. Compare the relative advantages and disadvantages of different genetic markers
8. Compare phased with unphased genetic data
9. Describe how genetic variation can be organised
10. Explain how genetic variation can be measured at the individual and population level

Mutation I: Point mutations
A mutation is a change in the nucleotide sequence of the DNA in a cell. There are many different
kinds of mutations. Mutations can occur before, during, and after mitosis and meiosis. If a mutation
occurs in cells that will make gametes by meiosis or during meiosis itself, it can be passed on to
offspring and contribute to genetic variability of the population. Mutations are the sole source of
genetic variability that can occur in asexual reproduction. Mutations are usually harmful or neutral
to offspring but can occasionally be beneficial.
Mutations can affect two kinds of cells:
Germinal mutations occur in germ cells (egg and sperm) and can be passed on to future
generations. These are the mutations that are involved in evolution.
Somatic mutations occur in somatic cells and cannot be transmitted to offspring. They affect only
the individual concerned, e.g. cancer.
Point mutations
Point mutations create new alleles. They can result from the insertion, deletion, or substitution of
one nucleotide in a gene sequence. There are two causes of point mutations, but both are the result
of reactions catalyzed by DNA polymerase:
a. Uncorrected replication errors
b. Errors in repair of damaged sites
Figure: Examples of the two ways in which point mutations can occur
(A) An error in replication leads to a mismatch in one of the daughter double helices, in this case a
T-to-C change because one of the As in the template DNA was miscopied. When the mismatched molecule
is itself replicated, it gives one double helix with the correct sequence and one with a mutated
sequence.
(B) A mutagen has altered the structure of an A in the lower strand of the parent molecule, giving
nucleotide X, which does not base-pair with the T in the other strand, so, in effect, a mismatch has
been created. When the parent molecule is replicated, X base-pairs with C, giving a mutated daughter
molecule. When this daughter molecule is replicated, both granddaughters inherit the mutation.

Point mutations can be classified by the type of change to the DNA sequence:
1. Substitutions:
Transitions = replace a purine with a purine (A for G), or a pyrimidine with a pyrimidine (T for C).
Transversions = replace a purine with a pyrimidine (e.g. T for G) and vice versa.
Substitutions leave the “reading frame” intact (the beginning and ending point of each codon).
2. Insertions or Deletions (indels)
These cause frameshift mutations unless the number of nucleotides inserted or deleted is a multiple
of three. In a frameshift, all codons after the point of mutation are affected.
3. Slippage mutation
Slippage occurs during replication of the parent molecule, inserting an additional repeat unit
into, or failing to copy an existing unit into, the newly synthesized polynucleotide of one of the
daughter molecules. When this daughter molecule replicates, it gives a granddaughter
molecule whose sequence is one repeat unit longer or shorter than that of the original parent.
This is how microsatellite repeats are generated. It is also the main reason why microsatellite
sequences are so variable: replication slippage occasionally generates a new length variant,
adding to the collection of alleles already present in the population.
Figure: Replication slippage at a microsatellite locus
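The purine/pyrimidine rule for classifying substitutions is easy to express in code. The Python sketch below labels single-base changes and tallies them between two sequences. The sequences are hypothetical and assumed to be aligned without gaps.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Label a single-base substitution as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    if old == new:
        return "no change"
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Tally transitions and transversions between two aligned, equal-length sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        if a != b:
            counts[classify_substitution(a, b)] += 1
    return counts


print(classify_substitution("A", "G"))           # transition
print(classify_substitution("T", "G"))           # transversion
print(count_substitutions("ATGCCT", "GTGACC"))   # {'transition': 2, 'transversion': 1}
```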
Rates of point mutations

In general, mutation rates are very low.
An average per-locus mutation rate of 10⁻⁵ (one in 100,000 gametes) is so low that the
rate of change in the frequency of an allele due to mutation alone is very low.
However, when the whole genome is taken into account (i.e. 20,000-25,000 genes in humans, 10,000 in
Drosophila), with an average mutation rate of 10⁻⁵ to 10⁻⁶ per gene, as many as 1 in 4 gametes
(25,000 genes × 10⁻⁵ ≈ 0.25 mutations per gamete) would carry a phenotypically detectable mutation
somewhere in its genome.
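The genome-wide arithmetic can be checked with the Python sketch below. The expected number of mutations per gamete is simply genes × per-locus rate. The probability that a gamete carries at least one mutation uses a Poisson approximation, an assumption added here that treats loci as independent.

```python
import math


def mutations_per_gamete(genes, rate_per_locus):
    """Expected new mutations per gamete, and P(at least one) under a Poisson approximation."""
    expected = genes * rate_per_locus
    return expected, 1 - math.exp(-expected)


for genes in (20_000, 25_000):
    for rate in (1e-5, 1e-6):
        expected, p_any = mutations_per_gamete(genes, rate)
        print(f"{genes:,} genes at {rate:g} per locus: "
              f"{expected:.3f} mutations per gamete, P(at least one) = {p_any:.3f}")
```

At the upper end of the range (25,000 genes at 10⁻⁵) this gives about 0.25 mutations per gamete, matching the "1 in 4" figure. At 10⁻⁶ the figure drops tenfold.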
Mutation rates vary

Mutation rates are generally greater in plants, where mutation rate is correlated with generation
time. This is because of the way plants generate germ cells: they come from somatic tissue, which
later differentiates into germ-cell tissue. Therefore plants with long generation times accumulate
more mutations in future germ-cell tissues than do plants with short generation times.
Animal germ-line cells undergo far fewer divisions, so they have fewer opportunities for mutation.
The mutation rate also varies among individuals because of genetic variation in the enzymes used
for “proofreading” and repair of DNA.
Within an individual, the mutation rate can also vary among genes; this variation is poorly understood.
a. The mutation rate in coding regions of genes is lower than in non-coding regions.
b. Some repair systems apparently work preferentially on transcriptionally active genes.
c. This explains why slippage mutations that generate microsatellite loci have such high
mutation rates: they are non-coding and are not transcriptionally active.
Mutation rates also vary among types of cells and among organisms. Rates are lower in eukaryotes than
in prokaryotes.

Mutation II: Rearrangement mutations
Mutations also result from gene rearrangements and other large changes in the DNA sequence of a
chromosome. Chromosomal rearrangements change chromosomal structure and can alter the
function of one or more genes and can change the pattern of gene transmission. A translocation is
movement of a segment of DNA from one place to another in a chromosome or between
chromosomes. An inversion is a mutation in which a segment of DNA has flipped within a
chromosome. A deletion is the loss of a segment of DNA. These large changes are relatively
common, at least over long periods of time, and are abundant in genomes that have been
sequenced.
Deletion
This is a rearrangement that removes a segment of DNA.
Deletions can be located within a chromosome (interstitial) or can remove the end
of a chromosome (terminal). Deletions can be small (intragenic), affecting only one gene, or can
span multiple genes (multigenic). Deletions can arise from DNA damage (X-rays or chemical agents
that break chromosomes). If the deleted region does not contain any genes essential for survival, an
individual homozygous for a deletion (Del/Del) will live. An example is the original white allele in
Drosophila which is a small deletion affecting only the white gene. However, large deletions that
span multiple genes usually result in homozygous lethality because they remove essential genes.
What about individuals heterozygous for a normal chromosome and a deficiency chromosome
(heterozygote Del/+)? In some instances, heterozygotes are viable and fertile. There are at least two
reasons why heterozygosity for a deletion might be detrimental.
(1) Gene dosage problems: a deletion heterozygote will have only half the normal dose of each gene
that is missing in the deletion. In general, humans cannot survive (even as heterozygotes) with
deletions that remove more than ~3% of the genome. An example of a syndrome caused by a
heterozygous deletion is “cri du chat” syndrome, which results from a deletion of all or part of the
short arm of Chromosome 5. The diagnostic phenotypic feature of the syndrome, among many other
features, is that the cry of an affected infant sounds like the high-pitched cry of a cat. Although
individuals usually survive, they often don’t live past childhood.
(2) Somatic mutation of the remaining normal copy of an essential gene may lead to defects (often
called "pseudodominance"). Individuals with retinoblastoma (malignant eye cancer) are often
heterozygous for deletions on Chromosome 13 in normal tissue; the disease results when a
somatic mutation in the remaining copy of the RB tumor suppressor gene occurs in retinal cells.
Duplication
This is a rearrangement that results in an increase in copy number of a particular chromosomal
region. In tandem duplications, the duplicated regions lie right next to one another, either in the
same order or in reverse order. In non-tandem duplications, the repeated regions lie far apart on the
same chromosome or on different chromosomes. Duplications can occur due to unequal crossing-
over, chromosome breaks and faulty repair, or replication errors. The dominant Bar mutation is a
tandem duplication of the 16A region of the Drosophila X chromosome. Unequal pairing and
crossing-over during meiosis in females homozygous for the Bar tandem duplication occasionally
leads to production of chromosomes bearing three copies of the region (which cause the more extreme
double-Bar eye phenotype) and chromosomes bearing one copy (conferring normal eyes). Another
example in humans is red-green colorblindness: in this example, unequal crossing-over between the
closely related red and green photoreceptor genes can cause different combinations of the genes on
a chromosome, or even hybrid receptors!
A duplication is less likely to affect phenotype than a deletion of comparable size, since the
duplicated genes are still present. However, duplications in heterozygotes can have phenotypic
consequences if gene copy number is important (three copies of each gene in the duplicated region
are now expressed!), or if genes in the duplicated segment are now put into a new chromosomal
environment that alters their expression level or pattern (position effect).
Inversion
A rearrangement in which a chromosomal segment is rotated 180 degrees. Inversions in which the
rotated segment includes the centromere are called pericentric inversions; those in which the
rotated segment is located completely on one chromosomal arm and do not include the centromere
are called paracentric inversions.
Inversions can occur when two double-strand breaks release a chromosomal region that inverts
before religating to flanking DNA, or by intrachromosomal recombination.
Even though gene order is changed in an inversion, many inversions do not cause abnormal
phenotypes. Many inversions can be made homozygous, and inversions can be detected in haploid
organisms. However, if the breakpoint of an inversion is within an essential gene, individuals
homozygous for the inversion will not survive. Unusual phenotypes can also be observed if the
inversion places a gene or group of genes in a new regulatory environment.
In inversion heterozygotes, the observed number of recombinant progeny is reduced. Why?
Because, during meiosis, the homologous chromosomes in inversion heterozygotes form an
inversion loop to maximize pairing. Recombination within the inversion loop leads to abnormal
chromatids (whether the inversion is pericentric or paracentric). Thus, even though crossovers
occur, the abnormal recombinant gametes can rarely give rise to viable progeny upon
fertilization. We see a preponderance of nonrecombinant progeny.

Translocation
A chromosomal rearrangement in which part of one chromosome becomes attached to a non-
homologous chromosome (non-reciprocal), or in which parts of two nonhomologous chromosomes
trade places (reciprocal). Most individuals bearing reciprocal translocations are viable and fertile.
However, just like with inversions, abnormal phenotypes can be observed if the translocation
breakpoint lies within a critical gene, or if the translocation places a gene or group of genes in a new
regulatory environment.
During meiosis in reciprocal translocation heterozygotes, the translocated chromosomes and the
normal homologous chromosomes maximize pairing by forming a cross-like structure among all
four chromosomes (instead of the usual two). A special kind of translocation, called a Robertsonian
translocation is one in which a reciprocal exchange between two acrocentric chromosomes leads to
a large metacentric chromosome and a very small chromosome (which may carry so few genes that it is
lost without causing genetic imbalance). In fact, a less common form of Down's
Syndrome (<5% of cases) is caused by a Robertsonian translocation between Chromosome 21 and
Chromosome 14. This form of Down's Syndrome can recur in families.

The effects of mutations
When considering the effects of mutations we must make a distinction between the direct effect
that a mutation has on the functioning of a genome and its indirect effect on the phenotype of the
organism in which it occurs. The direct effect is relatively easy to assess because we can use our
understanding of gene structure and expression to predict the impact that a mutation will have on
genome function. The indirect effects are more complex because these relate to the phenotype of
the mutated organism which is often difficult to correlate with the activities of individual genes.
The direct effects of mutations on genomes
Many mutations result in nucleotide sequence changes that have no effect on the functioning of the
genome. These silent mutations include virtually all of those that occur in intergenic DNA and in the
non-coding components of genes and gene-related sequences. In other words, some 98.5% of the
genome can be mutated without significant effect. These are said to be “neutral” mutations.
Mutations in genes: coding regions (exons)
Mutations in the coding regions of genes are much more important. First, we will look at point
mutations that change the sequence of a triplet codon. A mutation of this type will have one of four
effects:
1. It may result in a synonymous change, the new codon specifying the same amino
acid as the unmutated codon. This is because of codon redundancy. A synonymous change
is therefore a silent mutation because it has no effect on the coding function of the genome:
the mutated gene codes for exactly the same protein as the unmutated gene.
2. It may result in a non-synonymous change, the mutation altering the codon so that
it specifies a different amino acid. The protein coded by the mutated gene therefore has a
single amino acid change. This often has no significant effect on the biological activity of the
protein because most proteins can tolerate at least a few amino acid changes without
noticeable effect on their ability to function in the cell, but changes to some amino acids,
such as those at the active site of an enzyme, have a greater impact. A non-synonymous
change is also called a missense mutation.
3. The mutation may convert a codon that specifies an amino acid into a termination
codon. This is a nonsense mutation and it results in a shortened protein because translation
of the mRNA stops at this new termination codon rather than proceeding to the correct
termination codon further downstream. The effect of this on protein activity depends on
how much of the polypeptide is lost: usually the effect is drastic and the protein is non-
functional.
4. The mutation could convert a termination codon into one specifying an amino acid,
resulting in read-through of the stop signal so the protein is extended by an additional series
of amino acids at its C terminus. Most proteins can tolerate short extensions without an
effect on function, but longer extensions might interfere with folding of the protein and so
result in reduced activity.

Figure: Effects of point mutations on the coding region of a gene. Four different effects of point mutations are
shown.
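The four possible effects of a point mutation in a codon can be classified automatically by comparing translations under the standard genetic code. The Python sketch below does this for single codons. The example codons are arbitrary illustrations.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code for codons TTT, TTC, TTA, TTG, TCT, ... GGG (* = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def point_mutation_effect(old_codon, new_codon):
    """Classify a codon change into one of the four effects described above."""
    before, after = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if before == after:
        return "synonymous (silent)"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "read-through (stop codon lost)"
    return "non-synonymous (missense)"


for old, new in [("CTT", "CTC"), ("CTT", "CCT"), ("TGG", "TAG"), ("TAA", "CAA")]:
    print(old, "->", new, ":", point_mutation_effect(old, new))
```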
Deletion and insertion mutations (indels) also have distinct effects on the coding capabilities of
genes (Figure 14.12). If the number of deleted or inserted nucleotides is three or a multiple of three
then one or more codons are removed or added, the resulting loss or gain of amino acids having
varying effects on the function of the encoded protein. Deletions or insertions of this type are often
inconsequential but will have an impact if, for example, amino acids involved in an enzyme's active
site are lost, or if an insertion disrupts an important secondary structure in the protein. Replication
slippage is responsible for the trinucleotide repeat expansion diseases that have been discovered in
humans in recent years. Each of these neurodegenerative diseases is caused by a relatively short
series of trinucleotide repeats becoming elongated to two or more times its normal length. For
example, the human HD gene contains the sequence 5′-CAG-3′ repeated between 6 and 35 times in
tandem, coding for a series of glutamines in the protein product. In Huntington's disease this repeat
expands to a copy number of 36–121, increasing the length of the polyglutamine tract and resulting
in a dysfunctional protein.
On the other hand, if the number of deleted or inserted nucleotides is not three or a multiple of
three then a frameshift results, all of the codons downstream of the mutation being taken from a
different reading frame from that used in the unmutated gene. This usually has a significant effect
on the protein function, because a greater or lesser part of the mutated polypeptide has a
completely different sequence from the normal polypeptide.
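These paragraphs give two rules that can be sketched in Python. First, indels of a multiple of three nucleotides keep the reading frame, while other lengths cause a frameshift. Second, Huntington's disease involves an expanded CAG repeat. The repeat thresholds below are the ranges quoted above for the HD gene. The regular-expression search is a simplification that ignores the reading frame, and the test sequence is hypothetical.

```python
import re


def indel_effect(length_change):
    """In-frame if the net insertion/deletion is a multiple of three, otherwise a frameshift."""
    if length_change == 0:
        return "no indel"
    return "in-frame" if length_change % 3 == 0 else "frameshift"


def longest_cag_run(seq):
    """Length (in repeats) of the longest tandem run of CAG; ignores reading frame."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def hd_repeat_class(repeats):
    """Classify a CAG repeat count using the ranges quoted in the text."""
    if 6 <= repeats <= 35:
        return "normal range"
    if 36 <= repeats <= 121:
        return "expanded (Huntington's disease range)"
    return "outside the quoted ranges"


print(indel_effect(-3), indel_effect(2))  # in-frame frameshift
sequence = "ATG" + "CAG" * 40 + "TAA"     # hypothetical coding sequence
repeats = longest_cag_run(sequence)
print(repeats, hd_repeat_class(repeats))  # 40 expanded (Huntington's disease range)
```

Python's `%` operator returns a non-negative remainder for a positive divisor, so deletions (negative length changes) are handled by the same test as insertions.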
Mutations in genes: outside coding regions (in promoters, introns etc)
It is less easy to make generalizations about the effects of mutations that occur outside of the coding
regions of a gene. In DNA-protein interactions, any protein binding site is susceptible to point,
insertion or deletion mutations that change the identity or relative positioning of nucleotides
involved. These mutations therefore have the potential to inactivate promoters or regulatory
sequences, with predictable consequences for gene expression. Origins of replication could
conceivably be made non-functional by mutations that change, delete or disrupt sequences
recognized by the relevant binding proteins but these possibilities are not well documented.

One area that has been better researched concerns mutations that occur in introns or at intron-exon
boundaries. In these regions, single point mutations will be important if they change nucleotides
involved in the RNA-protein and RNA-RNA interactions that occur during splicing of different types
of intron.
The indirect effects of mutations on organisms
Now we turn to the indirect effects that mutations have on organisms, beginning with multicellular
diploid eukaryotes such as humans. The first issue to consider is the relative importance of the same
mutation in a somatic cell compared with a germ cell. Because somatic cells do not pass copies of
their genomes to the next generation, a somatic cell mutation is important only for the organism in
which it occurs: it has no potential evolutionary impact. In fact, most somatic cell mutations have
no significant effect, even if they result in cell death, because there are many other identical cells in
the same tissue and the loss of one cell is immaterial. An exception is when a mutation causes a
somatic cell to malfunction in a way that is harmful to the organism, for instance by inducing tumor
formation or other cancerous activity.
Mutations in germ cells are more important because they can be transmitted to members of the
next generation and will then be present in all the cells of any individual who inherits the mutation.
Most mutations, including all silent ones and many in coding regions, will still not change the
phenotype of the organism in any significant way. Those that do have an effect can be divided into
two categories:
1. Loss-of-function is the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. There are some exceptions where a loss-of-function mutation is dominant, one example being haploinsufficiency, where the organism is unable to tolerate the approximately 50% reduction in protein activity suffered by the heterozygote.
2. Gain-of-function mutations are much less common. The mutation must be one that
confers an abnormal activity on a protein. Many gain-of-function mutations are in regulatory
sequences rather than in coding regions, and can therefore have a number of consequences.
For example, a mutation might lead to one or more genes being expressed in the wrong
tissues, these tissues gaining functions that they normally lack. Alternatively the mutation
could lead to over-expression of one or more genes involved in control of the cell cycle, thus
leading to uncontrolled cell division and hence to cancer. Because of their nature, gain-of-function mutations are usually dominant.
Homeotic mutations are an extreme case of gain-of-function mutations: these mutations redirect the development of one part of the body into another. They can occur in the genes that determine the basic “body plan” of an organism. For example, in the fruit fly Drosophila, legs might develop in place of antennae (Antennapedia), or wings in place of halteres (balancing structures derived from the second pair of wings in flies).

Figure: A Drosophila antennapedia mutant compared to the wild type
Assessing the effects of mutations on the phenotypes of multicellular organisms can be difficult. Not
all mutations have an immediate impact: some are delayed onset and only confer an altered
phenotype later in the individual's life. Others display non-penetrance in some individuals, never
being expressed even though the individual has a dominant mutation or is a homozygous recessive.
With humans, these factors complicate attempts to map disease-causing mutations by pedigree
analysis (Section 5.2.4) because they introduce uncertainty about which members of a pedigree
carry a mutant allele. The effects of a mutation will also depend on environmental conditions; for example, carrying a single copy of the sickle-cell anaemia allele is advantageous only where malaria is common.

Recombination
Recombination results in a restructuring of part of a genome, for example by exchange of segments of homologous chromosomes during meiosis or by transposition of a mobile element from one position to another within a chromosome or between chromosomes. Recombination is a cellular process which, like other cellular processes involving DNA (e.g. transcription and replication), is carried out and regulated by enzymes and other proteins.
Homologous recombination
This is the most important version of recombination in nature, being responsible for crossing over during meiosis in eukaryotic cells and for the integration of transferred DNA into bacterial (prokaryotic) genomes after conjugation, transduction or transformation.
In eukaryotes, recombination during meiosis is facilitated by chromosomal crossover. The crossover process leads to offspring having different combinations of genes from those of their parents, and can occasionally produce new alleles. Chromosomal crossover involves recombination between the paired chromosomes inherited from each of one's parents, generally occurring during meiosis. During prophase I the four available chromatids are in tight formation with one another. While in this formation, homologous sites on two chromatids can closely pair with one another, and may exchange genetic information.

Because recombination can occur with small probability at any location along the chromosome, the frequency of recombination between two locations depends on the distance separating them. Genes that are far enough apart on the same chromosome experience so much crossing over that the association between their alleles is lost, and they are inherited as if they were unlinked.
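One common way to make the relationship between distance and recombination quantitative is Haldane's mapping function, r = (1 − e^(−2d)) / 2, which converts a map distance d (in Morgans) into an expected recombination fraction r. This function is not part of this guide and assumes that crossovers occur independently of one another (no interference). The Python sketch below is only meant to show why r levels off at 0.5, the value at which alleles at two loci are no longer correlated.

```python
import math


def haldane_recombination_fraction(d_morgans: float) -> float:
    """Expected recombination fraction for map distance d (Morgans), assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
    r = haldane_recombination_fraction(d)
    print(f"d = {d:>4} M  ->  r = {r:.4f}")
```

For small distances r is close to d. For large distances r approaches 0.5, which is also the value expected for genes on different chromosomes.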
Linkage
Tracking the movement of genes resulting from crossovers has proven quite useful to geneticists. Because two genes that are close together (linked) are less likely to become separated by a crossover than genes that are farther apart, linked genes typically stay together during recombination. Geneticists can therefore deduce roughly how far apart two genes are (or how tightly linked they are) on a chromosome if they know the frequency of crossovers between them. For the same reason, one gene in a linked pair can be used as a marker to infer the presence of the other. This is typically used to detect the presence of a disease-causing gene.
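In practice the crossover frequency is estimated from offspring counts, for example from a testcross of a double heterozygote (AB/ab) to a double homozygous recessive. The counts below are invented for illustration. The sketch divides the number of recombinant offspring by the total and applies the usual convention that 1% recombination corresponds to about 1 map unit (centimorgan, cM), which only holds well for closely linked genes.

```python
def recombination_frequency(offspring: dict[str, int], parental_types: set[str]) -> float:
    """Fraction of offspring whose allele combination differs from both parental chromosomes."""
    total = sum(offspring.values())
    recombinants = sum(n for gamete, n in offspring.items() if gamete not in parental_types)
    return recombinants / total


# Hypothetical testcross counts, scored by the gamete contributed by the AB/ab parent.
counts = {"AB": 420, "ab": 430, "Ab": 70, "aB": 80}
rf = recombination_frequency(counts, parental_types={"AB", "ab"})
print(f"recombination frequency = {rf:.2f} (about {rf * 100:.0f} cM)")
```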
Gene duplication
Sometimes crossing over is unequal: one chromosome gets a longer piece of its homologue than the other chromosome gets in return. This can result in gene duplication on the chromosome that received more DNA. Gene duplication can give rise to new genes, because one copy can accumulate mutations while the other continues to carry out the original function. Analyses of the genomes of many organisms suggest that genes are often duplicated over evolutionary time. Groups of duplicated genes are referred to as "gene families", owing to the resemblance of their sequences and their origin by descent from a common ancestral gene.
Gene duplication: summary
1. Gene duplications are short or long segments of extra chromosomal material that originate from duplicated sequences within a genome.
2. Duplications can arise through unequal crossing over.
a. The chromosome that loses material carries a deletion; deletion of a substantial amount of genetic material results in inviable gametes or zygotes.
b. Duplication of genetic material on the other chromosome may be advantageous. The duplicate material is “free” to mutate without fitness consequences.
3. Gene family: genes that have similar DNA sequences, but differ to some degree in sequence and often in function.
4. Example: the globin gene family consists of two clusters of loci (the α-globin and β-globin clusters) coding for the component subunits of hemoglobin.

The great advantage of recombination
Without recombination, genomes would be relatively static structures, undergoing very little
change. The gradual accumulation of mutations over a long period of time would result in small-
scale alterations in the nucleotide sequence of the genome, but more extensive restructuring, which
is the role of recombination, would not occur, and the evolutionary potential of the genome would
be severely restricted.
Figure: Recombination during meiosis generates genetic variation in the form of new combinations of alleles in the offspring. One homologous pair of chromosomes is illustrated, starting at the “four-strand” stage. Each line is a duplex DNA molecule in a chromatid. The two chromosomes in the father (inherited from the paternal grandparents) are blue and green; the homologous chromosomes in the mother (inherited from the maternal grandparents) are brown and pink. All chromosomes have genes A, B and C; different numbers refer to different alleles. In this illustration, a crossover on the short arm of the chromosome during development of the male germ cells links allele 4 of gene C with allele 1 of gene A and allele 2 of gene B, as well as the reciprocal arrangement. A crossover on the long arm of the chromosome is illustrated for development of the female germ cell, making the new combination A3, B3 and C1. A child can have the new chromosomes A1B2C4 and A3B3C1. Note that neither of these combinations was in the father or mother.
The shuffling of genes brought about by genetic recombination produces increased genetic variation. It also allows sexually reproducing organisms to avoid Muller's ratchet, in which the genomes of an asexual population accumulate deleterious mutations in an irreversible manner.

Other mechanisms that generate variation
Independent assortment
Mutations occur during DNA replication prior to meiosis. Crossing over during meiosis mixes alleles from different homologues into new combinations. However, a further source of variation comes from the fact that each chromosome pair assorts independently of the others. When meiosis is complete, the resulting eggs or sperm have a mixture of maternal and paternal chromosomes. This is because during anaphase I, the spindle accurately separates a complete set of 23 human chromosomes into each daughter cell but does not distinguish between the 23 from Mom and the 23 from Dad. Mom's and Dad's homologues are randomly intermixed during anaphase such that each egg or sperm cell has a nearly unique combination of Mom's and Dad's alleles. The number of combinations of 23 maternal and paternal homologues that can result from independent assortment is $2^{23}$, about 8.4 million. This does not include the variation caused by mutations or crossing over!
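The count of gamete types follows because each of the 23 homologous pairs independently contributes either its maternal or its paternal member. The Python sketch below computes that number and draws one random gamete. Like the paragraph above, it deliberately ignores crossing over and mutation, and the function names are hypothetical.

```python
import random


def gamete_combinations(n_pairs: int) -> int:
    """Number of maternal/paternal combinations when each chromosome pair assorts independently."""
    return 2 ** n_pairs


def random_gamete(n_pairs: int, rng: random.Random) -> str:
    """One gamete: 'M' (maternal) or 'P' (paternal) for each chromosome pair, each with p = 1/2."""
    return "".join(rng.choice("MP") for _ in range(n_pairs))


print(gamete_combinations(23))
rng = random.Random(2015)  # fixed seed so the example is repeatable
print(random_gamete(23, rng))
```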
Fertilization
Fertilization randomly brings together two gametes produced in two different individuals. This
means that for a particular man and woman, the number of unique combinations of genes that
could occur in their offspring is about 8.4 million times 8.4 million ($2^{46}$, roughly 70 trillion), not counting variation caused by
crossing over and mutation. Random fertilization is a further mechanism that produces genetic
variation in the process of sexual reproduction.
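Combining two independently formed gametes multiplies the two counts. The arithmetic below uses the exact value 2^23 per parent instead of the rounded 8 million, which gives 2^46, a little over 70 trillion. As before, crossing over and mutation are ignored.

```python
gametes_per_parent = 2 ** 23  # maternal/paternal combinations from independent assortment alone
zygote_combinations = gametes_per_parent * gametes_per_parent  # equals 2 ** 46
print(f"{gametes_per_parent:,} x {gametes_per_parent:,} = {zygote_combinations:,}")
```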
The genetic variation that results from mutations, meiosis, and fertilization causes the phenomenon with which we are all familiar: even in very large populations, such as the human population, every individual is genetically unique.

Polyploidy
There are additional mechanisms that generate genetic variation. One is polyploidy, which occurs commonly in plants and can give rise to new species within one generation. Polyploidy events lead to organisms with more than two sets of chromosomes. More than half of wild plants are polyploid, as are many domestic plants such as wheat.
Aneuploid variation: changes in the number of single chromosomes within a set.
Euploid variation: changes in the number of entire sets of chromosomes (polyploidy).
Autopolyploidy: the appearance of extra sets of chromosomes within a species itself.
Allopolyploidy: polyploids that originate from crossing between different species.
1. Polyploidy is common among plants.
a. The majority of polyploid plants are allopolyploids (arising from crosses between closely related species).
2. Polyploidy is much rarer in animals than in plants, because animals show much greater developmental sensitivity to even a small change in chromosome number.
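The difference between aneuploid and euploid variation can be expressed as a simple check on a chromosome count: a euploid count is a whole multiple of the basic (monoploid) chromosome number x, while an aneuploid count is not. The Python helper below is hypothetical and written only for illustration. It uses 2n = 46 (x = 23) for humans and 2n = 6x = 42 (x = 7) for hexaploid bread wheat, and it cannot detect the unusual case where exactly x single chromosomes are gained or lost.

```python
def classify_chromosome_count(count: int, base_number: int, usual_ploidy: int = 2) -> str:
    """Classify a somatic chromosome count relative to the basic chromosome number x."""
    ploidy, remainder = divmod(count, base_number)
    if remainder:
        return "aneuploid"  # gain or loss of single chromosomes within a set
    if ploidy > usual_ploidy:
        return f"polyploid ({ploidy}x)"  # extra whole sets of chromosomes
    if ploidy == usual_ploidy:
        return "euploid, usual ploidy"
    return f"euploid, fewer sets than usual ({ploidy}x)"


print(classify_chromosome_count(46, 23))  # human, typical karyotype
print(classify_chromosome_count(47, 23))  # human trisomy, e.g. trisomy 21
print(classify_chromosome_count(69, 23))  # human triploid
print(classify_chromosome_count(42, 7))   # bread wheat, compared with a diploid (2x) baseline
```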
Figure: Polyploidy in wheat
Advantages of polyploidy. Polyploidy probably has some advantages in both plants and animals.
a. Extra chromosomes may act as multiple buffers in various organismic processes.
b. Additional chromosomes may provide the chance to evolve new functions.
c. Duplication of the genome offers the same possibilities for mutations to produce novel traits as gene duplication does, just on a larger scale.

Transposable elements or “jumping genes”
Transposition is not a type of recombination but a process that uses recombination, the end result
being the transfer of a segment of DNA from one position in the genome to another. A characteristic
feature of transposition is that the transferred segment is flanked by a pair of short direct repeats
which are formed during the transposition process.
Integrated transposable elements are flanked by short direct repeat sequences. This particular transposon is
flanked by the tetranucleotide repeat 5′-CTGG-3′. Other transposons have different direct repeat sequences.
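Why the element ends up between two copies of the same short sequence can be sketched with a string model. The target DNA is cut at staggered positions, the element is joined in, and the short single-stranded gaps are filled in, so the few target bases between the two cuts appear on both sides. The Python sketch below is a simplified illustration, not a model of any particular transposase. The sequences are hypothetical and chosen so that the duplicated target is 5′-CTGG-3′, as in the example above.

```python
def insert_with_target_site_duplication(target: str, site: int, element: str, tsd_length: int = 4) -> str:
    """Insert element so that target[site:site + tsd_length] is duplicated on both sides of it."""
    duplicated = target[site:site + tsd_length]
    left = target[:site + tsd_length]   # ends with the first copy of the target site
    right = target[site + tsd_length:]  # the rest of the target after the site
    return left + element + duplicated + right


target = "AAAACTGGTTTT"
element = "gatcgatc"  # lowercase marks the (hypothetical) transposon sequence
result = insert_with_target_site_duplication(target, site=4, element=element)
print(result)  # AAAACTGGgatcgatcCTGGTTTT
```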
The various types of transposable element known in eukaryotes and prokaryotes can be broadly divided into three categories on the basis of their transposition mechanism.
• DNA transposons that transpose replicatively, the original transposon remaining in place and a new copy appearing elsewhere in the genome;
• DNA transposons that transpose conservatively, the original transposon moving to a new site by a cut-and-paste process;
• Retroelements, all of which transpose via an RNA intermediate, e.g. retroviruses, which include the human immunodeficiency viruses that cause AIDS and various other virulent types.

GENE CONVERSION
This is a phenomenon that was first described in yeast and other fungi but is now known to occur in many
eukaryotes. In yeast, fusion of a pair of gametes results in a zygote that gives rise to an ascus
containing four haploid spores whose genotypes can be individually determined. If the gametes have
different alleles at a particular locus then under normal circumstances two of the spores will display
one genotype and two will display the other genotype, but sometimes this expected 2 : 2
segregation pattern is replaced by an unexpected 3 : 1 ratio. This is called gene conversion because
the ratio can only be explained by one of the alleles ‘converting’ from one type to the other,
presumably by recombination during the meiosis that occurs after the gametes have fused.
Figure: Gene conversion. One gamete contains allele A and the other contains allele a. These fuse to produce a zygote that gives rise to four haploid spores, all contained in a single ascus. Normally, two of the spores will have allele A and two will have allele a.
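The distinction between normal 2 : 2 segregation and a 3 : 1 conversion event comes down to counting alleles among the four spores of one ascus. The Python sketch below is a hypothetical scoring helper for a single locus with two alleles, and it assumes that each spore has been genotyped correctly.

```python
from collections import Counter


def classify_ascus(spores: list[str]) -> str:
    """Score the allele ratio among the four haploid spores of one ascus (one locus, two alleles)."""
    if len(spores) != 4:
        raise ValueError("expected exactly four spores")
    majority = max(Counter(spores).values())
    if majority == 2:
        return "2:2 (normal segregation)"
    if majority == 3:
        return "3:1 (gene conversion)"
    return "4:0"


print(classify_ascus(["A", "A", "a", "a"]))
print(classify_ascus(["A", "A", "A", "a"]))
```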

Genetic Markers I
Geneticists rely on markers to detect genetic variation. Markers are simply parts of an organism,
whether it is a physical attribute, a protein or enzyme or a piece of DNA that can be used to compare
individuals to each other. The most important attribute of a marker is that it is polymorphic – that is,
it must have more than one variant or allele. There is no point in examining a marker if it is not
polymorphic – all the individuals will have the same allele and there will be almost nothing that we
can infer. Here we will go through the main markers that have been used by biologists in historical
order.
Morphometrics
Morphometrics is the quantitative analysis of form, a concept that encompasses size and shape. Although it measures physical form rather than DNA, form is genetically encoded. Morphometric analyses are commonly performed on organisms. They can be used to quantify a trait of evolutionary significance and, by detecting changes in shape, to deduce something of the organisms' ontogeny, function or evolutionary relationships. A major objective of morphometrics is to statistically test hypotheses about the factors that affect shape.
Methods:
1. Traditional morphometrics analyzes lengths, widths, masses, angles, ratios and areas.
2. In landmark-based geometric morphometrics, the spatial information missing from traditional morphometrics is contained in the data, because the data are coordinates of landmarks.
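The two approaches differ in what is kept from the same specimen. The Python sketch below uses invented 2-D landmark coordinates. It reduces them to a length, a width and a ratio (traditional morphometrics), and it also computes centroid size, a standard size measure in geometric morphometrics that uses every landmark coordinate. A full geometric analysis would also superimpose specimens (for example by Procrustes fitting), which this sketch does not do.

```python
import math

Point = tuple[float, float]


def distance(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def centroid_size(landmarks: list[Point]) -> float:
    """Square root of the summed squared distances of the landmarks from their centroid."""
    cx = sum(x for x, _ in landmarks) / len(landmarks)
    cy = sum(y for _, y in landmarks) / len(landmarks)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))


# Hypothetical 2-D landmark coordinates (mm) digitised on one specimen.
specimen = [(0.0, 0.0), (40.0, 0.0), (40.0, 20.0), (0.0, 20.0)]

# Traditional morphometrics: reduce the specimen to distances and ratios.
length = distance(specimen[0], specimen[1])
width = distance(specimen[1], specimen[2])
print(f"length = {length}, width = {width}, width/length = {width / length:.2f}")

# Geometric morphometrics keeps the coordinates themselves; centroid size summarises size.
print(f"centroid size = {centroid_size(specimen):.2f}")
```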
Advantages
Cheap
Allows comparisons between organisms
Disadvantages
Few characters
Homology in characters is assumed but not known.
Unphased data
DNA-DNA hybridisation
An early and crude method for determining how closely related two organisms were.
Method: The DNA of one organism is labeled, then mixed with the unlabeled DNA to be compared
against. The mixture is incubated to allow DNA strands to dissociate and reanneal, forming hybrid
double-stranded DNA. Hybridized sequences with a high degree of similarity will bind more firmly,
and require more energy to separate them: i.e. they separate when heated at a higher temperature
than dissimilar sequences, a process known as "DNA melting".
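The quantity actually measured is a difference in melting temperature (ΔTm) between a hybrid duplex (one strand from each organism) and a control duplex (both strands from the same organism). The Python sketch below uses made-up temperatures. It also relies on a rule of thumb that this guide does not state: that each 1 °C drop in Tm corresponds to roughly 1% mismatched bases. Treat that conversion factor as an assumption.

```python
def delta_tm(tm_control: float, tm_hybrid: float) -> float:
    """Drop in melting temperature of the hybrid duplex relative to the same-species control."""
    return tm_control - tm_hybrid


def approx_percent_mismatch(dtm: float, degrees_per_percent: float = 1.0) -> float:
    # Assumed rule of thumb: about 1 degree C of Tm lost per 1% mismatched base pairs.
    return dtm / degrees_per_percent


dtm = delta_tm(tm_control=86.0, tm_hybrid=84.2)  # hypothetical temperatures in degrees C
print(f"delta Tm = {dtm:.1f} C -> roughly {approx_percent_mismatch(dtm):.1f}% sequence divergence")
```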

Advantages
One of the first molecular methods for determining relationships between organisms.
Disadvantages
Only a rough measure of relatedness is obtained.
Unphased data
Not useful for closely related organisms.
Isozymes
Different molecular forms of an enzyme (protein) that catalyze the same reaction. This was the very
first time that biologists could actually see allelic polymorphisms (that is, whether an individual is
homozygous or heterozygous) directly.
Basis: Non-denatured proteins with different net charge migrate through a gel at different rates.
Advantages of Isozymes in genetics
1. Phased data: in other words, allelic information can be obtained (e.g. homozygotes or heterozygotes)
2. Simplicity and low cost
Limitations of Isozymes in genetics
1. Few loci available. Although more than 100 isozyme systems have been described, only 40 or 50 are available for a given taxon
2. Unexpectedly high levels of polymorphism for the time, but low compared to more recent methods.
3. Variability assessed only at the level of gene product
Restriction fragment length polymorphisms (RFLPs)
Basis: Differences in restriction patterns between two related sequences, indicative of a modification in the DNA primary structure.
Advantages of RFLPs:
o Higher level of polymorphism than
isozymes (mitochondrial, nuclear,
chloroplast)
o Larger number of loci
o Selective neutrality
Limitations of RFLP
o Slow and expensive
o Polymorphism is still limited
o Unphased data
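An RFLP can be illustrated by cutting two versions of the same sequence with one enzyme. The Python sketch below uses EcoRI, which recognises 5′-GAATTC-3′ and cuts between the G and the first A. The sequences are invented: a single base change in the second allele destroys one recognition site, so one longer fragment replaces two shorter ones.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting seq within every occurrence of site (default: EcoRI, G^AATTC)."""
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = seq.find(site, hit + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles that differ by one base (A -> C) in the second EcoRI site.
allele_1 = "AAAAA" + "GAATTC" + "C" * 10 + "GAATTC" + "TTTTT"
allele_2 = "AAAAA" + "GAATTC" + "C" * 10 + "GACTTC" + "TTTTT"

print("allele 1 fragments:", digest(allele_1))
print("allele 2 fragments:", digest(allele_2))
```

On a gel, allele 1 would give bands of 6, 16 and 10 bp, while allele 2 would give bands of 6 and 26 bp.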

Random Amplified Polymorphic DNA (RAPDs)
Basis: Detection of differences in patterns of DNA amplification from short primers of arbitrary
sequence
Method: RAPD PCR
• Denaturation of DNA and annealing of primers
• Primer extension
• Repeat cycling 20 times
• Electrophorese PCR products
• Stain and score

Advantages of RAPDs:
• More polymorphic than RFLPs
• Simple and quick
• Selective neutrality
Limitations of RAPDs
• Reproducibility among labs may be a problem
• Loci may not be directly comparable
• Unphased data
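Because RAPD uses a single short primer, a product forms only where that primer anneals to both strands, in opposite orientations and close enough together to be amplified. The Python sketch below finds such pairs by exact string matching. That is a simplification: real primers can anneal with mismatches, which is one reason reproducibility varies between labs. The primer, templates and `max_len` limit are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def rapd_products(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Lengths of products primed from a forward site and a downstream inverted site."""
    forward_sites = find_all(template, primer)  # primer anneals to the bottom strand, extends rightwards
    reverse_sites = find_all(template, reverse_complement(primer))  # anneals to the top strand, extends leftwards
    products = []
    for f in forward_sites:
        for r in reverse_sites:
            length = r + len(primer) - f
            if r > f and length <= max_len:
                products.append(length)
    return sorted(products)


primer = "GTGAGGCGTC"  # hypothetical 10-mer of arbitrary sequence
individual_1 = "AAAA" + primer + "T" * 50 + reverse_complement(primer) + "AAAA"
individual_2 = "AAAA" + primer + "T" * 50 + "GACGCATCAC" + "AAAA"  # one base changed in the second site

print("individual 1 bands:", rapd_products(individual_1, primer))
print("individual 2 bands:", rapd_products(individual_2, primer))
```

A single base change at one primer site is enough to make a band disappear, so RAPD bands are scored as present or absent.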
AFLP: Amplified Fragment Length Polymorphism
Basis: A combination of RFLP and PCR

Method:
• DNA digestion
• Ligation of adaptors to ends of restriction fragments
• PCR with primers complementary to the adaptors but carrying extra selective nucleotides at their 3' ends
• Visualization after separation on a sequencing gel
Advantages
• highly sensitive
• highly reproducible (repeatable)
• selective neutrality
Disadvantages:
• technically demanding
• expensive
• unphased data

Genetic Markers II
Microsatellites, Short Tandem Repeats (STRs) or Simple Sequence Repeats (SSRs)
Basis:
• variable number of short tandemly repeated sequences
Method:
• sequence the SSR and adjacent DNA
• design PCR primers and conduct a PCR amplification of DNA
• separate amplified DNA by electrophoresis
Advantages:
high level of polymorphism
easy and fast to run
robust and reproducible
Phased data (allelic variation)
Disadvantage:
Time-consuming and expensive to develop
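Scoring a microsatellite amounts to counting repeat units in each of an individual's two alleles. Because both alleles are sized, the genotype is known directly. The Python sketch below counts the longest uninterrupted run of a dinucleotide motif in hypothetical amplicon sequences. In the lab, allele sizes are read from the length of the PCR product rather than from sequence.

```python
import re


def repeat_count(amplicon: str, motif: str = "CA") -> int:
    """Longest uninterrupted run of motif in the amplicon, in repeat units."""
    runs = re.findall(f"(?:{re.escape(motif)})+", amplicon.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def genotype(allele_a: str, allele_b: str, motif: str = "CA") -> tuple[int, int]:
    """Diploid microsatellite genotype as a sorted pair of repeat counts."""
    counts = sorted((repeat_count(allele_a, motif), repeat_count(allele_b, motif)))
    return counts[0], counts[1]


left_flank, right_flank = "TTGACC", "GGTACT"  # hypothetical primer-binding flanks
allele_12 = left_flank + "CA" * 12 + right_flank
allele_15 = left_flank + "CA" * 15 + right_flank

g = genotype(allele_12, allele_15)
print(g, "heterozygous" if g[0] != g[1] else "homozygous")
```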
DNA Sequences (Sanger method)
Basis: Differences in nucleotide sequences of specific regions (locus/loci)
Method:
• DNA extraction
• PCR
• DNA sequencing
Advantages
highly reproducible
very informative
Disadvantages
expensive
knowledge of sequence is required for primer design
Amount of polymorphism depends on the mutation rate
Unphased data, but can use cloning to determine phase

Genetic Markers III
In the past decade, the development of high-throughput methods for genomic sequencing (next-generation sequencing: NGS) has revolutionized how many geneticists collect data. It is now possible to produce so much data so rapidly that simply storing and processing the data poses great challenges [10].
To some extent, the most important opportunity provided by NGS is simply that we now have a lot more data to answer the same questions. For example, using a technique like RAD sequencing [1] or genotyping-by-sequencing (GBS: [2]), it is now possible to identify thousands of polymorphic SNP markers in non-model organisms, even if you don’t have a reference genome available.
Restriction site-associated DNA (RAD) sequencing
Method: Digest genomic DNA from each individual with a restriction enzyme, and ligate an adapter
to the resulting fragments. The adapter includes a forward amplification primer, a sequencing
primer and a “barcode” used to identify the individual from which the DNA was extracted. Pool the
individually barcoded samples (“normalizing” the mixture so that roughly equal amounts of DNA
from each individual are present), shear them and select fragments of a size appropriate for the
sequencing platform you are using. Ligate a second adapter to the sample, where the second
adapter is the reverse complement of the reverse amplification primer. PCR amplification will enrich
only DNA fragments having both the forward and reverse amplification primer. The resulting library
consists of sequences within a relatively small distance from restriction sites.
Advantages
1. Can be used in any organism from which you can extract DNA
2. Laboratory manipulations are relatively straightforward.
3. Huge amount of data
Disadvantages
1. Polymorphisms may not be selectively neutral
2. The number of loci in common to all animals in the population will decrease as the number of individuals sampled increases
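The key property of a RAD library, that every read starts at a restriction site, can be sketched by locating the enzyme's recognition sites in a sequence and reading a fixed distance outwards from each cut. The Python sketch below uses SbfI (recognition site 5′-CCTGCAGG-3′, cut after CCTGCA) purely as an example of a suitable enzyme. The genome string, `read_len` and function names are hypothetical, and adapters, barcodes, shearing and size selection are left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(genome: str, site: str = "CCTGCAGG", cut_offset: int = 6, read_len: int = 20) -> list[str]:
    """Reads of length read_len taken outwards from each side of every restriction cut."""
    tags = []
    hit = genome.find(site)
    while hit != -1:
        cut = hit + cut_offset
        left_fragment_end = genome[max(0, cut - read_len):cut]
        right_fragment_start = genome[cut:cut + read_len]
        # A read into the left-hand fragment runs on the opposite strand, hence the reverse complement.
        tags.append(reverse_complement(left_fragment_end))
        tags.append(right_fragment_start)
        hit = genome.find(site, hit + 1)
    return tags


toy_genome = "A" * 30 + "CCTGCAGG" + "T" * 30
print(rad_tags(toy_genome, read_len=10))
```

Only the bases next to each cut site are sequenced. This is why a RAD library samples a small, repeatable fraction of the genome in every individual.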

Genotyping-by-sequencing (GBS)
Genotyping-by-sequencing (GBS) is similar to RAD seq.
Method: Digest genomic DNA with a restriction enzyme and ligate two adapters to the genomic
fragments. One adapter contains a barcode and the other does not. Pool the samples. PCR amplify
and sequence. Not all ligated fragments will be sequenced because some will contain only one
adapter and some fragments will be too long for the NGS platform.
Advantages
1. Can be used in any organism from which you can extract DNA
2. Laboratory manipulations are relatively straightforward.
3. Huge amount of data
Disadvantages
1. Polymorphisms may not be selectively neutral
2. The number of loci recovered is smaller than in RadSeq.
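The loss of fragments described in the GBS method can be illustrated with a toy simulation. Each fragment end receives either the barcoded adapter ("A") or the common adapter ("B") at random, and only fragments with one of each and a length the platform can handle are kept. The 50:50 ligation model, the fragment lengths and the 500 bp cut-off are assumptions chosen for illustration, not parameters of any published protocol.

```python
import random


def sequenceable_fragments(lengths: list[int], max_len: int, rng: random.Random) -> list[int]:
    """Keep fragments that carry one barcoded ('A') and one common ('B') adapter and are short enough."""
    kept = []
    for length in lengths:
        ends = {rng.choice("AB"), rng.choice("AB")}  # adapter ligated to each end, chosen at random
        if ends == {"A", "B"} and length <= max_len:
            kept.append(length)
    return kept


rng = random.Random(42)
fragment_lengths = [rng.randint(50, 2000) for _ in range(1000)]  # hypothetical digest
kept = sequenceable_fragments(fragment_lengths, max_len=500, rng=rng)
print(f"{len(kept)} of {len(fragment_lengths)} fragments would be sequenced")
```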
Whole genome sequencing (WGS)
This is a laboratory process that determines the complete DNA sequence of an organism's genome
at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA
contained in the mitochondria and, for plants, in the chloroplast. During whole genome sequencing,
researchers collect a DNA sample and then determine the identity of the billions of nucleotides that
compose the genome. The very first human genome was completed in 2003 as part of the Human
Genome Project, which was formally started in 1990. Today, sequencing technology is much more
efficient, and a human genome can be sequenced in a matter of days for under $10,000. The first
human genome cost $2.7 billion.

Method: Extract genomic DNA. Fragment the DNA. Prepare a random fragment (shotgun) DNA
library, sequence all the clones in the library. Assemble the sequences.
Advantages
The whole genome of the organism is available for analysis. This is much more than the
reduced representation methods previously used.
Disadvantages
High error rate, requires high coverage for accuracy
Short sequence reads (70-90 bp)
Difficult assembly
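Coverage, what is it? Coverage (or depth) is the average number of sequence reads that include a given nucleotide of the genome. Because individual reads have a high error rate, high coverage is needed so that each base is read many times.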
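Coverage is usually estimated as C = N × L / G, where N is the number of reads, L is the read length and G is the genome size. Under the simple Lander-Waterman (Poisson) model, which assumes reads fall at random positions, the expected fraction of bases covered by no read at all is e^(−C). The Python sketch below applies both formulas. The read counts and the approximate human genome size of 3.1 billion bases are assumptions for illustration, and the read length is taken from the upper end of the range quoted above.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) expectation for the fraction of bases with zero reads."""
    return math.exp(-coverage)


genome_size = 3_100_000_000  # approximate human genome size in bases (assumption)
read_length = 90             # bp, upper end of the short-read lengths in the text
for n_reads in (50_000_000, 200_000_000, 1_000_000_000):
    c = mean_coverage(n_reads, read_length, genome_size)
    print(f"{n_reads:>13,} reads -> {c:5.1f}x coverage, uncovered fraction ~ {fraction_uncovered(c):.2e}")
```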

Measuring Genetic Variation
Organisation of genetic variation
In order to measure genetic variation, we first need to figure out what measurement is most
appropriate. This will depend on how the genetic variation is organised. This can be in several ways:
1. It can be organised based on the kind of genetic marker you are dealing with.
A. In eukaryotes, for example, some markers can distinguish between the two copies (alleles) of a locus that you inherited from each of your parents. These are diploid markers and examples include microsatellite loci, allozymes, RadSeq, GBS and whole genome sequences.
B. On the other hand, other genetic markers such as RFLP, RAPD, AFLP and Sanger DNA sequences do not distinguish between the two alleles at a locus and give you all the information together. In these cases it can be difficult to figure out what the two alleles (or the phase) of the DNA sequences are, unless you clone the gene.

C. Then there are haploid markers. These are located in parts of the genome, such as the mitochondrial DNA or chloroplast DNA in plants, that are inherited from only one parent. Because they are haploid, there is only one allele, so the phase is known and no cloning needs to be done to figure it out.
2. Genetic variation can also be measured as the number of copies present of a particular locus or gene. Even in eukaryotes, many loci are present in more than two copies (one from mum and one from dad). Why? This is because at some point in the past, a gene duplication event occurred and resulted in multiple copies of the same gene. It can be very useful to know how many copies an individual has of a particular gene. This is called variation in copy number (VCN), more commonly known as copy number variation (CNV).
Telomeres are a classic example. They are sequences of repeated DNA elements that protect the ends of the chromosome from damage, like a cap on the end of each chromosome. However, as we age, these caps get shorter and shorter. So if you compare the VCN of telomere DNA in an individual at two different points in time, you can estimate how much they have aged.
You can also use VCN to determine variation in the expression of genes; in other words, to see how much a certain gene is being transcribed in the cell. This is important because you can then figure out which genes are important to which cells and under what conditions. This is very useful in determining how the regulation of certain genes causes disease, e.g. cancer. We can measure this by real-time PCR or by the newer technique RNAseq (see previous lectures).

3. Genetic variation can also be organised hierarchically, from the level of the individual gene all the way to the population.
A) Individual measures of genetic variation
1. Number of alleles or allele copies (copy number)
2. Individual heterozygosity. Count the number of loci at which the individual is heterozygous, then divide by the total number of loci.
B) Population measures of variation
Most commonly genetic variation within a population is measured as the percentage of gene loci
that are polymorphic or the percentage of gene loci in individuals that are heterozygous.
1. Allele frequencies at each locus are calculated by direct counting. The mean number of alleles (MNA) observed over a range of loci for different populations is considered to be a reasonable indicator of genetic variation.
2. Allelic richness (Ar). A measure of the number of alleles per locus that allows comparisons to be made between samples of different sizes.
3. Observed heterozygosity (Ho). This is the proportion of individuals in a population that are heterozygous, averaged over loci.
4. Expected heterozygosity (He). We can also work out the heterozygosity we would expect given a set of allele frequencies, by summing the squares of the frequencies of all alleles at a locus and subtracting this sum from one: $H_e = 1 - \sum_{i=1}^{k} p_i^2$, where $p_i$ is the frequency of allele $i$ and $k$ is the number of alleles at the locus.
5. Number of polymorphic sites. This is the number of nucleotide sites that vary among the DNA sequences sampled from a population.
6. Nucleotide diversity. This is the average number of nucleotide differences per site between pairs of DNA sequences in the sample.
7. Haplotype diversity (or gene diversity). This measures the uniqueness of haplotypes in a given population: it is the probability that two haplotypes chosen at random from the sample are different.
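Several of these measures are simple enough to compute by hand. The Python sketch below does so for a small invented data set: four diploid individuals scored at two microsatellite loci, and four aligned DNA sequences of 10 bp. Observed heterozygosity is taken as the proportion of heterozygous individuals at a locus, and expected heterozygosity as one minus the sum of squared allele frequencies. Nucleotide diversity is computed as the mean number of pairwise differences per site. Haplotype diversity uses Nei's sample-size correction n/(n − 1), which goes beyond what this guide states. Allelic richness is omitted because it needs a rarefaction step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical diploid genotypes (allele sizes in bp), one pair per individual, per locus.
loci = {
    "locus_1": [(150, 152), (150, 150), (152, 154), (150, 154)],
    "locus_2": [(98, 98), (98, 100), (98, 98), (100, 100)],
}
# Hypothetical aligned 10 bp sequences sampled from one population.
sequences = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]


def individual_heterozygosity(individual: int) -> float:
    """Fraction of loci at which one individual carries two different alleles."""
    het = sum(1 for genotypes in loci.values() if genotypes[individual][0] != genotypes[individual][1])
    return het / len(loci)


def allele_frequencies(genotypes: list[tuple[int, int]]) -> dict[int, float]:
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes: list[tuple[int, int]]) -> float:
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes: list[tuple[int, int]]) -> float:
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p ** 2 for p in allele_frequencies(genotypes).values())


def polymorphic_sites(seqs: list[str]) -> int:
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of differences per site over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return differences / len(pairs) / len(seqs[0])


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's estimator: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f ** 2 for f in freqs))


mna = sum(len(allele_frequencies(g)) for g in loci.values()) / len(loci)
print("individual 0 heterozygosity:", individual_heterozygosity(0))
print("mean number of alleles:", mna)
for name, genotypes in loci.items():
    print(name, "Ho =", observed_heterozygosity(genotypes), "He =", round(expected_heterozygosity(genotypes), 4))
print("polymorphic sites:", polymorphic_sites(sequences))
print("nucleotide diversity:", round(nucleotide_diversity(sequences), 4))
print("haplotype diversity:", round(haplotype_diversity(sequences), 4))
```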

Section 4: The Forces of Evolution
There are only four forces by which evolution operates.
Mutation: occurs randomly. This is what generates the genetic variation upon which the other
evolutionary forces (below) act. See also previous lectures on mutation. Mutations generate
different alleles and the frequencies of these alleles in a population are determined by the three
forces listed below.
Selection: This is the external pressure of the environment that determines which mutations (alleles) survive to the next generation and which do not. Selection changes the frequency of these alleles with time. Strong selection will change allele frequencies faster than weak selection.
Genetic Drift: This is a random process in which allele frequencies change, and alleles can be lost from a population, purely by chance. The effect of genetic drift on allele frequencies depends on the size of the population and the level of migration into that population. Genetic drift can change the frequency of alleles in a population quickly if it is small, and very quickly if it is small AND isolated (no migration to bring in new alleles). On the other hand, the effect of drift is small if the population is large, and very small if the population is large and there is migration from other populations.
Migration: Immigration into a population and emigration out of a population will also change the
allele frequencies of the population, especially if the migrants are breeding in that population.
Migration interacts with both selection and drift.
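The interplay of drift, migration and selection can be explored with a small Wright-Fisher-style simulation of one locus with two alleles. The Python sketch below is a teaching toy built on stated assumptions, not a method from the reading. The population has N diploid individuals (2N gene copies). Each generation, a fraction m of gene copies is replaced by migrants with allele frequency `p_migrants` (an island model). Selection acts through a simple genic model in which the allele has relative fitness 1 + s, and drift is modelled by sampling the 2N copies at random.

```python
import random


def simulate(p0: float, n_individuals: int, generations: int, m: float = 0.0,
             p_migrants: float = 0.5, s: float = 0.0, seed: int = 1) -> list[float]:
    """Allele frequency trajectory under migration, genic selection and drift."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    copies = 2 * n_individuals
    for _ in range(generations):
        p = (1 - m) * p + m * p_migrants  # migration: mix in migrant gene copies
        p = p * (1 + s) / (1 + s * p)     # selection: allele has relative fitness 1 + s
        p = sum(rng.random() < p for _ in range(copies)) / copies  # drift: sample 2N copies
        trajectory.append(p)
    return trajectory


for n in (10, 1000):
    isolated = simulate(0.5, n, generations=100)
    connected = simulate(0.5, n, generations=100, m=0.05)
    print(f"N = {n:4}: isolated final p = {isolated[-1]:.2f}, with migration final p = {connected[-1]:.2f}")
```

Re-running with different seeds typically shows the pattern described above. In the small isolated population the frequency wanders widely and often reaches 0 or 1, whereas a large population, or one receiving migrants, tends to stay closer to its starting frequency or to that of the migrants.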
Chapters 7 and 8, Bergstrom and Dugatkin Evolution (1st Edition)

Section Outcomes.
1. Explain Lamarckism
2. Differentiate between the central tenets of natural selection and evaluate the evidence for
natural selection.
3. Compare the effects of stabilising, directional and diversifying selection on population allele
frequencies
4. Evaluate the contribution of Mendel, the Modern Synthesis and Neutral Theory to the study
of evolution.
5. Explain the four evolutionary forces and how they can drive evolution through changing
allele frequencies in populations.
6. Predict the outcome of crosses including the use of the Punnett square.
7. Explain the cases in which inheritance deviates from Mendelian expectation
8. Explain how sexual selection can change population allele frequencies.
9. Analyze a population using the Hardy-Weinberg calculations. Apply chi square analysis to
those predictions
10. Explain the interaction between selection, genetic drift and migration
11. Explain Dobzhansky’s idea of how changes in gene frequencies could lead to species.
12. Evaluate the neutral theory based on its predictions.

An evolutionary way of thinking
Darwin was not the first person to try and make sense of how the diversity of biological life came to
be. He was preceded by a number of great thinkers. It is very important to realise that the ideas (also
known as hypotheses) put forward by these thinkers may seem obviously wrong by today’s
standards, yet without these ideas, there would be no way to move forward scientifically. Science
and progress, therefore, depend on new ideas, no matter whether they stand the test of time or not.
Evolutionary biology, and science in general, has also undergone its own “evolution”. The major
developments prior to Darwin were:
1. The separation of natural from supernatural explanations. First proposed by Greek philosophers such as Anaximander (ca. 610–546 b.c.), this sought to leave behind the supernatural in search of natural explanations for how the world and the life it contains came to be as it is.
2. Hypothesis testing. If ideas were based on natural explanations, they could be
verified by observing natural processes. The Greek philosopher Aristotle (ca. 384–322 b.c.)
recognized the significance of testing or verifying one’s hypotheses.
“we must not accept a general principle from logic only, but must prove its
application to each fact; for it is in facts that we must seek general principles, and
these must always accord with the facts”
3. The realisation that the world was ever-changing. The Greeks also realised that the
world had changed when they discovered fossilised marine invertebrates embedded in
mountainous rocks, and concluded that these rocks must at one time have been at the bottom of
the ocean.
4. The world is very, very old. The ancient nature of the Earth only became fully
appreciated in the late 18th/early 19th century, when geologists like James Hutton and Charles
Lyell looked at everyday natural processes, such as soil erosion, and estimated the amount
of time needed for these processes to erode large geological features like canyons. Darwin
drew heavily on the work of Lyell, especially the idea that great features can be brought
about through slow and gradual changes acting over a very long time.
5. The development of natural history to take stock of the diversity of life. Aristotle
described hundreds of species of animals and developed a taxonomy of nature, a
classification system of life that would later be called “the great chain of being,” or scala
naturae. But people still thought of life as being spontaneously generated, not as having evolved
through common ancestry. Erasmus Darwin (Charles Darwin’s grandfather) put forward the
idea of common ancestry, stating that all life was descended from a “single living filament”.
Robert Chambers outlined his principle of progressive development, in which he
hypothesized that new species arise from old species.
While the advances above were monumental, they still did not fully explain how, across the diversity
of life, organisms have come to be so well suited to their environments. The theologian William Paley
compared the complexity of life to a watch, with all its complex springs and cogs, perfectly suited
for telling the time. He concluded that such complexity could not have come together simply by
chance and could only have been designed by a watchmaker; in other words, the complexity of life
and its close match to its environment could only have been created by God.

However, as charming as the analogy is, the idea fails because it relies on a supernatural
explanation instead of a natural one.
Jean-Baptiste Lamarck (1744–1829)
On the other hand, Lamarck used nature to describe his theory of how species change or evolve to
suit their environments. He postulated that phenotypic characteristics acquired during the lifetime
of an organism can be inherited by its offspring. The famous example of Lamarck’s theory is that of
the short-necked giraffe. At one point in evolutionary time, all giraffes had short necks. To reach the
best leaves at the tops of the trees, the giraffes had to stretch their necks. Those giraffes that
stretched their necks the furthest were the most successful and passed on their lengthened necks to
the next generation, and after several generations, all giraffes had long necks. This is Lamarck’s
theory of inheritance of acquired characteristics, also known as Lamarckism.
Lamarck’s idea was reasonable: he used a natural explanation and did not need to invoke God. But
we now know that acquired characteristics cannot be inherited, and we ground our ideas of
how traits are passed from generation to generation in the laws of genetics, which you have learned
in previous sections. These laws were formulated about 100 years after Lamarck.
Lamarck’s legacy, however, is not that he postulated the wrong processes for evolutionary change,
but that he proposed a process in the first place, and that he connected it to environmental fit.
Lamarck is therefore rightfully credited with being the first “Evolutionist”.

Natural Selection
Charles Darwin (1809 - 1882) and Natural Selection
Charles Darwin dropped out of medical school after only two years! While living in Cambridge, he was
greatly influenced by Henslow, a professor of botany, and Sedgwick, a professor of geology. Henslow
recommended him to Captain FitzRoy as the "unpaid naturalist" on the British naval vessel HMS Beagle,
which was to chart the coast of South America. The voyage took five years, from 1831 to 1836.
Darwin made many observations during the voyage, but his observations on the Galapagos Islands were
especially significant. There he saw evidence of change in isolated populations. Tortoises,
mockingbirds and finches were unique to each island, even though they were obviously derived from
the same mainland species. He saw first-hand that environments different from the mainland had
resulted in modified tortoises and birds. However, the idea of evolution by natural selection did not
mature until after his return. The ornithologist John Gould pointed out to Darwin that the
mockingbirds which Darwin had collected from different Galapagos Islands were so distinct from one
island to another as to represent different species. This revelation seems to have led Darwin to
doubt the fixity of species and to set about gathering evidence on the "transmutation of species".
The theory of natural selection crystallized on September 28, 1838, two years after he returned from
the voyage of the Beagle. He recounts in his autobiography: "I happened to read for amusement
Malthus on Population, and being well prepared to appreciate the struggle for existence which
everywhere goes on from long-continued observation of the habits of animals and plants, it at once
struck me that under these circumstances favourable variations would tend to be preserved, and
unfavourable ones to be destroyed".
He wrote up his theory in an "abstract" in 1844, which he circulated among a small group, but the
full manuscript was set aside with instructions to his wife that it be published posthumously. He
was apparently aware of the impact it would have and was unwilling to face the controversy. He was
forced to publish his theory when he received a manuscript from Alfred Russel Wallace (1823 - 1913),
a British naturalist, proposing the same theory (see below). At the urging of his friends Charles
Lyell and Joseph Hooker, he prepared an extract from his 1844 manuscript, which was presented
together with Wallace’s report before the Linnean Society on July 1, 1858. Then, in 1859, Darwin
published "On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured
Races in the Struggle for Life". The book sold out its first printing of 1,250 copies in one day and
changed the way we think about the world forever. Its publication remains one of the great
intellectual revolutions of human history.
Darwin's tenets of natural selection
1. Variation. Organisms (within populations) exhibit individual variation in appearance
and behaviour. These variations may involve body size, hair color, facial markings, voice
properties, or number of offspring. On the other hand, some traits show little to no
variation among individuals, for example, the number of eyes in vertebrates.
2. Inheritance. Some traits are consistently passed on from parent to offspring. Such
traits are heritable, whereas other traits are strongly influenced by environmental conditions
and show weak heritability.
3. High rate of population growth. Most populations have more offspring each year
than local resources can support, leading to a struggle for resources. Each generation
experiences substantial mortality.
4. Differential survival and reproduction. Individuals possessing traits well suited for
the struggle for local resources will contribute more offspring to the next generation. These
individuals thus have higher evolutionary fitness than others.
Evolution via natural selection can occur only if there is variation in the population to begin with.
From one generation to the next, the struggle for resources (what Darwin called the “struggle for
existence”) will favour individuals with some variations over others and thereby change the
frequency of traits within the population. This process is natural selection. The traits that confer an
advantage to those individuals who leave more offspring are called adaptations.
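The claim that selection changes the frequency of traits from one generation to the next can be made concrete with a small numerical model. The sketch below is a minimal illustration only, assuming a large haploid (asexual) population in which two heritable variants, A and B, breed true and chance effects are ignored; the fitness values `w_a` and `w_b` and the function names are hypothetical choices for this example.

```python
def next_frequency(p: float, w_a: float, w_b: float) -> float:
    """Frequency of variant A after one generation of selection.

    p   : current frequency of variant A (between 0 and 1)
    w_a : relative fitness (offspring per individual) of A carriers
    w_b : relative fitness of B carriers
    """
    q = 1.0 - p
    mean_fitness = p * w_a + q * w_b
    # A's share of the next generation is its share of all offspring produced.
    return p * w_a / mean_fitness


def simulate(p0: float, w_a: float, w_b: float, generations: int) -> list[float]:
    """Return the frequency of A in every generation, starting with p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs


if __name__ == "__main__":
    # A rare variant with a 10% reproductive advantage (hypothetical numbers).
    trajectory = simulate(p0=0.01, w_a=1.1, w_b=1.0, generations=100)
    for gen in (0, 25, 50, 75, 100):
        print(f"generation {gen:3d}: frequency of A = {trajectory[gen]:.3f}")
```

Setting `w_a == w_b` (no advantage) or starting with `p0` equal to 0 or 1 (no variation) leaves the frequency unchanged, which mirrors the two requirements for natural selection described in the text.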
In order for natural selection to operate on a trait, the trait must possess heritable variation and
must confer an advantage in the competition for resources. If either of these requirements is not
met, the trait does not experience natural selection. (We now know that such traits may
change by other evolutionary mechanisms that have been discovered since Darwin’s time.)
NOTE THAT THIS DOES NOT MEAN THAT EVOLUTION HAS A "GOAL" OR THAT THERE IS A "MOST
HIGHLY EVOLVED SPECIES". NOR IS IT AN INEXORABLE MARCH TOWARDS AN "IDEAL PINNACLE SPECIES"
(which is most often defined as Homo sapiens by people who haven't a clue about biological
realities...). To use William Paley’s analogy, the watchmaker is “blind”: he has no sense of
direction. See Richard Dawkins’s The Blind Watchmaker.

Evidence for Natural Selection
Fitness can be defined either with respect to a genotype or to a phenotype in a given environment.
In either case, it describes individual reproductive success and is equal to the average contribution
to the gene pool of the next generation that is made by an average individual of the specified
genotype or phenotype. The term "Darwinian fitness" can be used to distinguish this meaning from
physical fitness.
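The definition above can be illustrated with a short calculation. This is a sketch under stated assumptions: the offspring counts are made-up numbers, offspring number is used as a stand-in for "contribution to the gene pool", and the rescaling to relative fitness (fittest genotype = 1) is a common convention in population genetics rather than something stated in this guide.

```python
# Hypothetical offspring counts (one entry per individual) for three genotypes
# living in the same environment. These numbers are invented for illustration.
OFFSPRING = {
    "AA": [3, 4, 2, 5],
    "Aa": [3, 3, 4, 2],
    "aa": [1, 2, 0, 1],
}


def absolute_fitness(counts: dict[str, list[int]]) -> dict[str, float]:
    """Average number of offspring contributed by an individual of each genotype."""
    return {genotype: sum(c) / len(c) for genotype, c in counts.items()}


def relative_fitness(abs_fit: dict[str, float]) -> dict[str, float]:
    """Rescale so that the most successful genotype has fitness 1."""
    best = max(abs_fit.values())
    return {genotype: w / best for genotype, w in abs_fit.items()}


if __name__ == "__main__":
    w = absolute_fitness(OFFSPRING)
    rel = relative_fitness(w)
    for genotype in OFFSPRING:
        print(f"{genotype}: mean offspring = {w[genotype]:.2f}, "
              f"relative fitness = {rel[genotype]:.2f}")
```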
Industrial melanism is a phenomenon that affected over 70 species of moths in England. It has been
best studied in the peppered moth, Biston betularia. Prior to 1800, the typical moth of the species
had a light pattern (see Figure 2). Dark colored or melanic moths were rare and were therefore
collectors' items.
During the Industrial Revolution, soot and other industrial wastes darkened tree trunks and killed off
lichens. This changed the relative fitness of the two morphs: the light-coloured morph of the moth
became rare and the dark morph became abundant. In 1819, the first melanic morph was seen; by 1886,
it was far more common, illustrating rapid evolutionary change. Eventually light morphs were
common in only a few locales, far from industrial areas. The cause of this change was thought to be
selective predation by birds, which favoured camouflage coloration in the moth.
Figure. Image of Peppered Moth
In the 1950s, the biologist Kettlewell did release-recapture experiments using both morphs. A brief
summary of his results is shown below. By observing bird predation from blinds, he could confirm
that the conspicuousness of a moth greatly influenced the chance it would be eaten.

Recapture success (%)
                        Light moth    Dark moth
Non-industrial woods    14.6          4.7
Industrial woods        13.0          27.5
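A quick way to read the table is to compare the two morphs within each kind of wood. The sketch below simply divides the dark-moth recapture percentage by the light-moth percentage for each site; treating recapture success as a rough proxy for survival follows the interpretation given in the text, and the function name is a hypothetical choice.

```python
# Recapture percentages copied from the table above: (light moth, dark moth).
RECAPTURE = {
    "non-industrial woods": (14.6, 4.7),
    "industrial woods": (13.0, 27.5),
}


def dark_to_light_ratio(light: float, dark: float) -> float:
    """How many times as often dark moths were recaptured compared with light moths."""
    return dark / light


if __name__ == "__main__":
    for site, (light, dark) in RECAPTURE.items():
        ratio = dark_to_light_ratio(light, dark)
        print(f"{site}: dark/light recapture ratio = {ratio:.2f}")
```

In non-industrial woods dark moths were recaptured at roughly a third of the light-moth rate (about 0.32), whereas in industrial woods they were recaptured about twice as often (about 2.12), consistent with camouflage determining which morph survives.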
Galapagos finches are the famous example from Darwin's voyage. Each island of the Galapagos that
Darwin visited had its own kind of finch (14 in all), found nowhere else in the world. Some had beaks
adapted for eating large seeds, others for small seeds; some had parrot-like beaks for feeding on
buds and fruits, and some had slender beaks for feeding on small insects. One used a thorn to probe
for insect larvae in wood, as some woodpeckers do. Six were ground-dwellers and eight were tree
finches. This diversification into different ecological roles, or niches, is thought to be necessary to
permit the coexistence of multiple species, a topic we will examine in a later lecture. To Darwin, it
appeared that each was slightly modified from an original colonist, probably the finch on the
mainland of South America, some 600 miles to the east.

Stabilizing, Directional, and Diversifying Selection
We can look at selection in a statistical way. Suppose that each population can be portrayed as a
frequency distribution for some trait, beak size for instance. Note again that variation in a trait is
the critical raw material for evolution to occur.
Figures a–c. (a) Stabilising, (b) directional and (c) diversifying selection.
What will the frequency distribution look like in the next generation?
The blue colour shows us at which part of the frequency distribution selection is operating. First (a),
selection acts against the extremes of the frequency distribution (individuals that are too small or
too large). In time, the distribution narrows, with individuals clustering more tightly around the
mean. This is stabilising selection, probably the most common form of natural selection, and we
often mistake it for no selection. A real-life example is the birth weight of human babies.
Under directional selection, individuals at one end of the distribution do especially well, and so the
frequency distribution of the trait in the subsequent generation is shifted from where it was in the
parental generation (see Figure b). This is what we usually think of as natural selection. Industrial
melanism was such an example. Another example is the fossil lineage of the horse. Overall, the
horse has evolved from a small-bodied ancestor, the dawn horse Hyracotherium of the early Eocene,
built for moving through woodlands and thickets, to its long-legged descendant, Equus, built for
speed on the open grassland. This evolution has involved well-documented changes in teeth, leg
length, and toe structure.

Under diversifying (disruptive) selection, both extremes are favoured at the expense of
intermediate varieties (see Figure c). This is uncommon, but of theoretical interest because it
suggests a mechanism for species formation without geographic isolation.
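The three modes of selection can be compared numerically by weighting a frequency distribution with a fitness function and looking at the mean and variance that result. This is a minimal sketch, assuming a hypothetical trait with 11 size classes, a bell-shaped (binomial) starting distribution, made-up fitness values, and a fully heritable trait, so that the distribution among survivors is carried straight into the next generation.

```python
from math import comb

# Hypothetical trait (e.g. beak size) with 11 size classes, 0..10, whose
# parental frequencies follow a bell-shaped binomial distribution.
SIZES = list(range(11))
PARENTS = [comb(10, k) / 2**10 for k in SIZES]


def stabilising(size: int) -> float:
    """Extremes have lower fitness than intermediates."""
    return 1.0 if 3 <= size <= 7 else 0.5


def directional(size: int) -> float:
    """Fitness rises steadily with size."""
    return 1.0 + 0.1 * size


def diversifying(size: int) -> float:
    """Both extremes have higher fitness than intermediates."""
    return 0.5 if 3 <= size <= 7 else 1.0


def after_selection(freqs: list[float], fitness) -> list[float]:
    """Weight each size class by its fitness, then rescale to sum to 1."""
    weighted = [f * fitness(s) for s, f in zip(SIZES, freqs)]
    total = sum(weighted)
    return [w / total for w in weighted]


def mean_and_variance(freqs: list[float]) -> tuple[float, float]:
    mean = sum(s * f for s, f in zip(SIZES, freqs))
    var = sum((s - mean) ** 2 * f for s, f in zip(SIZES, freqs))
    return mean, var


if __name__ == "__main__":
    print("parents:      mean %.2f, variance %.2f" % mean_and_variance(PARENTS))
    for mode in (stabilising, directional, diversifying):
        m, v = mean_and_variance(after_selection(PARENTS, mode))
        print(f"{mode.__name__:13s} mean {m:.2f}, variance {v:.2f}")
```

Stabilising selection leaves the mean where it was but shrinks the variance, directional selection moves the mean, and diversifying selection leaves the mean in place while spreading the distribution out, matching panels a, b and c.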
Artificial selection and domestication
Long before Darwin and Wallace, farmers and breeders were using the idea of selection to cause
major changes in the features of their plants and animals over the course of decades. Farmers and
breeders allowed only the plants and animals with desirable characteristics to reproduce, causing
the evolution of farm stock.
Many species have been modified by breeders, e.g., cattle, sheep, dogs, flowers, vegetables. How it
is done: the offspring of each generation vary. The differences may be so small that only a trained
breeder can detect them. Those that are more like what the breeder wants are selected for further
breeding; the rest aren’t allowed to breed. This is repeated. Eventually the small differences add up
to a large change in the breed. Darwin called this process artificial selection. It works in exactly the
same way as natural selection, except that the environment does not select which individuals
survive and breed to the next generation; instead, humans do the selection. An individual’s fitness is
thus determined by the selector. This can create striking differences in a short space of time.
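The breeding procedure described above (let the offspring vary, keep the ones closest to the goal, repeat) can be simulated in a few lines. This is a hedged sketch of a simple hypothetical model, not a standard breeding program: each individual has a genetic value plus environmental noise, the breeder sees only the phenotype and keeps the top 20%, and offspring resemble the average of two parents plus some random variation. All numbers (population size, noise levels, fraction kept, random seed) are arbitrary assumptions.

```python
import random
from statistics import mean


def new_individual(genetic_value: float, rng: random.Random) -> tuple[float, float]:
    """Return (genetic value, phenotype); the phenotype adds environmental noise."""
    return genetic_value, genetic_value + rng.gauss(0.0, 1.0)


def select_and_breed(population, keep_fraction, rng):
    """One round of artificial selection under a simple hypothetical model."""
    # The breeder sees only phenotypes and keeps the individuals closest to the goal.
    ranked = sorted(population, key=lambda ind: ind[1], reverse=True)
    parents = ranked[: max(2, int(len(ranked) * keep_fraction))]
    offspring = []
    for _ in range(len(population)):
        mum, dad = rng.sample(parents, 2)
        # Offspring resemble the average of their parents, plus new variation
        # from Mendelian segregation (spread chosen arbitrarily).
        g = (mum[0] + dad[0]) / 2 + rng.gauss(0.0, 0.7)
        offspring.append(new_individual(g, rng))
    return offspring


def run(generations=10, size=500, keep_fraction=0.2, seed=1):
    """Mean phenotype in each generation, starting from an unselected population."""
    rng = random.Random(seed)
    population = [new_individual(rng.gauss(0.0, 1.0), rng) for _ in range(size)]
    means = [mean(p for _, p in population)]
    for _ in range(generations):
        population = select_and_breed(population, keep_fraction, rng)
        means.append(mean(p for _, p in population))
    return means


if __name__ == "__main__":
    for gen, m in enumerate(run()):
        print(f"generation {gen:2d}: mean phenotype {m:.2f}")
```

Running `run(keep_fraction=1.0)`, where every individual is allowed to breed, serves as a control: without selection the mean barely moves, while with selection the small per-generation gains accumulate into a large change.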
As shown below, farmers have cultivated numerous popular crops from the wild mustard, by
artificially selecting for certain attributes.

All dog breeds are descended from the wolf, but it seems that domestication of the wolf occurred
independently in different parts of Eurasia.

Sexual selection
Sexual selection is a "special case" of natural selection. Sexual selection acts on an organism's ability
to obtain (often by any means necessary!) or successfully copulate with a mate. In order to leave an
evolutionary legacy, survival is not enough; individuals must also reproduce. Over 90% of species
reproduce sexually, meaning that two individuals, one of each sex, must mate in order to produce
offspring. Reproduction is expensive and can exert an additional evolutionary pressure, which Darwin
called sexual selection. Sexual selection operates through some members of a species having
an advantage over others in terms of mating. It is the selection for traits that are solely concerned
with increasing the mating success of an individual.
Darwin and Sexual Selection
Darwin’s findings in relation to sexual selection were published in his book The Descent of Man, and
Selection in Relation to Sex in 1871. Darwin observed that there are some characteristics that do not
appear to help an organism adapt to its environment and are thus not explained by natural
selection. He suggested that they feature in the process of sexual selection. He defined the process
by saying that it ‘depends on the advantage which certain individuals have over other individuals of
the same sex and species, in exclusive relation to reproduction’. His observations and analysis led
him to reason that sexual selection works in two main ways: either through competition among
members of one sex for access to members of the other sex (combat) or through choice by members
of one sex for certain members of the other sex (mate choice).
Combat
In terms of combat, males within a species compete with each other for access to the females. This
leads to larger and stronger males and to the development of male ‘weapons’ that give them the
advantage in combat with other males. Darwin referred to many examples of this.
Elephant seals and walruses are examples of the increased size and strength of males. Elephant seals
annually migrate from their foraging ground to their breeding ground. The males arrive roughly two
weeks before the females and then fight to gain the best breeding sites and thus attract the
most females. Only the largest and strongest males are able to dominate in this competition.
Roughly 90% of males end up pup-less! Examples of male ‘weapons’ include the horns of male beetles,
the antlers of stags, the large canine teeth of male baboons and the tusks of male wild boar. Male
competition can also be more subtle. For example, when some male insects mate with a female, they
remove the sperm already present in her from previous matings with other males.

Display
‘Display’ refers to the exhibition of ornate male features to potential female mates, such as the
striking, brightly coloured plumage of many male birds. Darwin suggested that this process of
selection operates through female choice, whereby females choose the most striking males to mate
with. According to this theory, male ornaments are a genuine indication of the male’s vitality: the
presence of a costly ornament tells the female that the male is genetically exceptionally healthy,
and thus that her offspring will inherit his vitality.
A classic example, and one used by Darwin, is the flamboyant tail of the peacock. It appears to serve
no purpose in terms of survival and may actually be a handicap, a disadvantage in the struggle for
existence, as the tail makes it more difficult for the bird to escape from predators. The
explanation in terms of sexual selection for its presence is that peahens are attracted to it, and there
are various suggestions as to why they like it. Generally, sexually selected traits either signal fitness
directly, as their development depends on good health or a good diet, or indirectly, when they signal
the ability of the male to survive despite the large cost imposed by the fancy plumage.
Darwin proposed that this process of sexual selection would work in the following way. In the past,
when peacocks had tails of ordinary colour and length, peahens showed a preference for mating with
males with slightly longer and more flamboyant than average tails. Thus, the characteristic of slightly
longer, more brightly coloured tails would be passed on to the next generation, and over many
generations the peacocks' tails would become longer and brighter. The ornate tail therefore gives
such an advantage in terms of mating success that it is selected for despite being a disadvantage in
terms of general survival. Darwin thus argued that these flamboyant male characteristics were not,
as believed at the time, due to a designer who had an aesthetic sense, but due to the need to attract
a mate. Other examples where males are more striking than the females are found in fish, lizards
and spiders.
Darwin’s Evidence For Sexual Selection
Darwin’s main evidence for sexual selection was the fact that he found, from a comparison of a great
number of species, that there is a greater difference between the sexes (greater sexual dimorphism)
in polygynous species than monogamous species. A polygynous species is a species where one male
mates with several females, whereas in monogamous species a single male pairs with a single
female. Darwin reasoned that in polygynous species secondary sexual characters will be more
developed to enable the males to gain access to more females (via combat or display). Therefore,
polygynous species should, and were indeed found to, have greater sexual dimorphism.

Inheritance
What Darwin didn’t know.
Darwin struggled with the problem of inheritance. He knew that traits had to be heritable, that is,
passed on from parent to offspring, otherwise his theory of natural selection would not have
worked. However, he did not know how organisms passed traits on to their offspring. Why did some
traits seem to be passed on and others not? How did the traits of the parents work together in the
offspring - did they compete, or combine?
Darwin eventually proposed an alternative theory of inheritance called Pangenesis. He suggested
that each organ in the body, throughout an individual’s life, produces small particles called
‘gemmules’. These ‘gemmules’ contain information about the organ: how it is used or not used, its
size, structure, etc. When ‘gemmules’ were released from an organ, they travelled through the body
to the sperm and eggs in the reproductive organs, where they stuck together. In this way the
information could be passed on to the next generation, so explaining the heritability of variation.
Traits from both parents were thus blended in the offspring. Darwin was aware that the evidence
was not all there to support his ideas; after all, if traits were blended with each generation, it would
soon lead to a loss of variation. Darwin knew that he might well be wrong about Pangenesis, and he
referred to it as his ‘provisional hypothesis’:
“Hypotheses may often be of service to science, when they involve a certain portion of
incompleteness, and even error”.
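Darwin's worry that blending would soon erase variation can be checked with a quick simulation. This sketch assumes random mating, a trait that is passed on purely by averaging the two parents' values (blending inheritance), and no new variation entering the population; the population size, number of generations and random seed are arbitrary choices. Averaging two independent parents halves the variance, so it should fall by about half every generation.

```python
import random
from statistics import pvariance


def blend(values: list[float], rng: random.Random) -> list[float]:
    """One generation of blending inheritance.

    Each offspring's trait value is the exact average of two parents drawn at
    random from the previous generation.
    """
    return [(rng.choice(values) + rng.choice(values)) / 2 for _ in values]


def variance_history(generations=6, size=5000, seed=0):
    """Trait variance in each generation, starting from a variable population."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(size)]
    history = [pvariance(values)]
    for _ in range(generations):
        values = blend(values, rng)
        history.append(pvariance(values))
    return history


if __name__ == "__main__":
    for gen, v in enumerate(variance_history()):
        print(f"generation {gen}: variance {v:.4f}")
```

After six generations only about 1/64 of the original variation remains, which is why blending inheritance sits so uneasily with a theory that needs variation as its raw material.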
August Weismann (1834-1914)
In 1883, shortly after Darwin’s death, the great German biologist August Weismann proposed the
“germ-plasm” theory of heredity. The germ-plasm theory states that an organism’s cells are divided
into somatic cells (the cells that make up the body) and germ cells (cells that produce the
gametes). His great insight was to see that information does not pass from the somatic cells to the
germ cells (this was Weismann’s Barrier); variation, the fuel of evolution, must therefore be
produced in the germ cells. Weismann firmly rejected the idea of heritability of acquired
characteristics, which was central to the neo-Lamarckian evolutionary theories popular in the late
1800s. At this time natural selection was very unpopular, and Weismann was one of its few
supporters, and probably the keenest supporter of Darwin’s theory of natural selection.

Gregor Johann Mendel (1822 - 1884)
Mendel is known universally as the “father of genetics”.
Gregor Johann Mendel was born in the Austrian Empire in a town that is today in the Czech
Republic. He studied physics and maths at university, but after graduating he began studying to be a
monk, joining the Augustinian order at the St. Thomas Monastery in Brno. In those days the
monastery was a cultural centre, and Mendel was immediately exposed to the research and teaching
of its members, and also gained access to the monastery’s extensive library and experimental
facilities. In 1851, he was sent to the University of Vienna, at the monastery’s expense, to continue
his studies in the sciences. While there, Mendel studied mathematics and physics under Christian
Doppler and botany under Franz Unger. In 1853, upon completing his studies in Vienna, Mendel
returned to the monastery in Brno and was given a teaching position at a secondary school, where
he began the experiments for which he is best known.
Mendel read Darwin with deep interest, but he disagreed with the blending notion - how could a
single fortunate mutation be spread through a species? It would be blended out, just as a single drop
of white paint would be in a gallon of black paint. Instead, Mendel hypothesized that traits, such as
eye colour or height or flower hues, were carried by tiny particles that were inherited whole in the
next generation. This was the birth of particulate inheritance, and these tiny particles eventually
came to be known as genes.
Mendel chose to use peas for his experiments because of their many distinct varieties, and because
offspring could be quickly and easily produced. He cross-fertilized closely related pea plants that had
a small number of clearly contrasting characteristics: tall with short, smooth with wrinkled, those
containing green seeds with those containing yellow seeds, etc. Mendel's carefully designed and
meticulously executed experiments involved nearly 30,000 pea plants followed over eight
generations.
The seven traits of pea plants that Mendel chose to study: seed wrinkles; seed color; seed-coat color, which
leads to flower color; pod shape; pod color; flower location; and plant height. Image by Mariana Ruiz.
When Mendel bred purple-flowered peas (BB) with white-flowered peas (bb), every plant in the next
generation had only purple flowers (Bb). When these purple-flowered plants (Bb) were bred with
one another to create a second generation of plants, some white-flowered plants appeared again
(bb). Mendel realized that his purple-flowered plants still held instructions for making white flowers
somewhere inside them. He also found that the ratio of purple to white was predictable: 75
percent of the second generation of plants had purple flowers, while 25 percent had white flowers.
He called the purple trait dominant and the white trait recessive.
A Punnett square. Both of the starting plants have purple flowers, but they carry alleles for both
purple (B) and white (b). The pollen from the male plant fertilizes the egg in the female flower. In this
variety of plant, purple flowers are caused by a dominant allele (B); dominance is indicated by a
capital letter. White flowers are caused by a recessive allele, indicated by the small letter (b). Both
the male and female parent plants in the diagram above carry the dominant allele B for purple and
the recessive allele b for white flowers. The ratio of purple flowers to white flowers in their
offspring will be 3:1, as shown in this diagram. For a white flower to appear, the offspring must
inherit the recessive allele from both parents. Purple appears with any other combination of alleles
inherited from the parent plants. Image by Madeleine Price Ball
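The Punnett square can also be filled in mechanically: each parent contributes one of its two alleles, and every combination is equally likely. The sketch below does exactly that for a single gene, assuming the B/b flower-colour symbols used above; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Genotype counts from a one-gene Punnett square.

    Genotypes are two-letter strings such as "Bb". Each parent passes on one
    of its two alleles (segregation), so the square has 2 x 2 = 4 cells.
    """
    cells = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts the capital (dominant) letter first, so "bB" becomes "Bb".
        cells["".join(sorted(allele1 + allele2))] += 1
    return cells


def flower_colours(genotypes: Counter) -> Counter:
    """Purple if at least one dominant B allele is present, otherwise white."""
    colours = Counter()
    for genotype, n in genotypes.items():
        colours["purple" if "B" in genotype else "white"] += n
    return colours


if __name__ == "__main__":
    f1 = cross("BB", "bb")
    f2 = cross("Bb", "Bb")
    print("F1 genotypes:", dict(f1), "phenotypes:", dict(flower_colours(f1)))
    print("F2 genotypes:", dict(f2), "phenotypes:", dict(flower_colours(f2)))
```

The F1 cross gives four Bb (purple) cells out of four, and the F2 cross gives 1 BB : 2 Bb : 1 bb, which collapses to the 3 purple : 1 white ratio Mendel observed.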
After analyzing his results, Mendel reached his most important conclusions:
1. Inheritance is particulate. Each individual has two copies of the information, or
"factors", and these are inherited one from each parent. These factors are today known as
genes.
2. Dominance/Recessiveness. When two different factors are present in a single
individual, one factor is dominant to the other, which is said to be recessive. In the above
monohybrid cross, a recessive factor is covered up by a dominant factor in the F1 of a cross,
but it reappears in the F2 in a predictable proportion (1/4). The F1 plants all look like one of
the parents (purple), but retain the potential to produce both white and purple flowers.
3. Segregation. Each individual receives only one copy of each factor (an allele) from
each parent. The two copies of the gene therefore separate (segregate) when gametes are formed,
so that each gamete carries only one copy.
4. Independent Assortment. Traits were passed on independently of other traits from
parent to offspring. Therefore, there is no mixing between the genes for different
characteristics; they remain separate entities. He also proposed that this heredity followed
basic statistical laws. Though Mendel’s experiments had been conducted with pea plants, he
put forth the theory that all living things had such traits.
So an adaptive mutation could spread slowly through a species and never be blended out. Darwin's
theory of natural selection, building on small mutations, could work. But no one at the time
understood the implications of Mendel's experiments. He soon left biology to focus on running his
monastery. He died in 1884 having made little impact on the scientific world. It was not until 1900
that Mendel's work was rediscovered, by Carl Correns in Germany, Hugo de Vries in the Netherlands
and Erich von Tschermak-Seysenegg in Austria. Only then did Mendel -- who had worked without a
microscope, without computers, but with a thoughtful hypothesis, a carefully designed experiment,
and enormous patience -- receive the credit for one of the great discoveries in the history of science.

Mendelian Genetics
Important terms
hybrid
F1
F2
phenotype
genotype
gene
alleles
homozygous
heterozygous
test cross
Dihybrid cross
A. Cross
B. Punnett square
        WG     Wg     wG     wg
  WG
  Wg
  wG
  wg
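The 16-cell Punnett square above can be generated automatically from the gametes each parent makes. This sketch assumes the symbols W = round (dominant), w = wrinkled, G = yellow (dominant) and g = green, matching the round/yellow and wrinkled/green phenotypes used in the test-cross questions; the function names are hypothetical. Independent assortment is built in by letting either allele of the first gene combine freely with either allele of the second.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """Gametes from a two-gene genotype written like "WwGg".

    Independent assortment: either W-gene allele can combine with either
    G-gene allele, giving up to four kinds of gamete (WG, Wg, wG, wg).
    """
    return [w + g for w, g in product(genotype[:2], genotype[2:])]


def phenotype(w_pair: str, g_pair: str) -> str:
    # Assumed symbols: W = round (dominant), w = wrinkled;
    #                  G = yellow (dominant), g = green.
    shape = "round" if "W" in w_pair else "wrinkled"
    colour = "yellow" if "G" in g_pair else "green"
    return f"{shape}, {colour}"


def dihybrid_cross(parent1: str, parent2: str) -> Counter:
    """Fill in the Punnett square cell by cell and tally offspring phenotypes."""
    tally = Counter()
    for egg, pollen in product(gametes(parent1), gametes(parent2)):
        tally[phenotype(egg[0] + pollen[0], egg[1] + pollen[1])] += 1
    return tally


if __name__ == "__main__":
    for pheno, n in dihybrid_cross("WwGg", "WwGg").most_common():
        print(f"{pheno}: {n}/16")
```

Crossing two WwGg plants fills the 16 cells in the ratio 9 round, yellow : 3 round, green : 3 wrinkled, yellow : 1 wrinkled, green.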
C. Dihybrid test cross – Consider the F1 from the previous example:
What if you cross an unknown round, yellow plant with a wrinkled, green plant and get only round,
yellow offspring?
What if you cross an unknown round, yellow plant with a wrinkled, green plant and get 1/2 round,
yellow and 1/2 wrinkled, yellow offspring?
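A test cross works because the wrinkled, green tester (wwgg) can only contribute recessive alleles, so the offspring reveal exactly which gametes the unknown parent makes. The sketch below predicts test-cross proportions for every genotype that looks round and yellow and reports which ones match an observed result. As before, the symbols W = round, w = wrinkled, G = yellow, g = green and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """Gametes from a two-gene genotype such as "WwGg" (independent assortment)."""
    return [w + g for w, g in product(genotype[:2], genotype[2:])]


def phenotype(w_pair: str, g_pair: str) -> str:
    # Assumed symbols: W = round, w = wrinkled; G = yellow, g = green.
    shape = "round" if "W" in w_pair else "wrinkled"
    colour = "yellow" if "G" in g_pair else "green"
    return f"{shape}, {colour}"


def predict_testcross(unknown: str) -> dict[str, float]:
    """Offspring phenotype proportions when `unknown` is crossed to wwgg."""
    tally = Counter()
    for a, b in product(gametes(unknown), gametes("wwgg")):
        tally[phenotype(a[0] + b[0], a[1] + b[1])] += 1
    total = sum(tally.values())
    return {pheno: n / total for pheno, n in tally.items()}


# Every genotype that produces a round, yellow plant.
CANDIDATES = ["WWGG", "WWGg", "WwGG", "WwGg"]


def matching_genotypes(observed: dict[str, float]) -> list[str]:
    """Candidate genotypes whose predicted test-cross proportions match `observed`."""
    return [g for g in CANDIDATES if predict_testcross(g) == observed]


if __name__ == "__main__":
    print(matching_genotypes({"round, yellow": 1.0}))
    print(matching_genotypes({"round, yellow": 0.5, "wrinkled, yellow": 0.5}))
```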

Simple Probability
A. Introduction – probability is the likelihood of a particular outcome
eg. 1: probability of rolling a 1 with a six-sided die
eg. 2: probability of the first child being a girl
eg. 3: probability that the offspring of a heterozygous father will inherit H, the dominant
mutation responsible for Huntington’s chorea.
Some rules of probability
1. Limits of probability
a. if an event is certain:
b. if an event is certain not to occur:
c. if the probability of an event is p, the probability of all other outcomes =
eg. What is the probability of not rolling a 6?
Addition rule – mutually exclusive events
eg. 1: What is the probability of rolling either a 4 or a 6?
eg. 2: Cross Ww X Ww. What is the probability of round progeny?
Multiplication rule – independent events
eg. 1: Probability of rolling a 1 three times in a row?
eg. 2: Probability that the first two offspring from Ww X Ww are round?
eg. 3: Probability that the 4th child will be a girl if the first 3 children were girls?
eg. 4: Probability that the first 4 children are girls?
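The rules above can be checked with exact fractions. This is a sketch under simplifying assumptions that are not stated in the exercises: a fair die, births treated as independent with a probability of 1/2 for a girl, W (round) dominant to w (wrinkled), and a Huntington's father (Hh) whose partner is assumed to be unaffected (hh).

```python
from fractions import Fraction
from itertools import product

# Fair six-sided die: each face has probability 1/6.
ONE_FACE = Fraction(1, 6)

# Limits of probability: the probability of "anything else" is 1 - p.
p_not_six = 1 - ONE_FACE

# Addition rule (mutually exclusive outcomes).
p_four_or_six = ONE_FACE + ONE_FACE
# Ww x Ww gives WW (1/4), Ww (2/4), ww (1/4); round = WW or Ww.
p_round = Fraction(1, 4) + Fraction(2, 4)

# Multiplication rule (independent events).
p_three_ones = ONE_FACE ** 3
p_two_round = p_round ** 2

# Births treated as independent, with P(girl) = 1/2 (a simplifying assumption).
P_GIRL = Fraction(1, 2)
p_fourth_girl = P_GIRL          # earlier births do not change the next one
p_four_girls = P_GIRL ** 4

# Huntington's chorea: heterozygous father (Hh) x mother assumed to be hh.
children = [f + m for f, m in product("Hh", "hh")]
p_inherit_h = Fraction(sum("H" in child for child in children), len(children))

if __name__ == "__main__":
    results = [
        ("not rolling a 6", p_not_six),
        ("rolling a 4 or a 6", p_four_or_six),
        ("round progeny from Ww x Ww", p_round),
        ("rolling a 1 three times", p_three_ones),
        ("first two offspring round", p_two_round),
        ("4th child a girl", p_fourth_girl),
        ("first 4 children girls", p_four_girls),
        ("child inherits H", p_inherit_h),
    ]
    for label, value in results:
        print(f"{label}: {value}")
```

Using `Fraction` keeps the answers exact (for example 1/216 rather than a rounded decimal), which makes it easier to compare them with hand calculations.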

Deviations from Mendelism
Lethal Alleles
Essential genes are those that are absolutely required for survival. The absence of their protein
product leads to a lethal phenotype. It is estimated that about 1/3 of all genes are essential for
survival.
Nonessential genes are those not absolutely required for survival.
A lethal allele is one that has the potential to cause the death of an organism. These alleles are
typically the result of mutations in essential genes. They are usually inherited in a recessive manner.
Many lethal alleles prevent cell division; these will kill an organism at an early age.
Some lethal alleles exert their effect later in life.
E.g. Huntington disease. This is characterized by progressive degeneration of the nervous
system, dementia and early death. The age of onset of the disease is usually between 30 and
50.
Conditional lethal alleles may kill an organism only when certain environmental conditions prevail.
Temperature-sensitive (ts) lethal: a developing Drosophila larva may be killed at 30 °C, but it
will survive if grown at 22 °C.
Semilethal alleles
Semilethal alleles kill some individuals in a population, but not all of them. Environmental
factors and other genes may help prevent the detrimental effects of semilethal genes.
A lethal allele may produce ratios that seemingly deviate from Mendelian ratios, e.g. the
Creeper allele in chickens.
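The Creeper example can be worked through with a Punnett square that simply discards the lethal cell. This sketch assumes the textbook description of Creeper: the allele is dominant for the short-legged creeper phenotype, but homozygous embryos die. The one-letter symbols C (Creeper) and c (normal) and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical one-letter symbols: C = Creeper allele, c = normal allele.
# Creeper is dominant for the short-legged phenotype, but CC embryos die.
LETHAL_GENOTYPES = {"CC"}


def cross(parent1: str, parent2: str) -> Counter:
    """Genotype counts from a one-gene Punnett square (4 cells)."""
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def surviving_phenotypes(genotypes: Counter) -> Counter:
    """Drop lethal genotypes, then classify the survivors."""
    result = Counter()
    for genotype, n in genotypes.items():
        if genotype in LETHAL_GENOTYPES:
            continue
        result["creeper" if "C" in genotype else "normal"] += n
    return result


if __name__ == "__main__":
    print(surviving_phenotypes(cross("Cc", "Cc")))
```

Crossing two creeper chickens (Cc x Cc) gives the usual 1 CC : 2 Cc : 1 cc, but because CC does not survive, the hatched chicks appear in a 2 creeper : 1 normal ratio instead of the 3:1 expected for a simple dominant trait.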

Incomplete Dominance (partial dominance):
Each genotype has a different phenotype, with the heterozygote intermediate between the two homozygotes.
Traits that appear to be determined by systems of complete dominance at the gross phenotypic
level may be cases of incomplete dominance at the biochemical level.
Codominance
Sometimes, traits associated with both alleles are observable in a heterozygote. An example in cattle
is roan coat color (a mix of red and white hairs), which occurs in the heterozygous (Rr) offspring of
red (RR) and white (rr) homozygotes.
When two roan cattle are crossed, the phenotypes of the progeny are found to be in
the ratio of 1 red : 2 roan : 1 white. Which of the following crosses could produce the
highest percentage of roan cattle?
• A) roan x roan
• B) red x white
• C) white x roan
• D) red x roan
• E) All of the above crosses would give the same percentage of roan.
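One way to check your answer to the question above is to enumerate the Punnett square for each option. The sketch assumes only what the text states: RR is red, Rr is roan and rr is white. The function name `percent_roan` is illustrative.

```python
from itertools import product

# Codominant coat colour locus from the text: RR = red, Rr = roan, rr = white.
GENOTYPE = {"red": "RR", "roan": "Rr", "white": "rr"}

def percent_roan(parent1: str, parent2: str) -> float:
    """Percentage of roan (heterozygous) offspring from a single-locus cross."""
    g1, g2 = GENOTYPE[parent1], GENOTYPE[parent2]
    children = [a + b for a, b in product(g1, g2)]  # 4 equally likely outcomes
    roan = sum(1 for child in children if set(child) == {"R", "r"})
    return 100 * roan / len(children)

crosses = {
    "A) roan x roan": ("roan", "roan"),
    "B) red x white": ("red", "white"),
    "C) white x roan": ("white", "roan"),
    "D) red x roan": ("red", "roan"),
}
for label, (p1, p2) in crosses.items():
    print(label, percent_roan(p1, p2))
```

Running it prints the expected percentage of roan calves for each option, which also tells you whether option E can be correct.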

Sex-Linked Inheritance
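Many human traits are determined by X-linked recessive alleles: color blindness, hemophilia and one form of muscular dystrophy.
In males, who have only one X chromosome, segregation of X-linked genes is not the same as for autosomal genes: a single recessive allele on the X chromosome is expressed.
In females, who have two X chromosomes, segregation of X-linked genes is the same as for autosomal genes.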
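The difference between the sexes shows up clearly in a cross between a carrier female and an unaffected male. In the sketch below the labels XH (normal allele), Xh (recessive allele, e.g. for hemophilia) and Y are illustrative notation; the Y chromosome is assumed to carry no copy of the gene.

```python
from itertools import product

# Illustrative notation: XH = normal allele, Xh = recessive allele; Y has no copy.
mother = ("XH", "Xh")  # carrier female
father = ("XH", "Y")   # unaffected male

def describe(child: tuple[str, str]) -> str:
    """Sex and status of a child, given one chromosome from each parent."""
    from_mother, from_father = child
    if from_father == "Y":
        # Sons have a single X, so its allele alone decides the phenotype.
        return "son, affected" if from_mother == "Xh" else "son, unaffected"
    if from_mother == "Xh" and from_father == "Xh":
        return "daughter, affected"
    if "Xh" in child:
        return "daughter, carrier"
    return "daughter, unaffected"

for child in product(mother, father):
    print(child, describe(child))
```

Half of the sons are expected to be affected, while none of the daughters are; half of the daughters are expected to be carriers like their mother.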
Overdominance
Overdominance is the phenomenon in which a heterozygote is more vigorous than both of the corresponding homozygotes. It is also called heterozygote advantage or heterosis.
Example: sickle-cell anemia
• An autosomal recessive disorder in which affected individuals produce an abnormal form of hemoglobin
• Two alleles:
HbA encodes the normal hemoglobin, hemoglobin A
HbS encodes the abnormal hemoglobin, hemoglobin S
HbSHbS individuals have red blood cells that deform into a sickle shape under conditions of low oxygen tension. This has two major ramifications:
1. The sickling phenomenon greatly shortens the life span of the red blood cells, and anemia results.
2. The odd-shaped cells clump, causing partial or complete blocks in capillary circulation.
Thus, affected individuals tend to have a shorter life span than unaffected ones.
The sickle cell allele has been found at a fairly high frequency in parts of Africa where malaria is found.
Malaria is caused by a protozoan, Plasmodium. This parasite undergoes its life cycle in two main parts: one inside the Anopheles mosquito and the other inside human red blood cells. Red blood cells of heterozygotes are likely to rupture when infected by Plasmodium sp., which prevents the propagation of the parasite.
Therefore, HbAHbS individuals are “better” than:
• HbSHbS individuals, because they do not suffer from sickle cell anemia.
• HbAHbA individuals, because they are more resistant to malaria.
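Heterozygote advantage explains why HbS stays at a fairly high frequency instead of being eliminated. The sketch below iterates one-locus selection under random mating. The fitness values are hypothetical, chosen only for illustration; for homozygote fitnesses 1 - s and 1 - t (heterozygote = 1), the standard equilibrium frequency of the second allele is s/(s + t).

```python
# Hypothetical relative fitnesses (illustrative only):
W_AA = 0.9   # HbA/HbA: susceptible to malaria (cost s = 0.1)
W_AS = 1.0   # HbA/HbS: protected against malaria, no anemia
W_SS = 0.2   # HbS/HbS: sickle-cell anemia (cost t = 0.8)

def next_generation(q: float) -> float:
    """Frequency of HbS after one round of random mating and selection."""
    p = 1.0 - q
    w_bar = p * p * W_AA + 2 * p * q * W_AS + q * q * W_SS  # mean fitness
    # HbS copies carried by surviving heterozygotes (half) and HbS/HbS (all).
    return (p * q * W_AS + q * q * W_SS) / w_bar

q = 0.01  # start with a rare HbS allele
for _ in range(500):
    q = next_generation(q)

s, t = 1 - W_AA, 1 - W_SS
print(round(q, 4), round(s / (s + t), 4))  # both about 0.1111
```

Starting from a rare allele, HbS rises and then settles at an intermediate frequency, because whenever it becomes common the extra HbSHbS deaths push it back down, and whenever it becomes rare the heterozygotes' advantage pushes it up.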
At the molecular level, overdominance is due to two alleles that produce slightly different proteins. But how can these two protein variants produce a favorable phenotype in the heterozygote? Well, there are three possible explanations for overdominance at the molecular/cellular level:
1. Disease resistance
2. Homodimer formation
3. Variation in functional activity
Pleiotropy
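Most genes have multiple phenotypic effects. The ability of a gene to affect an organism in many ways is called pleiotropy.
The sickle cell locus is also an example of a pleiotropic locus: the same allele affects red blood cell shape, causes anemia and blocked circulation, and confers resistance to malaria.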

Epistasis
Epistasis occurs when a gene at one locus alters or influences the expression of a gene at a second locus. In this example, C is for colour, and the dominant allele must be present for pigment (colour) to be expressed.
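The effect of epistasis on a dihybrid ratio can be checked by enumeration. In the sketch below C is the colour gene described above (cc gives no pigment), while the second locus B, with B = black dominant over b = brown, is a hypothetical gene added for illustration.

```python
from collections import Counter
from itertools import product

def gametes(genotype: str) -> list[str]:
    """All gametes from a two-locus genotype written like 'CcBb'."""
    return [c + b for c, b in product(genotype[0:2], genotype[2:4])]

def phenotype(c_alleles: str, b_alleles: str) -> str:
    if "C" not in c_alleles:
        return "white"  # cc: no pigment, whatever the B locus says
    # Hypothetical second locus: B (black) dominant over b (brown).
    return "black" if "B" in b_alleles else "brown"

counts = Counter()
for g1, g2 in product(gametes("CcBb"), gametes("CcBb")):
    counts[phenotype(g1[0] + g2[0], g1[1] + g2[1])] += 1

print(counts)  # Counter({'black': 9, 'white': 4, 'brown': 3})
```

Instead of the classic 9 : 3 : 3 : 1 dihybrid ratio, the cc individuals mask the B locus and the two "colourless" classes merge, giving 9 : 3 : 4.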
Polygeny
Quantitative (continuous) variation usually indicates polygenic inheritance. This occurs when there is an additive effect from two or more genes. Pigmentation in humans is controlled by at least three (3) separately inherited genes.
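With additive genes, the phenotype depends only on how many contributing alleles an individual carries. The sketch below assumes three unlinked genes of equal, additive effect (a simplification) and counts the offspring classes from a cross between two individuals heterozygous at all three genes (AaBbCc x AaBbCc).

```python
from collections import Counter
from itertools import product

N_GENES = 3  # assumption: three unlinked genes with equal, additive effects

# Each gamete carries a contributing allele at each gene with probability 1/2;
# count the 8 possible gametes by their number of contributing alleles.
gamete_doses = Counter(sum(g) for g in product((0, 1), repeat=N_GENES))

# Combine every pair of gametes (8 x 8 = 64 equally likely zygotes).
offspring = Counter()
for (d1, n1), (d2, n2) in product(gamete_doses.items(), repeat=2):
    offspring[d1 + d2] += n1 * n2

print([offspring[k] for k in range(2 * N_GENES + 1)])  # [1, 6, 15, 20, 15, 6, 1]
```

Even with only three genes there are seven phenotypic classes out of 64, with the intermediate classes most common; adding more genes (and environmental effects) smooths this into the continuous, bell-shaped variation typical of traits like skin pigmentation.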

Genes and the Environment
Environmental conditions may have a great impact on the phenotype of the individual.
Eg. Snapdragon flower colour vs. temperature and degree of sunlight.
Organelle DNA
Some subcellular organelles in eukaryotic cells, such as the mitochondria and (in plants) chloroplasts, have their own DNA, over and above the DNA that exists in the nucleus. This DNA is only a small portion of the DNA in a eukaryotic cell; most of the DNA is found in the cell nucleus.
Mitochondrial DNA is inherited only from the mother, in violation of Mendelian laws. This is called maternal inheritance. It means that if there are detrimental mutations in the genes of the mother’s mitochondrial DNA, they will be inherited by all of her children, sons and daughters alike, but only her daughters will pass them on to the next generation.
Mitochondrial DNA, because of its maternal inheritance and haploid genome, has been very useful in evolutionary studies. We will come back to mitochondrial DNA later in the module.

The Modern Synthesis: the birth of Population Genetics
So now we have learned about Darwin’s natural selection and Mendel’s inheritance. Initially, these new theories prompted disagreements in the scientific community, and they were first believed to contradict each other. That is because the discrete differences of Mendelian genetics, such as round versus wrinkled peas with no intermediate, seemed to clash with Darwin’s idea of gradual change over long periods.
But ultimately these ideas were understood to be complementary. Genes are the means through which information is inherited, and they are passed on through the germ cells. If an individual’s genes give it an advantage, it is more likely to survive and pass on those genes to its offspring; that is, those genes increase the individual’s fitness. This combination of genetics and natural selection is termed the Modern Synthesis, and it is the cornerstone of modern evolutionary understanding.
“Instead of the varied theories of evolution which arose in different branches of biology, we are now
witnessing the emergence of a new science of life united by the great evolutionary idea”
Dobzhansky (1951)
Hardy-Weinberg Equilibrium
Evolution is not only the development of new species from older ones, as most people assume. It is also the minor changes within a species from generation to generation over long periods of time that can result in the gradual transition to new species.
The biological sciences now generally define evolution as the sum total of the genetically inherited changes in the individuals who are the members of a population's gene pool. The effects of evolution are felt by individuals, but it is the population as a whole that actually evolves. Evolution can therefore be explained by an examination of the properties of individuals and populations. This was the essence of the
Evolution is simply a change in frequencies of alleles in the gene pool of a population. For instance, let us assume that there is a trait that is determined by the inheritance of a gene with two alleles, B and b. If the parent generation has 92% B and 8% b and the offspring generation has 90% B and 10% b, then you can safely say that evolution has occurred between the generations. The entire population's gene pool has evolved towards a higher frequency of the b allele; it was not just those individuals who inherited the b allele who evolved.
In 1908, Godfrey Hardy, an English mathematician, and Wilhelm Weinberg, a German physician
concluded that gene pool frequencies are inherently stable but that evolution should be expected in
all populations virtually all of the time.
Why is evolution expected to be occurring all the time?
Well, consider the opposite question: What has to be true in order for evolution to never occur?
Hardy and Weinberg showed that evolution will not occur in a population if seven conditions are
met:

1. mutation is not occurring
2. natural selection is not occurring
3. the population is infinitely large
4. all members of the population breed
5. all mating is totally random, no sexual selection
6. everyone produces the same number of offspring
7. there is no migration in or out of the population
If these seven conditions are met, the gene pool frequencies in a population will remain unchanged, and evolution will not occur. However, since it is highly unlikely that even one of these seven conditions, let alone all of them, will hold in the real world, evolution is the inevitable result.
Hardy and Weinberg went on to develop a simple equation that can be used to discover the probable genotype frequencies in a population and to track their changes from one generation to another. This has become known as the Hardy-Weinberg equilibrium equation. In this equation (p² + 2pq + q² = 1), p is defined as the frequency of the dominant allele and q as the frequency of the recessive allele for a trait controlled by a pair of alleles (A and a). In other words, p is made up of all of the alleles in individuals who are homozygous dominant (AA) and half of the alleles in individuals who are heterozygous (Aa) for this trait in a population. In terms of genotype frequencies, this is
p = AA + ½Aa
Likewise, q is made up of all of the alleles in individuals who are homozygous recessive (aa) and the other half of the alleles in individuals who are heterozygous (Aa).
q = aa + ½Aa
Because there are only two alleles in this case, the frequency of one plus the frequency of the other
must equal 100%, which is to say
p+q=1
Since this is logically true, then the following must also be correct:
p=1-q
From this knowledge it was only a few short steps for Hardy and Weinberg to realize that the probabilities of all possible combinations of alleles occurring randomly sum to
(p + q)² = 1
or more simply
p² + 2pq + q² = 1
In this equation, p² is the predicted frequency of homozygous dominant (AA) individuals in a population, 2pq is the predicted frequency of heterozygous (Aa) individuals, and q² is the predicted frequency of homozygous recessive (aa) ones. Another way to think about this is that the probability of seeing an aa genotype is the product (multiplication) of the probability of seeing the a allele twice: q × q = q². The heterozygote term is 2pq rather than pq because an Aa individual can receive A from its mother and a from its father, or the other way round.

From observations of phenotypes, it is usually only possible to know the frequency of homozygous recessive individuals (q² in the equation), since they are the only ones that do not show the dominant trait. Those who express the dominant trait in their phenotype could be either homozygous dominant (p²) or heterozygous (2pq). The Hardy-Weinberg equation allows us to predict what proportion of them falls into each class.
Since q is the square root of q², and p = 1 - q, it is possible to calculate p as well. Knowing p and q, it is a simple matter to plug these values into the Hardy-Weinberg equation (p² + 2pq + q² = 1). This then provides the predicted frequencies of all three genotypes for the selected trait within the population.
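The steps in the paragraph above can be written as a small function. This is a sketch that assumes the population is in Hardy-Weinberg equilibrium (which is what justifies taking q as the square root of the aa frequency); the example input of 1 affected individual in 100 is hypothetical.

```python
import math

def hardy_weinberg_from_recessive(freq_recessive_phenotype: float) -> dict[str, float]:
    """Predict genotype frequencies from the observed frequency of aa individuals."""
    q = math.sqrt(freq_recessive_phenotype)  # q² = frequency of aa
    p = 1.0 - q                              # p + q = 1
    return {"p": p, "q": q, "AA": p * p, "Aa": 2 * p * q, "aa": q * q}

# Hypothetical example: 1 in 100 individuals shows the recessive phenotype.
result = hardy_weinberg_from_recessive(0.01)
print({k: round(v, 4) for k, v in result.items()})
# {'p': 0.9, 'q': 0.1, 'AA': 0.81, 'Aa': 0.18, 'aa': 0.01}
```

A striking consequence is that although only 1% of individuals show the recessive trait, about 18% are expected to be heterozygous carriers of the a allele.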
To this day, Hardy and Weinberg’s equation is known as the null model for population genetics. This is because it will only hold true if evolution DOES NOT occur. That means, if we have a population that is experiencing one or all of these situations:
1. new mutations occur
2. there is selection acting on mutations
3. The population is finite
4. Not all members of the population breed
5. Breeding is non-random, there is sexual selection
6. Different numbers of offspring are produced
7. There is migration either in or out of the population.
then its allele and genotype frequencies will be shifted away from Hardy-Weinberg expectations. In other words, it will be evolving! We use shifts away from Hardy-Weinberg equilibrium to determine if a population is in fact evolving. Note that this list is the exact opposite of the list of conditions for no evolution to occur.
Figure: Under Hardy-Weinberg equilibrium, the frequencies of homozygous and heterozygous genotypes vary in a very predictable way with allele frequencies.

Detecting shifts away from Hardy-Weinberg equilibrium
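How do we find out if a population is evolving?
Say we have a population of 40 individuals. Four individuals have the homozygous recessive genotype (aa), 16 have the homozygous dominant genotype (AA) and the rest (20) are heterozygotes (Aa). Is the population in Hardy-Weinberg equilibrium?
First, we estimate the allele frequencies from the observed genotype counts and use them to calculate the number of individuals of each genotype expected under Hardy-Weinberg equilibrium. Then we use a chi-square test to compare the observed and expected counts, testing the null hypothesis that the population IS in Hardy-Weinberg equilibrium.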
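The worked example above can be carried through step by step. This sketch uses the counts given in the text; the critical value 3.841 is the standard 5% point of the chi-square distribution with 1 degree of freedom (3 genotype classes, minus 1, minus 1 for the allele frequency estimated from the same data).

```python
# Observed genotype counts from the example (40 individuals).
observed = {"AA": 16, "Aa": 20, "aa": 4}
n = sum(observed.values())

# Step 1: estimate allele frequencies by counting alleles (2 per individual).
p = (2 * observed["AA"] + observed["Aa"]) / (2 * n)
q = 1 - p

# Step 2: expected counts if the population is in Hardy-Weinberg equilibrium.
expected = {"AA": p * p * n, "Aa": 2 * p * q * n, "aa": q * q * n}

# Step 3: chi-square statistic, the sum of (O - E)² / E over the genotypes.
chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# Degrees of freedom: 3 classes - 1 - 1 estimated allele frequency = 1.
CRITICAL_5_PERCENT_DF1 = 3.841
print(round(p, 3), {g: round(e, 1) for g, e in expected.items()}, round(chi_square, 3))
print("reject H-W" if chi_square > CRITICAL_5_PERCENT_DF1 else "consistent with H-W")
```

For these counts p = 0.65, the expected counts are 16.9 AA, 18.2 Aa and 4.9 aa, and the statistic is about 0.39, well below 3.841. We therefore cannot reject the null hypothesis: the data give no evidence that this population departs from Hardy-Weinberg equilibrium.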

The Modern Synthesis: Selection, Genetic Drift and Migration
The Hardy-Weinberg Equilibrium provided both a "null hypothesis" for genetic evolution and a
mathematical basis for a more comprehensive theory of evolution in which natural selection,
Mendelian genetics, paleontology, and comparative anatomy were combined in what is now known
as the modern evolutionary synthesis. During the 1930s and 1940s, R. A. Fisher, J. B. S. Haldane,
Sewall Wright, and Theodosius Dobzhansky developed mathematical models for fitness, selection,
and other evolutionary processes. These models were then applied to demographic data derived
from artificial and natural populations of organisms in a rigorous (and ongoing) test of the validity of
the neo-Darwinian model for genetic evolution. As a result of their work, Darwin's theories of natural
and sexual selection were combined with Mendelian genetics, biometry and statistics, demography,
paleontology, comparative anatomy, botany, and (more recently) molecular genetics and ethology
to produce a "grand unified theory" of the origin and evolution of life on Earth.
Fisher and Natural Selection

Ronald Fisher showed that continuous variation could provide the basis for natural selection as proposed by Darwin. He also showed that the rate of change via natural selection was a direct function of the amount of variation in a population (a result known as Fisher's Fundamental Theorem of Natural Selection). Thus, the more variation among alleles that exists in a population, the faster natural selection can cause changes in the allele frequencies in that population. Conversely, the less variation among alleles that exists in a population, the slower natural selection can cause changes in the allele frequencies in that population.
However, this creates a problem: eventually, natural selection would result in the "fixation" of the fittest alleles. Any allele that results in increased survival and reproduction should, if given enough time, eventually become the only allele for that particular trait in a particular population.
And with no variation, how can evolution work?
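Both of Fisher's points, that selection acts fastest when there is most variation and that it eventually uses that variation up, can be seen in a very simple model. The sketch below assumes a simplified haploid model in which allele A has relative fitness 1 + s; the value of s is hypothetical. In this model the change per generation is s·p·q/(1 + s·p), so it is proportional to p·q, a measure of the variation at the locus.

```python
def selection_step(p: float, s: float) -> float:
    """One generation of selection favouring allele A (relative fitness 1 + s).

    Simplified haploid model: the change in p is s*p*q / (1 + s*p), which is
    proportional to p*q, the variation present at the locus.
    """
    return p * (1 + s) / (1 + s * p)

s = 0.05  # hypothetical selective advantage of A
for p in (0.01, 0.5, 0.99):
    change = selection_step(p, s) - p
    print(f"p = {p:.2f}  p*q = {p * (1 - p):.4f}  change = {change:.5f}")

# Left long enough, selection drives A to (near) fixation and variation vanishes.
p = 0.01
for _ in range(1000):
    p = selection_step(p, s)
print(f"after 1000 generations p = {p:.6f}, p*q = {p * (1 - p):.2e}")
```

The change is largest when both alleles are common (p = 0.5) and tiny when one allele is rare; after enough generations p*q is essentially zero and selection has nothing left to act on.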
Wright and Genetic Drift
A solution to this problem was provided by Sewall Wright, who discovered a process that eventually became known as genetic drift. Wright proposed that in small populations of organisms,
random sampling errors could cause significant changes in allele frequencies in those populations.
He showed mathematically that the smaller a population was, the greater the effect of such
random events on its allele frequencies. In other words, the effect of genetic drift was inversely
proportional to population size.
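Wright's point about population size can be illustrated with a simulation. The sketch assumes the Wright-Fisher model, a standard idealization in which each generation's 2N gene copies are drawn at random from the previous generation; under it, the variance of the allele frequency after one generation is p(1 - p)/(2N). Population sizes, seed and replicate numbers are arbitrary choices.

```python
import random

def next_freq(p: float, n_individuals: int, rng: random.Random) -> float:
    """Wright-Fisher sampling: draw 2N gene copies at random from frequency p."""
    copies = 2 * n_individuals
    return sum(rng.random() < p for _ in range(copies)) / copies

def drift_variance(n_individuals: int, p: float = 0.5, replicates: int = 1000,
                   seed: int = 1) -> float:
    """Variance of the allele frequency after one generation of pure drift."""
    rng = random.Random(seed)
    values = [next_freq(p, n_individuals, rng) for _ in range(replicates)]
    mean = sum(values) / replicates
    return sum((v - mean) ** 2 for v in values) / replicates

for n in (10, 200):
    print(n, round(drift_variance(n), 6), "expected about", 0.25 / (2 * n))
```

In the population of 10 individuals the allele frequency jumps around about 20 times more from one generation to the next than in the population of 200, matching the 1/(2N) scaling.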

Wright's discovery of genetic drift solved the problem that Fisher's Fundamental Theorem posed:
how can natural selection be prevented from shutting itself down as the result of fixation? Wright
proposed that allele frequencies could be visualized as forming what he came to call an "adaptive landscape". In an adaptive landscape, combinations of allele frequencies form a series of hills and valleys, where height represents mean fitness, so the top of a hill is the highest fitness that natural selection can reach from its slopes.
According to Fisher, if a population is on a slope, natural selection can only move it up the slope. But this means that if a trait is at the top of one adaptive peak, it can't go down through a valley to get to the top of another, even higher (i.e. more adaptive) peak.
What Wright showed was that a population could indeed get from one adaptive peak to the next by genetic drift. If a population becomes very small, random drift can carry it down into the valley (temporarily losing fitness), which selection alone would never do, and onto the slope of another peak, which selection can then climb. This means that natural selection doesn't get "stuck" at the top of an adaptive peak with no variation; populations at one adaptive peak can make it to another, even higher adaptive peak, so long as they drift randomly across the valley between them.

Migration
Population geneticists also refer to this as gene flow. This simply means the movement of alleles
between populations. Migration in this evolutionary sense can occur by dispersal of adult animals, seeds and spores of plants, planktonic larvae of intertidal animals, gametes/zygotes of algae, etc.
Migration / selection equilibrium
Effects of migration on allele frequencies: In the absence of selection (i.e. if alleles are selectively
neutral) migration homogenizes allele frequencies among populations. If selection and migration
tend to increase the frequencies of the same alleles, selection can amplify the effect of migration.
On the other hand, if selection is stronger than migration, then differences among populations will
be maintained, even in the face of migration. And finally, if migration is stronger than selection,
differences among populations will be reduced.
Directional selection can be balanced by an influx of 'immigrant' alleles; thus, a stable 'equilibrium' can be reached if the migration rate is constant. However, in some cases migration can hinder optimal adaptation of a population to local conditions. An example is the water snakes (Natrix sipedon) living on islands in Lake Erie. Island Natrix are mostly unbanded, whereas on the adjacent mainland all are banded. Banded snakes are not cryptic on the limestone islands and are eaten by gulls, while unbanded island snakes are not eaten. But because of recurrent migration from the mainland, the banded phenotype becomes more frequent on the islands, interrupting the directional selection for unbanded snakes.
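The snake example can be turned into a toy migration/selection model. The sketch below is heavily simplified and all of its parameters are hypothetical: it treats banding as if it were determined by a single allele with no dominance (haploid-style selection), assumes each generation a fraction m of the island population is replaced by mainland migrants who all carry the banded allele, and gives banded snakes a survival cost s on the island.

```python
def island_generation(b: float, s: float, m: float) -> float:
    """One generation on the island: selection against banding, then migration.

    b: frequency of the (hypothetical) 'banded' allele on the island
    s: proportional survival cost of banding on the island (gull predation)
    m: fraction of the island population replaced by mainland migrants,
       who all carry the banded allele (the mainland is entirely banded)
    """
    after_selection = b * (1 - s) / (1 - s * b)
    return (1 - m) * after_selection + m * 1.0

def equilibrium(s: float, m: float, generations: int = 5000) -> float:
    """Iterate from b = 0.5 until the island frequency settles."""
    b = 0.5
    for _ in range(generations):
        b = island_generation(b, s, m)
    return b

for m in (0.001, 0.01, 0.05):
    print(f"m = {m}: equilibrium banded frequency = {equilibrium(0.2, m):.3f}")
```

In this toy model the banded allele settles at a stable equilibrium frequency of m/s (when m is smaller than s): selection removes banded snakes as fast as migration brings them in. The stronger the migration relative to selection, the further the island population is held away from its local optimum of no banding.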
Migration /drift equilibrium
Wright linked migration and drift in his island model using the equation
FST ≈ 1/(4Ne m + 1)
where FST is a measure of genetic differentiation between populations (we will get to it in the next section), Ne is the effective population size (roughly, the number of breeding individuals) and m is the migration rate. This formula assumes that migration and drift are in equilibrium: drift reduces the number of alleles in each population, while migration between the populations increases it. Drift is stronger in small populations and migration is stronger when m is high, and the balance between them, captured by the product Ne m, determines how differentiated the populations are at equilibrium.
However, in some cases the equilibrium can be disrupted. For example, this happens when one population has recently been founded by colonists from another population and migration is only in one direction.
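Wright's island-model result, FST ≈ 1/(4Ne m + 1), depends on Ne and m only through their product Ne m, the number of migrants per generation. The sketch below evaluates it for a few hypothetical combinations, assuming migration and drift are at equilibrium as the formula requires.

```python
def island_model_fst(ne: float, m: float) -> float:
    """Equilibrium FST under Wright's island model: FST = 1 / (4*Ne*m + 1)."""
    return 1.0 / (4.0 * ne * m + 1.0)

# Ne*m is the number of migrants per generation; FST depends only on it.
for ne, m in [(100, 0.001), (100, 0.01), (1000, 0.001), (100, 0.1)]:
    print(f"Ne = {ne}, m = {m}, Nem = {ne * m:g}: FST = {island_model_fst(ne, m):.3f}")
```

One migrant per generation (Ne m = 1) gives FST = 0.2 whether the population is small with frequent migration or large with rare migration, and ten migrants per generation already reduce FST to about 0.024.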

The Modern Synthesis: from genes to species
Theodosius Dobzhansky
In 1937, Dobzhansky published these results in a landmark book, Genetics and the Origin of Species. In it, he sketched out an explanation for how species actually came into existence. Mutations crop up naturally all the time. Some mutations are harmful in certain circumstances, but a surprising number have no effect one way or the other. These neutral changes appear in different populations and linger, creating variability that is far greater than anyone had previously imagined.
This variability serves as the raw material for making new species. If the members of a population of flies should breed among themselves more than with other members of the species, their genetic profile would diverge. New mutations would arise in the isolated population, and natural selection might help them to spread until all the flies carried them. But because these isolated flies were only breeding within their own population, the mutations could not spread to the rest of the species. The isolated population of flies would become more and more genetically distinct. Some of their new genes would turn out to be incompatible with the genes of flies from outside their own population.
If this isolation lasted long enough, Dobzhansky argued, the flies might lose the ability to interbreed
completely. They might simply become unable to mate with the other flies successfully, or their
hybrid offspring might become sterile. If the flies were now to come out of their isolation, they could
live alongside the other insects but still continue mating only among themselves. A new species
would be born.
Ernst Mayr
Mayr specialized in discovering new species of birds and mapping out their ranges. It was no easy matter to determine exactly which groups of birds deserved the title of species. Biologists typically
tried to bring order to this confusion by recognizing subspecies — local populations of a species that
were distinct enough to warrant a special label of their own. But Mayr saw that the subspecies label
was far from a perfect solution. In some cases, subspecies weren't actually distinct from each other,
but graded into each other like colours in a rainbow. In other cases, what looked like a subspecies
might, on further inspection, turn out to be a separate species of its own.
Mayr realized that it was possible to explain the origin of species with genetics. Mayr also realized
that the puzzle of species and subspecies shouldn't be considered a headache: they were actually a
living testimony to the evolutionary process Dobzhansky wrote about. Variations emerge in different
parts of a species' range, creating differences between populations (see example of bird crests
below). In one part of a range the birds may possess long tails, in others, square tails. But because
the birds also mate with their neighbours, they do not become isolated into a species of their own.

The modern synthesis provided a mathematical foundation for evolutionary theory. This literally
meant converting evolution from “natural history” into a modern science. When a hypothesis can be
tested by gathering numerical data (by counting or measuring objects and events), that data can
then be statistically tested to determine if it verifies or falsifies that hypothesis. This is what happens
in the other natural sciences, like chemistry and physics. Since the modern evolutionary synthesis,
this is also what happens in evolutionary biology.
Major tenets of the Modern Synthesis
A. Populations have genetic variation that continuously arises by undirected processes (mutation
and recombination)
B. Populations evolve by changes in gene frequencies through:

1. Genetic drift
2. Gene flow
3. Natural selection
C. Most adaptive variants have slight phenotypic effects, so that phenotypic changes are gradual
D. Diversification arises by speciation (cladogenesis), usually occurring via gradual evolution of reproductive isolation
E. These processes, continued for a sufficiently long period of time, produce changes sufficient to
delineate higher taxonomic levels

Kimura’s Neutral theory
Before the 1960s there was no data about protein or DNA variation. Remember that the structure of DNA had only been discovered in the 1950s, and the genetic code was only worked out in the 1960s (see Section, The Central Dogma). During this time, natural selection was believed to be the main driver of evolution, but there were two schools of population geneticists: the Classical and Balance schools. The Classical school believed that polymorphisms (the existence of more than one allele at a locus in a population) were rare. They argued that natural selection was mainly a purifying force that removed any deleterious alleles that arose, or drove any advantageous alleles to fixation. Therefore, they believed that individuals were homozygous for most loci.
In contrast, the Balance school believed that polymorphisms were common. Polymorphisms at the various loci were thought to be maintained by different forms of balancing selection that favoured heterozygotes over homozygotes. Both schools of thought agreed that natural selection was the force driving molecular evolution.
However, in the mid-1960s the technique of protein electrophoresis began to be used to investigate levels of isozyme polymorphism in natural populations (see Lectures on Genetic Markers). The results showed that large amounts of genetic variation were present in natural populations, much more than was expected by the Classical school. This appeared to be a victory for the Balance school.
Segregational Load
However, it was soon realized that maintaining these high levels of polymorphism at thousands of loci by balancing selection would be very costly. At each such locus, segregation in every generation produces some less fit homozygotes, lowering the mean fitness of the population (the segregational load). Summed over multiple loci, this high segregational load would be large enough to drive populations to extinction!
Clearly, natural selection alone could not explain the high levels of protein and DNA polymorphism.
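The cost argument above can be made concrete. For one overdominant locus with fitnesses AA = 1 - s, Aa = 1, aa = 1 - t, the mean fitness at equilibrium is 1 - st/(s + t), so the segregational load is st/(s + t). The sketch below uses hypothetical values (weak selection of 0.02 on each homozygote, 1000 such loci) and assumes the loci combine multiplicatively.

```python
def segregational_load(s: float, t: float) -> float:
    """Load at one overdominant locus at equilibrium.

    Fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t. At the equilibrium q = s/(s+t),
    mean fitness is 1 - s*t/(s+t), so the load is s*t/(s+t).
    """
    return s * t / (s + t)

per_locus = segregational_load(0.02, 0.02)    # hypothetical weak selection
loci = 1000                                    # hypothetical number of loci
relative_fitness = (1 - per_locus) ** loci     # assumes loci act multiplicatively
print(per_locus, relative_fitness)
```

Each locus costs only about 1% in mean fitness, but over 1000 loci the population's mean fitness falls to roughly 0.00004 of the best possible genotype, far more reproductive excess than real populations could sustain.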
Motoo Kimura
However, the high levels of polymorphism can be explained without encountering excessive genetic load simply by dropping the assumption that natural selection is the driving force of molecular evolution. Instead, Kimura allowed the majority of mutations that were fixed to be neutral and therefore have no effect on fitness. This was a huge change in perspective from both the Classical and Balance schools, because the neutral theory stated for the first time that natural selection may not be the driving force of evolution! Since then the neutral theory has become one of the most important and controversial theories in evolutionary biology.
Kimura made some simple calculations.
If µ = mutation rate per gene per generation, and N = the (effective) population size, Ne:
Number of alleles (gene copies) in a diploid population = 2N
Number of new mutations per generation = 2Nµ
Kimura suggested that genetic drift was a more important evolutionary force than selection. Under the neutral model, the amount of genetic variation in a population is determined by a balance between an increase due to mutation (rate μ) and a decrease due to finite population size (genetic drift). All new neutral mutations in a population have the same chance of drifting to fixation. A mutation can either drift to fixation, when its frequency reaches 1 (that is, when all individuals in the population carry only that allele), or drift to extinction, when it is lost (frequency = 0). Most of the time a new neutral allele will be quickly lost from the population by genetic drift. But sometimes it will drift up in frequency and become fixed; that is, it will replace (or substitute) the original allele in the population.
The probability that the new allele will drift to fixation = 1/2N (this is equivalent to the probability of reaching into a bag containing 2N marbles, only one of which is red, and pulling out the red marble).
Therefore, the rate of substitution of an allele by a new allele = 2Nµ × 1/2N = µ
Basically, this means that the rate of neutral molecular evolution is independent of population size and is simply equal to the neutral mutation rate.
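The 1/2N fixation probability can be checked by simulation. The sketch below assumes the Wright-Fisher model of random sampling of 2N gene copies each generation, and uses a deliberately tiny hypothetical population (N = 5, so 1/2N = 0.1) so that it runs quickly; the seed and number of trials are arbitrary.

```python
import random

def fixes(n_individuals: int, rng: random.Random) -> bool:
    """Follow one new neutral mutation (1 copy among 2N) until fixed or lost."""
    copies = 2 * n_individuals
    count = 1
    while 0 < count < copies:
        p = count / copies
        count = sum(rng.random() < p for _ in range(copies))  # Wright-Fisher sampling
    return count == copies

rng = random.Random(42)
N = 5              # hypothetical, tiny population so the simulation is fast
trials = 10_000
fixed = sum(fixes(N, rng) for _ in range(trials))
print(fixed / trials, "vs 1/2N =", 1 / (2 * N))
```

Roughly one new mutation in ten goes to fixation and the other nine are lost, as the marble analogy predicts. Because 2N mutations arise per generation and each fixes with probability 1/2N, population size cancels out of the substitution rate.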
The average time for a neutral mutation to drift to fixation (among those that do become fixed) is about 4N generations.
Therefore, while the rate of origin and fixation of new mutations (µ) is independent of population size, the time a mutation takes to pass through the population is proportional to the population size. Therefore, under the neutral theory, polymorphisms in a large population are simply a result of lots of neutral mutations arising and passing through the population slowly, such that at any one time there are several different alleles at a particular locus drifting through the population.
According to the neutralists, most mutations are either deleterious and are selectively removed, or are “effectively neutral”, in which case there is a small probability that they are fixed. Natural selection is incorporated, but as a purifying force that removes deleterious mutations, with only a small role in fixing new mutations. As we have seen above, the probability of fixation of a neutral allele by drift is 1/2N. If this probability is larger than the selection coefficient acting on a mutation, the influence of drift is greater than that of selection and the mutation is effectively neutral. So, the neutral theory does not argue that most mutations are completely neutral, but that any selection pressures are outweighed by the effects of drift. The neutral theory therefore conferred a much greater role on genetic drift in molecular evolution.
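The idea of an "effectively neutral" mutation can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 - e^(-2s))/(1 - e^(-4Ns)). This formula is not derived in these notes; the sketch assumes Ne = N, additive selection with coefficient s, and a starting frequency of 1/2N, and the population size is hypothetical.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a new mutation (start frequency 1/2N).

    Assumes Ne = N and additive (genic) selection with coefficient s.
    u = (1 - exp(-2s)) / (1 - exp(-4Ns)); as s -> 0 this tends to 1/(2N).
    """
    if s == 0:
        return 1 / (2 * n)
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

N = 100  # hypothetical population size; 1/2N = 0.005
for s in (0.0, 0.0001, 0.001, 0.01):
    u = fixation_probability(s, N)
    print(f"s = {s:<7} fixation probability = {u:.5f} ({u * 2 * N:.2f} x neutral)")
```

When s is well below 1/2N (here s = 0.0001) the mutation fixes almost exactly as often as a neutral one, so drift dominates. When s is above 1/2N (here s = 0.01) it fixes about four times as often, and selection dominates.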
Kimura was also able to derive a new formula for heterozygosity (H) levels under the neutral model. Here, the most important thing to remember is the “neutral parameter” 4Neμ:
H = 4Neμ/(4Neμ + 1)
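The formula can be evaluated directly to see how heterozygosity grows with population size. The mutation rate of 1e-5 per locus per generation used below is a hypothetical value chosen for illustration.

```python
def neutral_heterozygosity(ne: float, mu: float) -> float:
    """Expected heterozygosity under the neutral model, H = 4Neμ / (4Neμ + 1)."""
    theta = 4 * ne * mu  # the "neutral parameter"
    return theta / (theta + 1)

for ne in (1_000, 10_000, 100_000):
    print(ne, round(neutral_heterozygosity(ne, 1e-5), 3))  # hypothetical μ = 1e-5
```

Because H depends only on the product 4Neμ, a larger population holds more neutral variation at the same mutation rate: H rises from about 0.04 to 0.29 to 0.8 across these three population sizes.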

Predictions of the Neutral Theory
If the neutral theory is correct, we should see the following two predictions:
1. There is a constant rate, or molecular clock, of sequence evolution
2. There is an inverse relationship between the rate of substitution (µ) and the degree of functional constraint acting on a gene, such that functionally constrained genes or gene regions evolve at a lower rate and vice versa.
The molecular clock

A molecular clock is compatible with the neutral theory, as the rate of substitution of a neutral mutation is µ, and is not affected by population size or selective pressures. As long as µ is constant across species and most molecular evolution is neutral, the rate of evolution should be constant across lineages. At first glance the evolution of sequences does indeed appear to be constant over time. However, on closer inspection, significant variation among lineages becomes evident.
Functional constraints and the rate of substitution

According to the neutral theory, most mutations are deleterious and the rest are neutral (advantageous mutations are very rare). However, genes differ in the proportion of mutations that are deleterious. The higher the functional constraint on a gene, the stronger the negative selection removing mutations. In a gene with high functional constraints, the vast majority of mutations will be deleterious and be removed by selection, leaving only a small fraction of neutral mutations, which results in a low rate of substitution. In a less constrained gene, a larger fraction of the mutations will be neutral, leading to a higher substitution rate.
Examples of this are:
• variation in substitution rate between and within genes;
• higher substitution rates in non-coding regions where mutations are largely silent, such as pseudogenes and introns;
• synonymous vs. non-synonymous substitution rates (see Lecture on the Effects of Mutations).

Testing the neutrality of mutations using dN/dS:
1) Sequence copies of the gene of interest from a variety of species.
2) Construct a phylogeny of the species using the sequence or other data.
3) Identify synonymous and non-synonymous mutations.
4) Calculate the average synonymous rate of substitution, dS, the average non-synonymous rate of substitution, dN, and the ratio, ω = dN/dS.
We assume that synonymous mutations are neutral. As we have seen, due to functional constraints, in most genes dN < dS, and ω < 1.
If dN > dS, ω > 1, the coding changes are occurring more rapidly than silent changes. This is indicative of positive selection to change the amino acid sequence.
Positive selection has been shown in genes of the major histocompatibility complex (MHC, a family of genes that determine resistance to pathogens) and in HIV envelope proteins.
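Step 4 can be sketched for the simplest case of two aligned sequences. This is a simplified pairwise version of the procedure above, not a full phylogenetic analysis. It assumes the numbers of synonymous and non-synonymous sites have already been counted (in practice by a method such as Nei-Gojobori), uses the Jukes-Cantor correction for multiple substitutions at the same site, and uses hypothetical counts.

```python
import math

def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    return -0.75 * math.log(1 - 4 * p / 3)

def omega(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> float:
    """ω = dN/dS for one pair of aligned coding sequences."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # dN
    ds = jukes_cantor(syn_diffs / syn_sites)        # dS
    return dn / ds

# Hypothetical counts for two aligned sequences.
w = omega(nonsyn_diffs=30, nonsyn_sites=300, syn_diffs=5, syn_sites=100)
verdict = ("positive selection" if w > 1
           else "purifying selection" if w < 1 else "neutral")
print(round(w, 2), verdict)  # 2.07 positive selection
```

With 10% of non-synonymous sites but only 5% of synonymous sites differing, ω is about 2, the pattern expected when selection favours amino acid change, as in the MHC example. In a real test, the statistical significance of ω > 1 would also need to be assessed.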
The nearly neutral model

By the early 1970s it was becoming clear that the neutral theory was too simplistic. There was
evidence for positive selection acting on mutations and the molecular clock did not tick at a
perfectly constant rate. This gave rise to the nearly neutral model of molecular evolution.
According to this theory, mutations in non-coding DNA and synonymous sites are still strictly
neutral. However, non-synonymous mutations are no longer regarded as being neutral and are
instead nearly neutral, being either slightly deleterious or slightly advantageous. Therefore, the
nearly neutral model includes weak selection as well as genetic drift.
So who is correct, the neutralists or the selectionists?

It seems that both genetic drift and natural selection determine the evolution of mutations.
Neutralists are probably correct in that most mutations are neutral, especially in non-coding DNA
and synonymous sites. However, signatures of natural selection are sometimes detectable at non-synonymous sites when molecular evolution over short evolutionary time periods is examined.

Section 5: Genetic Structure
In the previous sections we have looked at the basic molecule of inheritance, how we have been
able to manipulate these molecules, how variation in the form of mutation is generated and the
forces that cause this variation to change with time, to evolve. Now we will look at how this variation
is structured in the living world. We will learn about the history of attempts at partitioning the
variation of life into categories and the details of modern-day methods of assessing genetic
structure.
This section closely follows Chapter 4 of your textbook Evolution 1st Edition by Bergstrom and
Dugatkin.
Chapters 4, 14, Bergstrom and Dugatkin Evolution (1st Edition)
Section Outcomes.
1. Evaluate and manipulate a phylogeny
2. Distinguish between homology and analogy with examples
3. Evaluate and rate the different ways of constructing a phylogeny
4. Explain the principle of a Markov chain Monte Carlo simulation
5. Describe other methods of partitioning genetic variation
6. Compare phylogenetic trees with phylogenetic networks
7. Compare classical and modern views on speciation
8. Describe the uses of coalescent theory

Making sense of the Variation of life
Categorizing the continuous variation we see in nature is a difficult task. What is the best way
to do it?
Aristotle and the Scala Naturae

The basic idea of a partitioning of the world's organisms
goes back to Aristotle and his biological classification.
This was the basis for a scala naturae which allowed for
an ordering of beings, thus forming a basis for
classification where each kind of mineral, plant and
animal could be slotted into place. It is the first known
attempt at the ordering of life as the ancient Greeks saw
it.
But what was wrong with the scala naturae? Organisms
were ordered in a linear "Ladder of Life", placing them
according to complexity of structure and function so that
higher organisms showed greater vitality and ability to
move. Aristotle's concept of higher and lower organisms
was modified in medieval times, such that the Great
Chain of Being was seen as a God-given ordering: God at
the top, dirt at the bottom, and every grade of creature
in its place. Just as rock never turns to flowers and
worms never turn to lions, humans never turn to angels.
Such a single, fixed ladder can represent neither change
over time nor the branching, nested pattern of
relationships among organisms.
Carolus Linnaeus
In the eighteenth century, however, the classification of living things was put on a new footing.
The modern taxonomic system was developed by Carolus Linnaeus (1707–1778), a Swedish botanist,
zoologist, and physician, who wrote the Systema Naturae. This taxonomy has proved so useful
because of Linnaeus’ insight that organisms can be arranged in a hierarchical classification.
Linnaeus recognized that not only can we assign species or subspecies to groups of highly similar
organisms, we can also array these groups of similar species into larger groups of moderately
similar organisms, and these larger groups can in turn be categorized into yet larger groups of
somewhat similar organisms, and so forth, until we have accounted for all living things. It is
important to note that Linnaeus came to this realization without having a theoretical basis for why
these hierarchical patterns of similarity should exist.

Darwin provided the answer for why these patterns are seen.
He recognized that an evolutionary process of branching
descent with modification would generate nested hierarchies
of similarity as the natural results of phylogenetic history. Not
only did Darwin’s idea of a branching pattern of descent with
modification provide a theoretical foundation for the
hierarchical patterns Linnaeus suggested, but Darwin’s
approach has also led to changes in the classification of many
species, genera, and families.
The German biologist Willi Hennig eventually revisited the problem of taxonomy using Darwin’s
ideas and, in doing so, established the modern approach to classification (Zuckerkandl and Pauling
1962; Hennig 1966). His classic 1966 book—Phylogenetic Systematics—is instructive, because it
emphasizes that, in addition to documenting evolutionary history, phylogenetic trees can help us
classify, or systematize, the world we see around us. We could classify organisms in many ways—for
example, by how large they are, by where they are located, or by their morphology. But in
phylogenetic systematics, we classify organisms according to their evolutionary histories—and
phylogenetic trees are our way of representing these evolutionary relationships. But how do we
build phylogenetic trees?
Traits
The study of phylogeny rests on our observations of traits displayed by organisms. You are already
familiar with the concept of a trait from previous lectures, but formally, a trait can be any observable
characteristic of organisms; for example, traits may be anatomical features, developmental or
embryological processes, behavioural patterns, or genetic sequences. Until the major advances in
molecular genetics that occurred in the 1970s, almost all trait measurements used in the study of
phylogeny were morphological or anatomical—bone length, tooth shape, and so on. With the
advent of molecular genetics, actual DNA sequences are now the most common traits used to
reconstruct phylogenies of extant organisms.
Reading Phylogenies
(4.2 of your textbook Evolution 1st Edition by Bergstrom and Dugatkin.)
The root of a phylogeny, often drawn at the top of the tree, is the earliest point in time captured by
the phylogeny. All subsequent branching events occurred more recently in time, down to the tips
(or leaves) of the tree, which usually represent the present day.
See also:
Rotating Phylogenies
Clades and monophyletic groups, sister taxa
Rooted and unrooted trees
Branch Lengths
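Rotating the branches around a node changes how a tree is drawn but not what it says about relationships. In this minimal sketch, a rooted tree is written as nested Python tuples, where a leaf is a taxon name. The helper names `clades` and `rotate` are hypothetical. Each clade is represented as the set of tips descended from one internal node. That set of clades is the same before and after rotation.

```python
def clades(tree):
    """Return the set of clades in a nested-tuple tree; each clade is a frozenset of tips."""
    found = set()

    def tips(node):
        if isinstance(node, str):  # a leaf (tip) of the tree
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)  # every internal node defines one clade
        return below

    tips(tree)
    return found


def rotate(node):
    """Reverse the order of the children at every internal node (redraw the tree)."""
    if isinstance(node, str):
        return node
    return tuple(rotate(child) for child in reversed(node))


tree = (((("human", "chimp"), "gorilla"), "orangutan"), "gibbon")
rotated = rotate(tree)
print(rotated)                            # the drawing changes...
print(clades(tree) == clades(rotated))    # ...but the clades do not
```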

Homology and Analogy
When we use traits, whether morphological features such as bones or beaks or DNA sequences, to
infer common ancestry and thereby build phylogenies, we order them based on their similarity. You
will be forgiven for thinking that the more similar two traits are, the more likely they are to have a
recent common ancestor, and likewise that the more dissimilar two traits are, the less likely they are
to share a recent common ancestor. Not so!
Some similarities do derive from common ancestry, but others derive from similar selection
pressures on organisms that do not share recent common ancestry. So if we want to reconstruct an
accurate phylogenetic tree that reflects common ancestry, we have to be very careful to choose
only homologous traits.
A homologous trait is a trait that is found in two or more species because those species have
inherited this trait from an ancestor. All female mammals produce milk for their young, and they all
possess this homologous trait because mammals share a common ancestor that produced milk.
Similarly, all vertebrates have a vertebral column because the common ancestor to vertebrates had
a vertebral column (or something like it).
In contrast to homologous traits, analogous traits are shared by two or more species, not because
of a history of common descent, but instead because some other evolutionary process, usually
natural selection, has independently fashioned similar traits in each species. Many of the shared
adaptations for desert living that we examined in Chapter 3 are analogous traits. Figure 4.21
illustrates phylogenies that contain homologous and analogous traits.
Recognize that, when considered by itself, a given trait of a single species cannot be said to be
homologous or analogous. These terms refer to the comparison between a trait of one organism and
a similar trait of another. For example, wings are homologous if we are making
a comparison between eagles and ducks, but they are analogous if we are making a comparison
between eagles and dragonflies. Both homologous and analogous traits are used as evidence for
Darwin’s theory of evolution by natural selection—but they are typically used as evidence for
different parts of his theory. The presence of homologous traits indicates that species have a shared
ancestry, and thus supports Darwin’s thesis that all organisms have descended from one or at most a
few common ancestors. The presence of analogous traits reveals that natural selection generates
structurally or functionally similar solutions to similar problems, often many times in parallel. This
provides support for Darwin’s thesis that the process of natural selection leads to organisms that are
well adapted to their environments—and that natural selection can act as a creative force in
generating these adaptations.
Divergence and Convergence

A discussion of homology and analogy leads necessarily to the concepts of divergent and
convergent evolution. Divergent evolution occurs when closely related populations or closely
related species diverge from one another because natural selection operates differently on each
of them, for example beak structure in finches (left). Convergent evolution occurs when two or
more populations or species become more similar to one another because they are exposed to
similar selective conditions; that is, convergent evolution leads to analogous traits in whatever
populations or species we are examining. In the classic case, the fusiform (hydrodynamic) shape
of sharks is very similar to that of extinct ichthyosaurs and that of dolphins (right), yet the three
groups are only distantly related: sharks are cartilaginous fishes, ichthyosaurs were marine
reptiles and dolphins are mammals. The shared body shape reflects strong selection for moving
quickly underwater.

Phylogenetic reconstruction I
More important terms in phylogenetics

apomorphy -- A specialized (derived) character state of an organism.
basal group -- The earliest diverging group within a clade; for instance, to hypothesize that sponges
are basal animals is to suggest that the lineage(s) leading to sponges diverged from the lineage that
gave rise to all other animals.
character -- Heritable trait possessed by an organism.
character state -- characters are usually described in terms of their states, for example: "hair
present" vs. "hair absent," where "hair" is the character, and "present" and "absent" are its states.
clade -- A monophyletic taxon; a group of organisms which includes the most recent common
ancestor of all of its members and all of the descendants of that most recent common ancestor.
From the Greek word "klados", meaning branch or twig.
ingroup -- In a cladistic analysis, the set of taxa which are hypothesized to be more closely related to
each other than any are to the outgroup.
lineage -- Any continuous line of descent; any series of organisms connected by reproduction, from
parent to offspring.
hypothesis -- A concept or idea that can be falsified by various scientific methods.
derived -- Describes a character state that is present in one or more subclades, but not all, of a clade
under consideration. A derived character state is inferred to be a modified version of the primitive
condition of that character, and to have arisen later in the evolution of the clade. For example,
"presence of hair" is a primitive character state for all mammals, whereas the "hairlessness" of
whales is a derived state for one subclade within the Mammalia.
monophyletic -- Term applied to a group of organisms which includes the most recent common
ancestor of all of its members and all of the descendants of that most recent common ancestor. A
monophyletic group is called a clade.
outgroup -- In a cladistic analysis, any taxon used to help resolve the polarity of characters, and
which is hypothesized to be less closely related to each of the taxa under consideration than any are
to each other.
paraphyletic -- Term applied to a group of organisms which includes the most recent common
ancestor of all of its members, but not all of the descendants of that most recent common ancestor.
plesiomorphy -- A primitive character state for the taxa under consideration.
polyphyletic -- Term applied to a group of organisms which does not include the most recent
common ancestor of those organisms; the ancestor does not possess the character shared by
members of the group.
reticulation -- Joining of separate lineages on a phylogenetic tree, generally through hybridization or
through lateral gene transfer. Fairly common in certain land plant clades; reticulation is thought to
be rare among metazoans.
sister group -- The two clades resulting from the splitting of a single lineage.
taxon -- Any named group of organisms, not necessarily a clade.
symplesiomorphy -- An ancestral (primitive) character state shared by the taxa under consideration.
synapomorphy -- A derived character state that is shared by the taxa under consideration and is
therefore used to infer their common ancestry (a shared derived state).
A phylogenetic tree represents a hypothesis for evolutionary relationships. There are several ways of
reconstructing a phylogenetic hypothesis. Let's examine the main ones.
Parsimony
Parsimony is the method traditionally associated with cladistics. The fundamental idea behind
parsimony is that the best phylogeny is the one that both explains the observed character data and
requires the fewest evolutionary changes. This is an application of Occam's razor: the explanation
that requires the fewest assumptions is preferred. In parsimony, traits are referred to as characters,
and they can change their state; for example, if the homologous trait "fur colour" is a character,
then "white" and "dark" are its character states. The best or most parsimonious tree is the one with
the fewest character state changes.
Advantages
Simplicity
Disadvantages
Not statistically consistent: even with unlimited data it can converge on the wrong tree (for
example when some branches are much longer than others), and the tree produced does not
always have the highest likelihood.
Slow for large numbers of taxa
A most parsimonious tree might never be found, depending on the size and quality of the data.
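Fitch's algorithm is a standard method for counting how many changes a tree requires. It works one character at a time, for unordered characters on a rooted binary tree. This sketch makes its own assumptions: trees are written as nested Python tuples, and the four-taxon alignment is made up. It scores given trees but does not search tree space, and that search is the slow part mentioned above.

```python
def fitch(tree, states):
    """Fitch's algorithm for one character.

    tree   : a taxon name (leaf) or a (left, right) tuple (internal node)
    states : dict mapping taxon name -> character state
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed here
        return shared, left_changes + right_changes
    # children disagree: keep either state and count one change
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Total number of changes over all characters (alignment columns)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        column = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


alignment = {"A": "AAGT", "B": "AAGC", "C": "GTGC", "D": "GTAC"}  # made-up data
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
for name, tree in candidates.items():
    print(name, tree_length(tree, alignment))
# The most parsimonious of these three trees is ((A,B),(C,D)), with 4 changes.
```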
Distance matrix methods

Distance methods were originally applied to phenetic data using a matrix of pairwise distances.
These distances are then reconciled to produce a phylogenetic tree. The distance matrix can come
from a number of different sources, including measured distance (for example from immunological
studies) or morphometric analysis, various pairwise distance formulae (such as Euclidean distance)
applied to discrete morphological characters, or genetic distance from a DNA sequence alignment,
restriction fragments, or allozyme data. For phylogenetic character data, raw distance values can be
calculated by simply counting the number of pairwise differences in character states. Several simple
algorithms exist to construct a tree directly from pairwise distances, including UPGMA and neighbour
joining (NJ), but these will not necessarily produce the best tree for the data. UPGMA assumes
an ultrametric tree (a tree where all the path lengths from the root to the tips are
equal, as expected under a strict molecular clock). Neighbour joining is a form of star decomposition
that does not assume a clock; it is very fast, is often used on its own, and frequently produces
reasonable trees.
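Counting pairwise differences in character states takes only a few lines. This minimal sketch assumes aligned DNA sequences of equal length with no gaps or ambiguity codes. It uses the uncorrected proportion of differing sites (the p-distance), and the three sequences are made up.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ (no gap handling)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


def distance_matrix(alignment):
    """Return taxon names and a symmetric matrix of pairwise p-distances."""
    names = list(alignment)
    matrix = [[p_distance(alignment[a], alignment[b]) for b in names] for a in names]
    return names, matrix


alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}  # made-up data
names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.1f}" for d in row))
```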
Advantages
Extremely fast
Produces a reasonable estimate of phylogeny.
Disadvantages
The relationship between individual characters and the tree is lost in the process of reducing
characters to distances.
Some complex phylogenetic relationships may produce biased distances.
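UPGMA (unweighted pair group method with arithmetic mean) repeatedly joins the two closest clusters and places each join at half their distance. Because every join sits at the same height for both sides, UPGMA produces an ultrametric tree. In this compact sketch, the distance matrix is a symmetric list of lists and the tree is returned as a Newick string with branch lengths. The example distances are made up so that they fit a clock exactly.

```python
def upgma(names, matrix):
    """Build a rooted, ultrametric tree by UPGMA and return it in Newick format."""
    # Each active cluster is [newick_text, number_of_taxa, height_of_its_root]
    clusters = [[name, 1, 0.0] for name in names]
    d = [row[:] for row in matrix]
    while len(clusters) > 1:
        n = len(clusters)
        # 1. Find the closest pair of clusters (i < j)
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda pair: d[pair[0]][pair[1]])
        height = d[i][j] / 2  # the join sits halfway between the two clusters
        (text_i, size_i, h_i), (text_j, size_j, h_j) = clusters[i], clusters[j]
        merged = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # 2. Distance from the new cluster to each other cluster: size-weighted average
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(size_i * d[i][k] + size_j * d[j][k]) / (size_i + size_j) for k in keep]
        # 3. Rebuild the matrix without i and j, then append the merged cluster
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [[merged, size_i + size_j, height]]
    return clusters[0][0] + ";"


names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
print(upgma(names, matrix))  # ((A:1,B:1):2,(C:2,D:2):1);
```

Because the join heights come straight from the distances, UPGMA gives misleading trees when rates differ among lineages. Neighbour joining avoids that assumption.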

Phylogenetic reconstruction II
Maximum likelihood (ML)

Maximum likelihood is a method for the inference of phylogeny. It uses character data (for
example aligned DNA sequences) or quantitative characters to construct a tree based on the
probabilities of character states changing on the tree. The probability of change is estimated from
the data. It evaluates a hypothesis (phylogeny) about evolutionary history in terms of the probability
that the proposed model of evolution (substitution model) and the hypothesized history (phylogeny)
would give rise to the observed data set. A phylogeny under which the observed data are more
probable is preferred to one under which they are less probable. The method searches for the tree
with the highest probability, or likelihood.
Advantages
Often lower variance than other methods
Tends to be robust to many violations of the assumptions in the evolutionary model
Even with very short sequences, tends to outperform alternative methods such as parsimony or
distance methods
The method is statistically well founded
Evaluates different tree topologies
Uses all the sequence information
Disadvantages
Very computationally intensive and therefore slow
The result is dependent on the model of evolution used
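The simplest possible case makes "the probability that the model and the tree would give rise to the observed data" concrete. Take two aligned sequences joined by a single branch of length t, in expected substitutions per site. Model the changes with the Jukes-Cantor (JC) model, one of the substitution models listed under models of nucleotide substitution. The sketch assumes equal base frequencies and independent sites. It evaluates the log-likelihood over a grid of t values and keeps the best one. Real programs optimize branch lengths and topology with far more efficient algorithms. For this two-sequence case the maximum agrees with the closed-form JC distance.

```python
import math


def jc_log_likelihood(t, n_same, n_diff):
    """Log-likelihood of two aligned sequences separated by branch length t under JC.

    Each site contributes 1/4 (frequency of the first base) times the
    probability of the observed pair of bases after time t.
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay        # probability a site shows the same base
    p_each_diff = 0.25 - 0.25 * decay   # probability of one particular different base
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_each_diff)


def ml_branch_length(n_same, n_diff, t_max=3.0, step=1e-4):
    """Grid search for the branch length that maximizes the likelihood."""
    grid = (step * k for k in range(1, int(t_max / step) + 1))
    return max(grid, key=lambda t: jc_log_likelihood(t, n_same, n_diff))


n_same, n_diff = 80, 20  # made-up counts: 20 differences in 100 aligned sites
t_hat = ml_branch_length(n_same, n_diff)
p = n_diff / (n_same + n_diff)
print(f"grid-search ML estimate: {t_hat:.4f}")
print(f"closed-form JC distance: {-0.75 * math.log(1 - 4 * p / 3):.4f}")
```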
Bayesian inference
Similar to maximum likelihood in that it uses the same evolutionary models, but otherwise very
different. Bayesian phylogenetic analysis uses Bayes' theorem, which relates the posterior
probability of a tree to the likelihood of data, and the prior probability of the tree and model of
evolution. Unlike parsimony and likelihood methods, Bayesian analysis does not produce a single
tree or set of equally optimal trees. Instead, it combines the likelihood of trees with the priors in a
Markov chain Monte Carlo (MCMC) simulation that samples trees in proportion to their posterior
probability, thereby producing a credible set of trees.
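Bayes' theorem can be written out for a tree $T$ and substitution-model parameters $\theta$ (including branch lengths), given the data $D$. The notation here is ours. The denominator $P(D)$ is a sum over all trees and an integral over all parameter values, and it is what makes direct calculation impractical.

```latex
P(T, \theta \mid D) = \frac{P(D \mid T, \theta)\, P(T, \theta)}{P(D)},
\qquad
P(D) = \sum_{T'} \int P(D \mid T', \theta')\, P(T', \theta')\, d\theta'
```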
Advantages
Faster than maximum likelihood
Can combine different kinds of data (eg. molecular and morphological).
Offers the possibility of setting priors based on previously obtained data
Disadvantages
Priors, if not chosen appropriately, could introduce bias
Convergence of the Markov Chain can be difficult to assess
Models of nucleotide substitution

These models describe the evolution of DNA as changes among four discrete states (A, C, G and T) at
each site. They describe the relative rates of different changes. Mutational biases and purifying
selection favouring conservative changes are probably both responsible for the relatively high rate of
transitions compared to transversions in evolving sequences. Models such as the Kimura (K80) model
attempt to capture the effect of both forces in a parameter that reflects the relative rate of
transitions to transversions.
These models are often used for analyzing the evolution of an entire locus by making the simplifying
assumption that different sites evolve independently and are identically distributed. This assumption
may be justifiable if the sites can be assumed to be evolving neutrally. But if the primary effect of
natural selection on the evolution of the sequences is to constrain some sites, then models of
among-site rate-heterogeneity (gamma) can be used. The most commonly used models are the
Jukes-Cantor (JC), Kimura80, F81, HKY and GTR.
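Two of these models have simple closed-form distance corrections. They convert the observed proportion of differences into an estimate of substitutions per site, correcting for multiple hits at the same site. This sketch assumes gap-free aligned sequences in upper-case A, C, G, T, and the example pair is made up. The JC distance uses only the proportion of differing sites p. The K80 distance uses two proportions:

- transitions P (A↔G, C↔T);
- transversions Q (all other changes).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """A<->G or C<->T: a change within purines or within pyrimidines."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance: -3/4 ln(1 - 4p/3)."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(seq1, seq2):
    """Kimura two-parameter distance: -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    n = len(seq1)
    P = sum(is_transition(a, b) for a, b in zip(seq1, seq2)) / n
    Q = sum(a != b and not is_transition(a, b) for a, b in zip(seq1, seq2)) / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTGCGTACATACGTATCT"  # made-up: 3 transitions and 1 transversion in 20 sites
print(f"JC:  {jc_distance(s1, s2):.4f}")
print(f"K80: {k80_distance(s1, s2):.4f}")
```

Both distances exceed the raw proportion of differences (0.2). K80 comes out slightly higher here because transversions are weighted separately.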

Markov Chain Monte Carlo (MCMC) – what is it?
Put very simply, it is a neat way to approximate the solution of extremely complex problems that
would otherwise require a very lengthy computation. Such problems occur fairly often in genetics, so it
is worth understanding, once and for all, how an MCMC simulation works. In Bayesian phylogenetic
reconstruction, calculating the posterior probability of a tree involves a summation over all possible
trees (there are more than two million unrooted trees for just 10 taxa) and, for each tree, integration
over all possible combinations of substitution model parameter values and branch lengths. MCMC
provides a shortcut to this problem. How?
MCMC methods can be described in three steps. First, a new state for the Markov chain is proposed
using a stochastic mechanism. Second, the acceptance probability of this new state is calculated
from the ratio of its posterior probability to that of the current state (capped at 1). Third, a random
number between 0 and 1 is drawn. If this number is less than the acceptance probability, the new
state is accepted and the state of the chain is updated; otherwise the chain stays where it is. In this
way MCMC avoids searching for the solution across the whole probability space, and instead spends
most of its time in the regions where the probability is high. This can be illustrated by the robot
walking over the hill below. Steps uphill are always accepted. Steps downhill are accepted only with a
probability equal to the ratio of the new height to the current height, so small steps down are often
taken, but large drops are usually rejected and the robot stays where it is and tries another step.
The process is run for thousands or millions of steps. After an initial burn-in period, the chain
reaches a stationary distribution in which trees are visited in proportion to their posterior
probability. The proportion of steps in which a given tree is visited during the chain is therefore an
approximation of its posterior probability.
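The three steps can be written out for a toy problem. In this sketch the "trees" are just three labelled states with made-up unnormalized posterior values (likelihood times prior). A proposal picks one of the other states at random. Because that proposal is symmetric, the acceptance probability is simply the ratio of the unnormalized posteriors. The normalizing constant is never needed, which is exactly why MCMC sidesteps the huge summation. Real phylogenetic samplers instead propose changes to topology, branch lengths and model parameters.

```python
import random


def metropolis(unnormalised, n_steps, seed=1):
    """Sample discrete states in proportion to `unnormalised` (dict state -> weight > 0)."""
    rng = random.Random(seed)
    states = list(unnormalised)
    current = states[0]
    visits = {state: 0 for state in states}
    for _ in range(n_steps):
        # Step 1: propose a new state (uniformly among the other states; symmetric)
        proposal = rng.choice([s for s in states if s != current])
        # Step 2: acceptance probability = posterior ratio, capped at 1
        accept = min(1.0, unnormalised[proposal] / unnormalised[current])
        # Step 3: draw U ~ Uniform(0, 1) and move only if U < acceptance probability
        if rng.random() < accept:
            current = proposal
        visits[current] += 1  # the current state is recorded whether or not we moved
    return {state: count / n_steps for state, count in visits.items()}


weights = {"tree_1": 6.0, "tree_2": 3.0, "tree_3": 1.0}  # made-up; true posteriors 0.6, 0.3, 0.1
print(metropolis(weights, 200_000))
```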
Testing the validity of a phylogenetic tree
One approach to assessing how well a tree represents all of the data is to resample the data
repeatedly and repeat the phylogenetic analysis to see how often the same result is obtained
from these resampled (and nonidentical) datasets. Resampling can be done by bootstrapping, in
which the characters (e.g., alignment columns) are resampled with replacement, or by jackknifing, in
which a subset of the characters is sampled without replacement. Frequently, 100 or 1000 of these new
resampled datasets are generated and a phylogenetic tree is built from each of them. The new trees
are then compared to determine in what fraction of the trees particular evolutionary groupings are
found. It is very important to realize that these tests do not determine how accurate a tree is, just
how well it reflects the underlying data.
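The resampled datasets themselves are easy to generate. This minimal sketch assumes the alignment is a dict of equal-length sequences, with made-up example data.

- A bootstrap replicate draws as many columns as the original alignment, with replacement, so some columns appear several times and others not at all.
- A jackknife replicate keeps a random subset of columns without replacement. Half of the columns are deleted here, an arbitrary choice.

Each replicate would then be passed to whatever tree-building method is being tested.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns WITH replacement (same length as the original)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def jackknife_replicate(alignment, rng, keep_fraction=0.5):
    """Sample a subset of alignment columns WITHOUT replacement (keeping their order)."""
    n_sites = len(next(iter(alignment.values())))
    k = max(1, round(n_sites * keep_fraction))
    columns = sorted(rng.sample(range(n_sites), k))
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


rng = random.Random(42)
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}  # made-up data
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
print(jackknife_replicate(alignment, rng))
```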

Other ways of measuring Genetic Structure
Phylogenetic networks
A phylogenetic network (or reticulate network) is any graph used to visualize evolutionary
relationships (either abstractly or explicitly) between nucleotide sequences, genes, chromosomes,
genomes, or species. Networks are employed when reticulate events such as hybridization, horizontal
gene transfer, recombination, or gene duplication and loss are believed to be involved. They differ
from phylogenetic trees in modelling these events explicitly, by adding hybrid nodes (nodes with two
parents) to the tree nodes (nodes with only one parent) that make up a tree. Phylogenetic trees are
therefore a subset of phylogenetic networks. Many kinds and subclasses of phylogenetic networks
have been defined based on the biological phenomenon they represent or the data they are built from
(hybridization networks, usually built from rooted trees; recombination networks from binary
sequences; median networks from a set of splits; optimal realizations and reticulograms from a
distance matrix), or on restrictions that make the problems computationally tractable (galled trees
and their generalizations, level-k phylogenetic networks, tree-child and tree-sibling phylogenetic
networks).
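The difference between tree nodes and hybrid nodes shows up clearly if a rooted network is stored as a list of parent-to-child edges and each node's parents are counted. This sketch uses a hypothetical network in which taxon H descends from a hybrid node "h" with parents on the lineages leading to A and C. The node labels and helper names are illustrative only. A network in which no node has more than one parent is simply a tree.

```python
from collections import defaultdict


def parents_of(edges):
    """Map each node to the list of its parents, from (parent, child) edges."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return parents


def hybrid_nodes(edges):
    """Nodes with two (or more) parents, i.e. points of reticulation."""
    return sorted(node for node, ps in parents_of(edges).items() if len(ps) > 1)


def is_tree(edges):
    """A rooted network with no hybrid nodes is an ordinary phylogenetic tree."""
    return not hybrid_nodes(edges)


# Hypothetical rooted network: 'h' receives genes from both lineage 'x' and lineage 'y'
network = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "H"),
]
tree_version = [edge for edge in network if edge != ("y", "h")]  # drop one parent edge
print(hybrid_nodes(network), is_tree(network), is_tree(tree_version))
```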
Cluster algorithms
These are commonly used in genetic studies. They typically use multilocus data from genetic
markers including SNPs, microsatellites, RFLPs and AFLPs. Their uses include inferring the presence
of distinct populations, assigning individuals to populations, studying hybrid zones, identifying
migrants and admixed individuals, and estimating population allele frequencies in situations where
many individuals are migrants or admixed.

Principal component analysis (PCA)
This is a statistical procedure that uses an orthogonal transformation to convert a set of
observations of possibly correlated variables into a set of values of linearly uncorrelated variables
called principal components. The number of principal components is less than or equal to the
number of original variables. This transformation is defined in such a way that the first principal
component has the largest possible variance (that is, accounts for as much of the variability in the
data as possible), and each succeeding component in turn has the highest variance possible under
the constraint that it is orthogonal to the preceding components. This makes PCA very useful for
depicting data in which genetic structure is very weak, as in the example below of European ethnicities.
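For genotype data, PCA is usually run on a matrix with one row per individual and one column per locus. Each entry is the number of copies of one allele (0, 1 or 2). The sketch below is deliberately dependency-free and does three things:

- centres the columns;
- builds the covariance matrix;
- finds only the first principal component, by power iteration.

A real analysis would use a linear-algebra library and often standardizes the columns as well. The six genotypes are made up to represent two populations.

```python
import random


def first_principal_component(rows, n_iter=500, seed=0):
    """Return (loadings, scores) for the first principal component of `rows`."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in rows]
    # Sample covariance matrix between loci (m x m)
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration: repeatedly multiply a random vector by cov and rescale;
    # it converges to the eigenvector with the largest eigenvalue (largest variance)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centred individual onto the component
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores


genotypes = [
    [2, 2, 0, 0, 1], [2, 1, 0, 0, 1], [1, 2, 0, 1, 1],  # made-up population 1
    [0, 0, 2, 2, 1], [0, 1, 2, 2, 1], [1, 0, 2, 1, 1],  # made-up population 2
]
loadings, scores = first_principal_component(genotypes)
print([round(s, 2) for s in scores])  # the two populations fall on opposite sides of zero
```

The sign of a principal component is arbitrary. Only the separation between the groups is meaningful.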
Wright’s FST
The fixation index (FST) is probably the most reported measure of population genetic
differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data,
such as single-nucleotide polymorphisms (SNPs) or microsatellites. It was developed by Sewall
Wright. The most commonly used definition for FST at a given locus is based on the variance of allele
frequencies between populations.
If $\bar{p}$ is the average frequency of an allele in the total population, $\sigma^2_S$ is the variance in the
frequency of the allele between different subpopulations, weighted by the sizes of the subpopulations,
and $\sigma^2_T$ is the variance of the allelic state in the total population, then FST is defined as

$$F_{ST} = \frac{\sigma^2_S}{\sigma^2_T} = \frac{\sigma^2_S}{\bar{p}(1-\bar{p})}$$

This definition illustrates that FST measures the amount of genetic variance that can be explained by
population structure. This can also be thought of as the fraction of total diversity that is not a
consequence of the average diversity within subpopulations, where diversity is measured by the
probability that two randomly selected alleles are different, namely $2p(1-p)$ for an allele at
frequency $p$.

However, Wright’s FST can also be defined as the decline in heterozygosity due to subdivision within
a population. In this case, the F-statistic is a measure of the difference between the mean
heterozygosity among subdivisions in a population and the potential frequency of heterozygotes if
all members of the population mixed freely and non-assortatively. This comparison of genetic variability
within and between populations is frequently used in applied population genetics. The values of FST
range from 0 to 1. A zero value implies complete panmixis – no differentiation; that is, that the two
populations are interbreeding freely. A value of one implies that all genetic variation is explained by
the population structure, and that the two populations do not share any genetic diversity.
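Both definitions can be computed from subpopulation allele frequencies, and for a biallelic locus they give the same number. The first is the variance-based definition given earlier. The second is the heterozygosity form $F_{ST} = (H_T - H_S)/H_T$. Here $H_S$ is the size-weighted mean expected heterozygosity within subpopulations, and $H_T$ is the expected heterozygosity if the whole population mated at random. This minimal sketch makes three assumptions:

- the allele frequencies for a single biallelic locus are known rather than estimated from samples;
- expected heterozygosity is $2p(1-p)$;
- the pooled frequency is strictly between 0 and 1.

Estimators used on real data also correct for sample size, which is not shown here.

```python
def fst_variance(freqs, sizes):
    """F_ST as the weighted variance of allele frequencies over p_bar * (1 - p_bar)."""
    total = sum(sizes)
    weights = [n / total for n in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    var_p = sum(w * (p - p_bar) ** 2 for w, p in zip(weights, freqs))
    return var_p / (p_bar * (1 - p_bar))


def fst_heterozygosity(freqs, sizes):
    """F_ST as the proportional drop in expected heterozygosity due to subdivision."""
    total = sum(sizes)
    weights = [n / total for n in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))  # mean within subpopulations
    h_t = 2 * p_bar * (1 - p_bar)  # if the whole population mated at random
    return (h_t - h_s) / h_t


for freqs in ([0.5, 0.5], [0.2, 0.8], [0.0, 1.0]):
    sizes = [100, 100]
    print(freqs, round(fst_variance(freqs, sizes), 3), round(fst_heterozygosity(freqs, sizes), 3))
```

The three cases run from no differentiation (0) through moderate differentiation (0.36) to fixed differences (1).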
Hierarchical genetic variation (AMOVA)
Analysis of molecular variance (AMOVA) is a statistical model for the molecular variation in a single species.
The method was developed by Laurent Excoffier, Peter Smouse and Joseph Quattro at Rutgers
University in 1992. One can break populations down into three hierarchical levels: within demes (S),
within groups of demes (G) and the total population (T). Given this structure, one can break Wright’s
F statistics, which are comparisons of allele frequencies between two entities, into three
components: among demes within group (FSG), among groups within the total population (FGT) and
among demes within the total population (FST).
However, molecular data reveal not only the frequency of molecular markers, but can also tell us
something about the number of mutational differences between different alleles. A technique that
could be used to estimate population differentiation by analyzing differences between molecular
sequences rather than assumed Mendelian gene frequencies would therefore be very useful. AMOVA
does this by breaking the variance down into components for these three hierarchical levels, giving
the analogous statistics ΦSG, ΦGT and ΦST.
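The simplest case has a single level, with individuals nested in populations and no groups. It shows how the Φ statistic is built from squared distances between haplotypes rather than from allele frequencies. This sketch follows the standard sums-of-squared-deviations (SSD) formulation. It assumes the squared distance between two haplotypes is their number of differing sites. The significance testing by permutation that AMOVA programs also perform is omitted, and the haplotypes are made up.

```python
from itertools import product


def squared_distance(h1, h2):
    """Number of sites at which two aligned haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: (1 / 2N) * sum over all ordered pairs of squared distances."""
    n = len(haplotypes)
    return sum(squared_distance(a, b) for a, b in product(haplotypes, repeat=2)) / (2 * n)


def phi_st(populations):
    """Single-level AMOVA: variance among populations as a share of the total."""
    all_haps = [h for pop in populations for h in pop]
    n_total, n_pops = len(all_haps), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(all_haps) - ssd_within
    msd_within = ssd_within / (n_total - n_pops)
    msd_among = ssd_among / (n_pops - 1)
    # n_c: average population size, corrected for unequal sizes
    n_c = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)
    sigma2_among = (msd_among - msd_within) / n_c
    sigma2_within = msd_within
    return sigma2_among / (sigma2_among + sigma2_within)


pop1 = ["AAAA", "AAAT"]  # made-up haplotypes
pop2 = ["TTTA", "TTTT"]
print(round(phi_st([pop1, pop2]), 3))  # 0.714 (= 5/7)
```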

Speciation: the classical view
Speciation is the evolutionary process by which new biological species arise.
What is a species?
Before we can examine speciation, we first have to ask ourselves: what is a species?
Many biologists have tried to answer that question. As a result we have around 26
different ideas of what a species ought to be, from the Morphological species concept of Linnaeus,
to the Biological species concept of Dobzhansky and the Evolutionary species concept of Simpson.
Many have come close, but all have fallen short.
Why?
Because the continuity of evolution makes a species impossible to define. For every attempt at a
definition, there are some annoying exceptions to the rule. Yet we know that species exist: we see
them everywhere we look. We ourselves are a different species from chimpanzees and gorillas, even
though we are closely related.
Modes of Speciation
How do new species originate? In other words, how can we understand the origin of species—the
question that occupied, indeed tormented, Darwin for so many years? All around us we see an
astonishing array of different life-forms. How could such a diversity of different species come to be?
We will examine four models of speciation: allopatric, peripatric, parapatric, and sympatric
speciation. These models are distinguished from one another by the relative geographic
positions of populations undergoing speciation.
Allopatric speciation
is speciation that occurs when biological populations of the same species are separated
geographically from each other to an extent that prevents or interferes with genetic interchange.
This can be the result of population dispersal leading to emigration, or of vicariant events such as
mountain formation or island formation. The isolated populations then undergo genotypic and/or
phenotypic divergence as: (a) they become subjected to dissimilar selective pressures; (b) they
independently undergo genetic drift; (c) different mutations arise in the two populations. If the
populations later come back into contact, they may have evolved to the point where they are
reproductively isolated and no longer capable of exchanging genes. Island genetics is the term
associated with the tendency of small, isolated gene pools to produce unusual traits. Examples
include insular dwarfism and the radical changes seen on certain famous islands and island chains,
such as Komodo and the Galápagos.

Peripatric Speciation
This is similar to allopatric speciation in that populations are isolated and prevented from
exchanging genes. However, peripatric speciation, unlike allopatric speciation, proposes that one of
the populations is much smaller than the other. The evolution of the polar bear from the brown
bear is a well-documented example of a living species that gave rise to another living species
through the evolution of a population located at the margin of the ancestral species' range.
Peripatric speciation was originally proposed by Ernst Mayr, and is related to the founder effect,
because small peripheral populations may pass through population bottlenecks. Therefore, genetic
drift is often proposed to play a significant role in peripatric speciation.
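A standard drift-only expectation for the loss of heterozygosity shows why drift matters more in a small peripheral population: $H_t = H_0 (1 - 1/(2N))^t$. It applies to an idealized population of constant size $N$ with no mutation, migration or selection. This sketch assumes those idealized conditions, and the population sizes and starting heterozygosity are made up.

```python
def expected_heterozygosity(h0, n, generations):
    """Expected heterozygosity after `generations` of pure drift in an idealized population of size n."""
    return h0 * (1 - 1 / (2 * n)) ** generations


h0 = 0.5  # made-up starting heterozygosity
for label, n in (("peripheral isolate", 50), ("large ancestral population", 50_000)):
    h = expected_heterozygosity(h0, n, generations=100)
    print(f"{label} (N = {n}): H after 100 generations = {h:.3f}")
```

After 100 generations the small isolate has lost most of its heterozygosity, while the large population has barely changed.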
Parapatric Speciation
In the parapatric speciation model, the
population of a species constitutes one
or more biogeographically distinct
subpopulations with a small,
continuous overlap or minimal contact
zone between populations. This
minimal contact zone may be the result
of unequal dispersal or distribution,
incomplete geographical barriers, or
divergent expressions of animal
behaviour, among other things. A
parapatric population distribution may
result in nonrandom mating and
unequal gene flow, which can then
produce an increase in
the dimorphism between populations.
In parapatric speciation, there is an
intrinsic barrier of nonrandom mating and distinct selection pressures that create unequal gene
flow. Parapatric speciation is the culmination of this unequal gene flow effect, in
which genotypic dimorphism between populations results in speciation of the population and
redefines the population as two or more sister species.
Sympatric speciation
This is the process through which new species evolve from a single ancestral species while inhabiting
the same geographic region. The mechanisms of sympatric speciation are unclear. Disruptive
selection has often been invoked, but rarely demonstrated. Horseshoe bats are thought to speciate
sympatrically by changing their calls. Cichlid fish in central Africa may have speciated sympatrically
via sexual selection.
Reproductive Isolating Mechanisms (Intrinsic Mechanisms)

It is clear that most species will eventually develop reproductive isolating mechanisms that make it
impossible for them to exchange genes with each other. How long these mechanisms take to
evolve is a matter of debate. Reproductive isolating mechanisms are any heritable features of body
form, functioning, or behaviour that prevent gene flow between genetically divergent populations.
Prezygotic Isolation (mechanisms take effect before or during fertilization)
Ecogeographic Isolation: populations may become so specialized for different
environmental conditions that they cannot survive under each other's conditions.
Habitat Isolation: potential mates may be in the same general area but not in the
same habitat where they are likely to meet (for example: different species of
manzanita shrubs live at different altitudes and habitats).
Temporal (Seasonal) Isolation: different groups may not be reproductively mature at
the same season, or month, or year (for example: periodical cicadas).
Behavioral Isolation: patterns of courtship may be altered to the extent that sexual
union is not achieved (for example: albatross courtship rituals).
Mechanical Isolation: two populations are mechanically isolated when differences in
reproductive organs prevent successful interbreeding (for example: floral
arrangements in sage plants discriminate between different bee pollinators).
Gametic Isolation: incompatibilities between egg and sperm prevent fertilization, or
the sperm is killed outright by the female reproductive tract.
Postzygotic Isolation (mechanisms take effect after fertilization)

Developmental Isolation: Fertilization may occur, but development of the embryo
is irregular and is not completed.
Hybrid Inviability: Fertilization does occur between different species, but the hybrid
embryo is weak and dies.
Hybrid Sterility: In some instances the hybrids are vigorous but sterile (example:
mule produced by a male donkey and a female horse).
Post-zygotic isolating mechanisms may appear first, since the two populations have recently diverged
and may still be similar. However, these mechanisms are costly in terms of time, energy, lost
reproductive opportunities, and fitness. Therefore, selection should ultimately favour pre-zygotic
mechanisms.
Most species are separated from one another by more than one pre-zygotic mechanism.

Speciation: the modern view
Hybrid speciation
This is a form of speciation wherein hybridization between two different species leads to a new
species, reproductively isolated from the parent species. This is not new in plants: we know that
many plants are polyploid, and much of this polyploidy arose through hybridization of two parental
species. Because plants are able to double or even quadruple their chromosome number, they are able
to take advantage of the adaptive benefits that polyploidy can confer. On the other hand, examples of
hybrid speciation in animals (homoploid hybrid speciation, as it does not result in a change in
chromosome number) were rare until recently. Ernst Mayr believed that hybridisation had little to
no role to play in animal speciation because the fitness of hybrid phenotypes would almost certainly
be less than that of either of the two parental phenotypes. Among the few known examples are
insects and fish. However, the great skua has a surprising genetic similarity to the physically
dissimilar pomarine skua, and most ornithologists now assume the great skua is a hybrid species
between the pomarine skua and one of the northern skua species. Mayr’s view on hybrid speciation
looks set to change in the new era of genomics, where we are able to look into all of the genetic
material of an organism.
Adaptive introgression
A classic example of adaptive introgression is provided by the South American butterflies of the genus
Heliconius. In this case, one species has gained an adaptive advantage by acquiring, through
hybridisation, the colour pattern of an inedible non-sister taxon, which it now mimics. This is
conclusive evidence that different animal species can hybridise and that hybrid fitness can be
greater than that of the parents.

Adaptive radiations
Adaptive radiations are characterised by the rapid evolution of distinct, phenotypically
differentiated taxa. This may be due to a sudden climate change; e.g. the glacial periods of the
Pleistocene are known to have given rise to the large diversity in bovid species. During this same
period, the expansion of grasslands in Africa also gave rise to the hominid radiation that we are a
part of. There is evidence of ancient hybridisation events in bovid species, as well as genomic
evidence for hybridisation between modern humans and Neanderthals in Europe and Denisovans in Asia.
What is it about adaptive radiations that promotes adaptive introgression?
Part of the answer may lie in reproductive isolation, which must by definition be incomplete if
introgression can occur. Perhaps the rapid evolution that occurs during an adaptive radiation is
enough to evolve distinct phenotypes (species) but not for the evolution of complete reproductive
isolation. It may also be that rapidly evolving species suffer from losses in genetic variation, and
could thus benefit from an influx of hybrid genetic diversity. Whatever the case, the answer is
likely to become clearer in the next few years.

Coalescence
The coalescent process is a powerful modeling tool for population genetics. The allelic states of all
homologous gene copies in a population are determined by their own genealogical and mutational
history. Coalescent theory builds on the Wright-Fisher model, a very simple mathematical
model of the evolution of a population. This model assumes a set of discrete, non-overlapping
generations, where each new generation is formed by sampling at random, with replacement,
from the previous one.
So you start out with a set of N individuals in the first generation, and then you create the next
generation by N times selecting a parent from the first population at random and copying him/her
to the next generation. For the next generation you do the same, but this time you sample from the
second generation (the one you just created), and you continue this process for as many
generations as you need.
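The sampling scheme just described can be sketched in a few lines of Python. This is a minimal illustration, assuming a haploid-style model in which each individual has a single parent (as in the description above); the function names `wright_fisher` and `count_ancestors` are hypothetical. Instead of tracking genotypes, it records each individual's parent, so the same simulated history can also be traced backwards to show lineages merging towards an MRCA.

```python
import random


def wright_fisher(n_individuals, n_generations, seed=None):
    """Forward Wright-Fisher simulation with a constant population size N.

    For each new generation we pick a parent N times, at random and with
    replacement, from the previous generation. Only the parent index of
    every individual is stored, so the genealogy can be traced later.
    Returns one list of parent indices per generation (oldest first).
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n_generations):
        parents = [rng.randrange(n_individuals) for _ in range(n_individuals)]
        history.append(parents)
    return history


def count_ancestors(history, sample):
    """Trace a present-day sample backwards through the stored history.

    Returns the number of distinct ancestral lineages in each generation,
    starting with the sample itself. A count of 1 means the sample has
    reached its most recent common ancestor (MRCA).
    """
    lineages = set(sample)
    counts = [len(lineages)]
    for parents in reversed(history):  # newest generation first
        lineages = {parents[i] for i in lineages}
        counts.append(len(lineages))
    return counts


if __name__ == "__main__":
    history = wright_fisher(n_individuals=10, n_generations=50, seed=1)
    print(count_ancestors(history, sample=range(10)))
```

The printed counts can only stay the same or fall as you move back in time; in such a tiny population they usually reach 1 well within 50 generations, and the generation where that first happens is where the sample's MRCA lived.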
Now, according to coalescent theory, I can take a sample of individuals at the very bottom of this
figure (in the present) and relate the genetic diversity in this sample to the demographic history of
the population from which it was taken.
How?
By working backwards in time from the present. It is possible to model the effect of genetic drift
backwards in time from the homologous gene copies sampled today, all the way back to the most
recent common ancestor of that gene sequence: the MRCA.
The inheritance relationships among alleles
are typically represented as a gene
genealogy, or gene tree, similar in form to
a phylogenetic tree. The probabilistic
expectation of this gene genealogy is also
known as the coalescent. Understanding the
statistical properties of the coalescent under
different assumptions forms the basis of coalescent theory. Because of recombination, different
gene loci follow different pathways of ancestry, resulting in different gene genealogies. The
coalescent is also relevant to phylogenetics, as incomplete lineage sorting between speciation
events results in conflict among gene-loci in phylogenetic relationships inferred among species.

In the simplest case, coalescent theory assumes only genetic drift: no recombination, no natural
selection, and no gene flow or population structure. The gene genealogy is independent of the
mutational process, such that changes in the DNA sequence do not affect inheritance and can be
considered separately (even if all gene copies are identical in sequence, they are not equally related
in the gene tree). Under this model, the probability at any given time of an allele becoming fixed is
just its frequency p in the population at that time. For a diploid population of size N and (neutral)
mutation rate μ, the initial frequency of a novel mutation is simply $1/(2N)$, and the number of new
mutations per generation is $2N\mu$. The expected time between successive coalescence events, in
which two gene copies arise from a single ancestral copy, increases back in time because fewer
lineages remain to coalesce: while k lineages remain, the expected waiting time is $4N/(k(k-1))$
generations, so the last coalescence (k = 2) alone takes on average 2N generations (with wide
variance). Extensions of coalescent theory incorporate recombination, selection, migration, population
growth, and virtually any arbitrarily complex evolutionary or demographic model used in population
genetic analysis. This makes the coalescent framework extremely useful not just in working out the
time at which the most recent common ancestor existed (TMRCA), but also in modelling how the
different evolutionary forces operate and even interact.
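The quantities in this paragraph can be checked numerically. The sketch below assumes the standard (Kingman) coalescent for a diploid population of constant size N: while k lineages remain, the waiting time to the next coalescence is exponential with mean 4N/(k(k-1)) generations, and summing these means gives an expected TMRCA of 4N(1 - 1/n) for n sampled gene copies. It also checks that 2Nμ new mutations per generation, each fixing with probability equal to its initial frequency 1/(2N), give a neutral substitution rate of μ. All function names are illustrative, not from any particular software package.

```python
import random


def expected_tmrca(n_copies, n_diploid):
    """Expected time (generations) to the MRCA of n gene copies sampled
    from a diploid population of constant size N: 4N(1 - 1/n)."""
    return 4 * n_diploid * (1 - 1 / n_copies)


def simulate_tmrca(n_copies, n_diploid, rng):
    """One random TMRCA under the standard coalescent.

    Working backwards from the present: while k lineages remain, each of
    the k(k-1)/2 pairs coalesces with probability 1/(2N) per generation,
    so the waiting time is exponential with rate k(k-1)/(4N). Waiting
    times get longer as fewer lineages remain.
    """
    t = 0.0
    for k in range(n_copies, 1, -1):
        t += rng.expovariate(k * (k - 1) / (4 * n_diploid))
    return t


def neutral_substitution_rate(n_diploid, mu):
    """Neutral substitutions per generation: 2N*mu new mutations, each
    fixed with probability equal to its initial frequency 1/(2N)."""
    return (2 * n_diploid * mu) * (1 / (2 * n_diploid))


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [simulate_tmrca(10, 1000, rng) for _ in range(20000)]
    print("simulated mean TMRCA:", round(sum(draws) / len(draws)))
    print("expected TMRCA:", expected_tmrca(10, 1000))
    print("substitution rate:", neutral_substitution_rate(1000, 1e-8))
```

With 10 gene copies and N = 1000, the simulated mean should come out close to the expected 3600 generations, and the individual draws vary widely around it. In the last function N cancels, so under neutrality the substitution rate equals the mutation rate whatever the population size.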
Gene genealogies and species trees
Consider the figure below, which shows the coalescent process for a single gene going
backwards in time for three phenotypically different species A, B and C. Alleles are coloured
differently to show when they mutated from one form to another. Now, if you sampled members of
each species today and sequenced that gene, you would find that your gene genealogy placed them
in reciprocally monophyletic clades, corresponding to three distinct species. You would be very
happy, and you would assume that the gene genealogy you produced was a faithful reflection of the
true species tree, which is the big fat tree that encloses the gene genealogy. However, if you had
sampled them 5 generations in the past, you would find that your gene genealogy was
paraphyletic: although these look phenotypically like different species, your tree would not
separate them into monophyletic species clades. This is because of a phenomenon called incomplete
lineage sorting (or trans-species polymorphism).
The problem is that in most cases, you will not happen to sample your three species after their
lineages have completely sorted. This is because each species is at a different stage of its evolution,
and it is just a matter of chance what you discover by sequencing just one gene. Many studies that use
only mitochondrial DNA to infer evolutionary history have this problem. This is why, to obtain an
estimate of the true species tree, you have to sequence several different genes and then perform
coalescent modelling to obtain the most likely species tree.
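Why a single gene can mislead can be shown with a small coalescent sketch for three species with species tree ((A,B),C). It assumes one gene copy sampled per species, a diploid ancestral A+B population of constant size N, and no recombination within the locus. The A and B lineages coalesce in that ancestral population with probability 1 - e^(-t), where t is the internal branch length in units of 2N generations; if they fail to, all three pairings are equally likely, so the gene tree matches the species tree with probability 1 - (2/3)e^(-t). The function names and example numbers are illustrative assumptions.

```python
import math
import random


def p_concordant(internal_generations, n_diploid):
    """Probability that one gene tree matches the species tree ((A,B),C).

    internal_generations: length of the branch ancestral to A and B only.
    n_diploid: diploid size N of that ancestral population.
    """
    t = internal_generations / (2 * n_diploid)  # coalescent units
    return 1 - (2 / 3) * math.exp(-t)


def simulate_gene_tree(internal_generations, n_diploid, rng):
    """Return 'AB', 'AC' or 'BC': the pair whose gene copies join first."""
    # Two lineages coalesce at rate 1/(2N) per generation.
    wait = rng.expovariate(1 / (2 * n_diploid))
    if wait < internal_generations:
        return "AB"  # lineages sorted within the A+B ancestor
    # Incomplete lineage sorting: three lineages reach the common
    # ancestor of A, B and C, and any pair may coalesce first.
    return rng.choice(["AB", "AC", "BC"])


if __name__ == "__main__":
    rng = random.Random(7)
    loci = [simulate_gene_tree(10_000, 10_000, rng) for _ in range(20_000)]
    for topology in ("AB", "AC", "BC"):
        print(topology, round(loci.count(topology) / len(loci), 3))
    print("predicted AB:", round(p_concordant(10_000, 10_000), 3))
```

With an internal branch of N generations (t = 0.5), roughly 40% of single-gene trees disagree with the species tree. Even so, the species-tree topology stays the most frequent one across independent loci, which is why combining many genes recovers the species tree when a single gene often cannot.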

Molecular Ecology
Molecular ecology is the field of biology that merges genetics with ecology. It brings together the
theoretical approaches of population genetics and phylogenetics, which we have learned about in
previous lectures, with the more applied fields of phylogeography and conservation genetics – which
we will examine in this section.
Molecular Ecology has now come of age and is primed to become a science in its own right. The
technological advances of the last two decades have brought an unprecedented rise in molecular
ecological studies. Importantly, recent advances such as next generation sequencing, multiprocessor
computing, sophisticated analytical software, bioinformatics and GPS tracking now make it possible
to study wild populations in great detail.
There is no additional reading for this section.
Section Outcomes.
1. Explain how a fluctuating palaeoclimate can structure genetic variation with examples
2. Evaluate the out-of-Africa model for human migration in light of recent evidence from next
generation sequencing
3. Identify the reasons for conserving biodiversity
4. Demonstrate the effect of a bottleneck on the genetic diversity of a population
5. Describe the consequences of losing genetic diversity
6. Evaluate the concept of the evolutionarily significant unit, why we need them and how best
to define them.
7. Explain with examples why taxonomic issues are important in conservation
8. Compare conservation genetics with conservation genomics

Phylogeography
Phylogeography is the study of the historical processes that may be responsible for the
contemporary geographic distributions of individuals. This is accomplished by considering the
geographic distribution of individuals in light of the patterns associated with a gene genealogy.
This term was introduced to describe geographically structured genetic signals within and among
species. An explicit focus on a species' biogeographical past sets phylogeography
apart from classical population genetics and phylogenetics.
Past events that can be inferred include population expansion, population bottlenecks, vicariance
and migration. As you know, these events all have associated consequences because of genetic drift,
migration and selection. Recently developed approaches integrating coalescent theory or the
genealogical history of alleles and distributional information can more accurately address the
relative roles of these different historical forces in shaping current patterns.
The term phylogeography was first used by John Avise in 1987. Early phylogeographic work has
since been criticized for its narrative nature and lack of statistical rigor (i.e. it did not statistically
test alternative hypotheses). The only real method available was Alan Templeton's Nested Clade
Analysis, which made use of an inference key to determine the validity of a given process in
explaining the concordance between geographic distance and genetic relatedness. Recent studies
have taken a much more statistical approach to phylogeography than was done initially.
Genetic signatures of Palaeoclimate
Palaeoclimatic change, such as the glaciation cycles of the Pleistocene (the epoch spanning roughly
2.6 million to 11,700 years ago), has periodically restricted some species into disjunct refugia. These
restricted ranges may result in population bottlenecks that reduce genetic variation. Once a reversal
in climate change allows for rapid migration out of refugial areas, these species spread rapidly into
newly available habitat. A number of empirical studies find genetic signatures in both animal and
plant species that support this scenario of refugia and postglacial expansion. The history of our own
species in Europe is a classic example of expansion from glacial refugia in southern Europe.
Palaeoclimate has had an influence both in the tropics (where the main effect of glaciation was
increasing aridity, i.e. the expansion of savanna and the retraction of tropical rainforest) and in
temperate regions that were directly influenced by glaciers.

Comparative phylogeography
The field of comparative phylogeography seeks to explain the mechanisms responsible for the
phylogenetic relationships and distribution of different species. Comparisons across multiple taxa
can clarify the histories of biogeographical regions. For example, phylogeographic analyses of
terrestrial vertebrates on the Baja California peninsula and of marine fish on both the Pacific and
Gulf of California sides of the peninsula display genetic signatures that suggest a vicariance event
affected multiple taxa during the Pliocene or Pleistocene.
The figures below map out the phylogeographic history of poison frogs in South America.
Phylogeography integrates biogeography and genetics to study in greater detail the lineal history of
a species in the context of the geoclimatic history of the planet. An example study of poison frogs
living in the South American neotropics (illustrated to the left) is used to demonstrate how
phylogeographers combine genetics and paleogeography to piece together the ecological history of
organisms in their environments. Several major geoclimatic events have greatly influenced the
biogeographic distribution of organisms in this area, including the isolation and reconnection of
South America, the uplift of the Andes, an extensive Amazonian floodbasin system during the
Miocene, the formation of the Orinoco and Amazon drainages, and dry-wet climate cycles throughout
the Pliocene to Pleistocene epochs. Using this contextual paleogeographic information (a
paleogeographic time series is shown in panels A-D), the authors of this study proposed a null
hypothesis that assumes no spatial structure and two alternative hypotheses involving dispersal and
other biogeographic constraints (the hypotheses are shown in panels E-G, listed as SM0, SM1, and
SM2). The phylogeographers visited the ranges of each frog species to obtain tissue samples for
genetic analysis; researchers can also obtain tissue samples from museum collections.
The evolutionary history of, and relationships among, the different poison frog species are
reconstructed using phylogenetic trees derived from molecular data. The molecular trees are mapped
in relation to the paleogeographic history of the region for a complete phylogeographic study. The
tree shown in the center of the figure has its branch lengths calibrated to a molecular clock, with the
geological time bar shown at the bottom. The same phylogenetic tree is duplicated four more times to
show where each lineage is distributed (illustrated in the inset maps below). The combination of
techniques used in this study exemplifies more generally how phylogeographic studies proceed and
test for patterns of common influence. Paleogeographic data establish geological time records for
the historical events that explain the branching patterns in the molecular trees. This study rejected
the null model and found that the extant Amazonian poison frog species primarily stem from
fourteen lineages that dispersed into their respective areas after the Miocene flood basin receded.
Regionally based phylogeographic studies of this type are repeated for different species as a means
of independent testing. Broadly concordant, repeated patterns among species in most regions of the
planet are attributed to a common influence of palaeoclimatic history.
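The idea of calibrating branch lengths to a molecular clock can be sketched with the simplest case, a strict clock. This is an illustration under stated assumptions, not the dating method used in the frog study: it assumes a constant substitution rate r per site per lineage, so two lineages that split T time units ago differ at a proportion d = 2rT of sites, giving T = d/(2r). The example rate (0.01 per lineage per million years, i.e. the widely quoted 2% pairwise divergence per million years for vertebrate mtDNA) and the function name are illustrative only.

```python
def divergence_time(pairwise_distance, rate_per_lineage):
    """Strict-clock estimate of the time since two lineages split.

    pairwise_distance: proportion of sites differing between the two
        sequences (ideally corrected for multiple substitutions).
    rate_per_lineage: substitutions per site per lineage per time unit.
    Both lineages accumulate changes after the split, hence the factor 2.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2 * rate_per_lineage)


if __name__ == "__main__":
    # Example values only: 4% sequence divergence, 0.01 per lineage per Myr
    print(divergence_time(0.04, 0.01), "million years")  # prints: 2.0 million years
```

Real phylogeographic studies usually relax the constant-rate assumption and calibrate with fossils or dated geological events, but the factor of 2 for the two diverging lineages applies in the same way.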
Human phylogeography
Phylogeography has also proven useful in understanding the origin and dispersal patterns of our own
species, Homo sapiens. Based primarily on skeletal remains of ancient humans and estimates of their
age, anthropologists proposed two competing hypotheses about human origins.
The first hypothesis is referred to as the Out-of-Africa with replacement model (or simply the
Out-of-Africa model), which contends that the last expansion out of Africa around 60,000 years ago
resulted in modern humans displacing all previous Homo spp. populations in Eurasia that were the
result of an earlier wave of emigration out of Africa. On the other hand, the multiregional scenario
claims that individuals from the recent expansion out of Africa intermingled genetically with the
human populations descended from more ancient African emigrations. A phylogeographic study that
uncovered a Mitochondrial Eve that lived in Africa 150,000 years ago provided early support for the
Out-of-Africa model.
While this study had its shortcomings, it received significant attention both within scientific circles
and a wider audience. A more thorough phylogeographic analysis that used ten different genes
instead of a single mitochondrial marker indicates that at least two major expansions out of Africa
after the initial range extension of Homo erectus played an important role shaping the modern
human gene pool and that recurrent genetic exchange is pervasive. These findings strongly
demonstrated Africa's central role in the evolution of modern humans, but also indicated that the
multiregional model had some validity. These studies have largely been supplanted by population
genomic studies that use orders of magnitude more data.
However, in light of recent data from the 1000 Genomes Project, genomic-scale SNP databases
sampling thousands of individuals globally, and samples taken from two non-Homo sapiens hominins
(Neanderthals and Denisovans), the picture of human evolutionary history has become both more
resolved and more complex, involving possible Neanderthal and Denisovan admixture, admixture with
archaic African hominins, and a Eurasian expansion into the Australasian region that predates the
standard out-of-Africa expansion.

Conservation genetics
In his famous book, Frankham defined conservation genetics as “the application of genetics to
preserve species as dynamic entities capable of coping with environmental change.”
Conservation genetics is a large and rapidly growing field of biology. It covers a
wide range of topics: inbreeding depression, loss of genetic diversity, reduced gene flow, genetic
drift, accumulation of deleterious alleles, genetic adaptation to captivity, resolution of taxonomic
uncertainties, definition of management units, forensic application, understanding species
biology and outbreeding depression.
Why would we want to preserve biodiversity anyway?
Genes: Wild animals and plants are sources of genes for hybridization and genetic engineering.
Food sources: Animals, plants, mushrooms, etc.
Natural products: many of the medicines, fertilizers, and pesticides we use are derived from plants and animals.
We also get products such as oils, adhesives, and silk from natural sources.
Environmental services: We rely on plants and animals for important processes such as soil aeration,
fertilization, and pollination.
Scientific interest: The diversity of plants and animals inspires scientific inquiry in many different
realms. Evolutionary science, anatomy, physiology, behaviour, and ecology are only a few examples.
Self-perpetuation: Biologically diverse ecosystems help to preserve their component species,
reducing the need for future conservation efforts targeting endangered species.
Conservation genetics is about more than endangered species
It includes understanding the relationships and diversity that make up biodiversity. It need not
(directly) be an applied science: addressing questions about how diversity arises and is maintained
may assist in planning viable conservation strategies more than direct conservation actions would.
Major questions in conservation genetics
What (genetic) diversity is present within our taxon of interest?
What diversity is present within a region?
What does this tell us about the important processes for creating and maintaining diversity?
Can we predict the consequences of a particular diversity level?
If so, what is the appropriate response?
What/where should we be saving?
Can we have objective criteria for prioritization?
Is human activity reducing diversity, or is it a natural process?
Is genetic pollution a risk of human movements and introductions?

Losing genetic diversity
Frankham’s view appears excessively optimistic given the challenges we face in the modern world.
Habitat loss is a massive problem. In many cases 95% or more of habitats have already been
destroyed by man, and yet the human population continues to grow, requiring ever more resources.
As a result of this, wild animal population numbers have declined by more than 50% since 1970. In
many cases the decline is over 90%.
Bottlenecks and Genetic Drift

A bottleneck, or demographic bottleneck, occurs when a normally large population (or worse, a
species) is rapidly reduced to very small numbers due to floods, fire, disease or human factors.
Bottlenecks reduce the variation in the gene pool of a population; thereafter, a smaller population
with correspondingly smaller genetic diversity remains to pass on genes to future generations of
offspring through sexual reproduction. The effect of a bottleneck on genetic diversity depends on the
severity of the bottleneck: the more severe it is, the more variation will be lost. This is similar to
what happens during a founder effect, when only a limited number of individuals found a new
population.
However, equally if not more important is the duration of a bottleneck: if the bottleneck is brief, the
loss of diversity will not be as great as if the bottleneck was sustained over several generations. This
is because, in addition to the initial loss from the bottleneck, genetic drift continues to act upon the
small population, further reducing genetic diversity with each generation. A population remaining at
size NE for t generations will lose heterozygosity according to:
$H_t = H_0\left(1 - \frac{1}{2N_E}\right)^{t}$
(where Ht = heterozygosity after t generations; H0 = initial heterozygosity; NE = effective population
size). Another thing to notice in this equation is the use of NE. NE, the effective population size, is
often much smaller than the actual population size. Because the loss compounds every generation, if
the population recovers rapidly (so that t is small), then genetic drift will not have as good a chance
of randomly purging genetic diversity.
In either case, genetic diversity will generally remain lower, only slowly increasing with time as
random mutations or gene flow from other populations occur. As a consequence of such population
size reductions and the loss of genetic variation, the robustness of the population is reduced, as is
its ability to survive environmental changes that impose new selection pressures, like climate change
or a shift in available resources.
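The heterozygosity equation can be applied one generation at a time, which makes it easy to compare a brief bottleneck with a sustained one. The sketch below is illustrative: the starting heterozygosity (0.8) and the effective sizes (10 during the bottleneck, 1000 after recovery) are made-up values, the function name is hypothetical, and it computes only the expected loss through drift, ignoring new mutations and gene flow.

```python
def heterozygosity_after(h0, ne_per_generation):
    """Expected heterozygosity after drift.

    Applies H_t = H_0 * (1 - 1/(2*Ne))**t one generation at a time, so Ne
    may differ between generations (e.g. during and after a bottleneck).
    """
    h = h0
    for ne in ne_per_generation:
        h *= 1 - 1 / (2 * ne)
    return h


if __name__ == "__main__":
    H0 = 0.8
    brief = [10] * 2 + [1000] * 18  # 2 bottleneck generations, then recovery
    sustained = [10] * 20           # bottleneck lasts all 20 generations
    print("brief:", round(heterozygosity_after(H0, brief), 3))          # 0.716
    print("sustained:", round(heterozygosity_after(H0, sustained), 3))  # 0.287
```

Both populations pass through the same minimum size, but after 20 generations the sustained bottleneck has removed well over half of the original heterozygosity, whereas the brief one has removed about a tenth.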
Inbreeding
Inbreeding is the mating of individuals that are related by descent. Offspring resulting from such
matings often show reduced fitness compared to non-inbred individuals, a phenomenon known as
inbreeding depression. This reduced fitness is thought to arise mainly as a result of deleterious
recessive alleles coming into homozygous combination in inbred individuals. Although the harmful
effects of inbreeding have been known for some time (Darwin documented evidence for it), most
examples come from laboratory experiments. There are very few good examples of inbreeding
depression in wild populations, primarily because it is difficult to know how inbred a wild individual
is.
Is inbreeding depression of concern in organisms of conservation interest? Incestuous matings are
unlikely to arise in large populations; however, when populations become very small, incestuous
matings are unavoidable and inbreeding depression may occur. Due to space limitations, small
population sizes are often unavoidable in zoos, and the first good evidence for inbreeding depression
in organisms of conservation interest came from captive breeding programs (Siberian tiger, left).
Probably the best example of inbreeding depression in a wild population is that of the Mandarte
Island song sparrow.
The population inbreeding coefficient (FIS) can be calculated from co-dominant genetic data such as
microsatellites. It measures the deficit (or excess) of heterozygous individuals in a population relative
to Hardy-Weinberg expectation:
$F_{IS} = 1 - \frac{H_O}{H_E} = \frac{H_E - H_O}{H_E}$
where HO is the observed frequency of heterozygotes and HE is the frequency of heterozygotes
expected under Hardy-Weinberg equilibrium. An FIS > 0 denotes heterozygote deficiency, whereas
FIS < 0 denotes heterozygote excess. One can also use FIS to determine whether population genotype
frequencies meet Hardy-Weinberg expectation.
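FIS can be computed directly from genotype data at one co-dominant locus such as a microsatellite. This sketch uses the simple expected heterozygosity HE = 1 - Σ pᵢ² without the small-sample correction often applied in practice; the function name `f_is` and the allele sizes in the example are hypothetical.

```python
from collections import Counter


def f_is(genotypes):
    """F_IS = 1 - Ho/He for one co-dominant locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    Ho: observed proportion of heterozygous individuals.
    He: heterozygosity expected under Hardy-Weinberg, 1 - sum(p_i**2).
    Undefined for a monomorphic locus (He = 0).
    """
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    he = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    if he == 0:
        raise ValueError("F_IS is undefined for a monomorphic locus")
    return 1 - ho / he


if __name__ == "__main__":
    # Hypothetical microsatellite alleles (fragment sizes in bp):
    # 4 homozygotes 152/152, 2 heterozygotes 152/156, 4 homozygotes 156/156
    sample = [(152, 152)] * 4 + [(152, 156)] * 2 + [(156, 156)] * 4
    print(round(f_is(sample), 2))  # 0.6: a heterozygote deficit
```

In the example both alleles have frequency 0.5, so HE = 0.5, but only 2 of 10 individuals are heterozygous (HO = 0.2), giving FIS = 1 - 0.2/0.5 = 0.6.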
Low genetic diversity

While an inbred individual has a low heterozygosity (as measured using molecular markers such
as allozymes or microsatellites), a population with a low heterozygosity (or genetic diversity) is
not necessarily a sign of inbreeding. This is because genetic drift in a small population can cause
the loss of alleles resulting in low heterozygosity in the absence of incestuous matings. High
heterozygosity is not strictly the same as high genetic diversity. But a population with a high
genetic diversity has lots of alleles and will generally have a high heterozygosity.
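A short calculation shows why high heterozygosity and high genetic diversity (many alleles) are not strictly the same thing. It uses the Hardy-Weinberg expected heterozygosity, 1 - Σ pᵢ², for three hypothetical populations; the allele frequencies and the function name are invented for illustration.

```python
def expected_heterozygosity(freqs):
    """Hardy-Weinberg expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)


if __name__ == "__main__":
    populations = {
        "two even alleles": [0.5, 0.5],                          # He = 0.500
        "five alleles, one dominant": [0.96, 0.01, 0.01, 0.01, 0.01],  # He = 0.078
        "five even alleles": [0.2] * 5,                          # He = 0.800
    }
    for name, freqs in populations.items():
        he = expected_heterozygosity(freqs)
        print(f"{name}: {len(freqs)} alleles, He = {he:.3f}")
```

The population with five alleles, one of them dominant, carries more alleles than the two-allele population but has a much lower heterozygosity. When the alleles are evenly spread, more alleles do go with higher heterozygosity, matching the general rule stated above.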
It is usually assumed that a population with a lower genetic diversity will be less fit compared to
a population with a higher genetic diversity. This is because a population with a low genetic
diversity is in theory less adaptable to environmental changes. For example, a population with a
low genetic diversity may lack alleles that confer resistance to particular diseases and therefore
be susceptible to the disease in question. The assumption that a population with low genetic
diversity is less fit has given rise to the idea of an “extinction vortex”. But evidence for the
lowered fitness of populations with low genetic diversity is equivocal (e.g. Glanville fritillary,
wolf, Florida panther, cheetah).

Conservation Units
Since conservation depends on politicians and money, we need to define the units of biodiversity to be
conserved, because then we can lobby politicians to spend money on a defined, discrete entity. A
major question is: should we be conserving species? As you have learned already, the “species” is a
very simplistic unit given the complexity of the real world. Species are not really discrete entities.
And if we decide to conserve species, we may ask: Is biodiversity only the number of species present?
Are all species equally relevant for biodiversity? And must biodiversity be determined entirely by our
species definition?
Units for conservation within species

This seems even more difficult and controversial than defining species! Species clearly require
management as separate units, but some populations within species may be on the path to
speciation and show significant adaptive differentiation to particular ecological niches or significant
genetic differentiation, justifying their management as separate evolutionary lineages for
conservation. Phylogeography has helped in the prioritization of areas of high value for
conservation.
The Evolutionarily Significant Unit (ESU)
These are genetically differentiated populations that have a high priority for separate management
and conservation. They are closely related (sometimes synonymous) to:
• subspecies
• distinct population segments (DPS), under the Endangered Species Act (USA)
Many authors suggest that ESUs, subspecies and DPSs all merit separate management and have a
high priority for conservation. The fundamental idea is that conservation should aim to preserve
evolutionary processes and adaptive potential, not just current species without regard to losing
significant variation within species. This is because if variation within a species is lost, the species
loses its adaptive potential.
But it is difficult to define an ESU. What does “genetically differentiated” mean? Originally, ESUs were
defined by reproductive isolation and then by ecological distinctness. In 1994 Craig Moritz proposed a
purely genetic definition:
ESUs should show
• significant divergence and reciprocal monophyly for mitochondrial (mt)DNA
• significant divergence of allele frequencies at nuclear loci
Moritz's definition implies
• both historical and recent restriction of gene flow, making criteria for genetic distinctiveness more
concrete
• evidence for long-term divergence that is continuing in the present, since divergence in mtDNA
reflects long-term restriction of gene flow, and the congruence of slow (mtDNA) and fast
(microsatellite) markers is evidence for historical isolation that persists today.

Management units
Moritz also suggested that if mtDNA monophyly could not be demonstrated, but if only
microsatellites showed significant allele frequency differentiation, then the populations could be
considered separate management units (MUs), with a lower priority for conservation than ESUs.
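The Moritz criteria, as summarised in this section, amount to a simple decision rule. The sketch below encodes only that summary: it takes the results of separate analyses (reciprocal monophyly in an mtDNA tree, and a significance test of nuclear allele-frequency differences) as yes/no inputs. The function name is hypothetical, and a real assessment also has to weigh sample sizes and the statistical power of those tests.

```python
def moritz_unit(mtdna_reciprocally_monophyletic, nuclear_frequencies_differ):
    """Classify a pair of populations using the Moritz (1994) criteria
    as summarised in this course guide.

    mtdna_reciprocally_monophyletic: True if the mtDNA haplotypes of each
        population form significantly divergent, reciprocally monophyletic
        groups.
    nuclear_frequencies_differ: True if allele frequencies at nuclear loci
        (e.g. microsatellites) differ significantly.
    """
    if mtdna_reciprocally_monophyletic and nuclear_frequencies_differ:
        return "ESU"
    if nuclear_frequencies_differ:
        return "MU"  # allele frequencies differ, but no mtDNA monophyly
    if mtdna_reciprocally_monophyletic:
        return "unresolved"  # mtDNA alone: not assigned by the summary above
    return "no separate unit"


if __name__ == "__main__":
    for mt, nuc in [(True, True), (False, True), (True, False), (False, False)]:
        print(mt, nuc, moritz_unit(mt, nuc))
```

Writing the rule out makes explicit that ESU status needs both kinds of evidence, while MU status rests on nuclear allele frequencies alone.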
The Moritz criteria have been used in dozens of cases to define genetic conservation units. A recent
study on imperiled cave crayfish in the Appalachian Mountains of eastern North America
demonstrates how phylogenetic analyses along with geographic distribution can aid in recognizing
conservation priorities. Using mtDNA, the authors found that hidden within what was thought to be
a single, widely distributed species, an ancient and previously undetected species was also present.
Conservation decisions can now be made to ensure that both lineages receive protection. Results
like this are not an uncommon outcome from phylogeographic studies.
An analysis of salamanders of the genus Eurycea, also in the Appalachians, found that the current
taxonomy of the group greatly underestimated species level diversity. The authors of this study also
found that patterns of phylogeographic diversity were more associated with historical (rather than
modern) drainage connections, indicating that major shifts in the drainage patterns of the region
played an important role in the generation of diversity of these salamanders. A thorough
understanding of genetic structure will thus allow informed choices in prioritizing areas for
conservation.
Criticisms of Moritz’s ESU definition
• evolutionarily significant units are essentially equivalent to phylogenetic species
• genetically defined ESUs ignore adaptive differences
• significant divergence is unlikely to be detected within species with high gene flow, even though
populations may have adaptive differences and warrant separate management
• populations with low gene flow that have been differentiated by genetic drift may be designated
as separate ESUs even though they may not be adaptively distinct.
Elemental conservation units (ECUs)

This is a more recent attempt at a more holistic definition of a conservation unit, taking into
account some of the criticisms of the Moritz criteria. The main features of the ECU deal with
demographic isolation, genetic exchangeability and ecological exchangeability. If a population is
demographically isolated, then there is no gene flow into or out of it. It is therefore unable to
replenish, by itself, the genetic diversity that will be lost through genetic drift. In addition, two
populations may be genetically exchangeable if they have only recently diverged, but they are not
ecologically exchangeable if their habitats differ significantly.
Unfortunately, although potentially very useful for conservation management, the ECU scheme has
not yet been implemented. This is because it requires a lot of non-genetic data about the
populations involved, which is not available for many species, including high-profile endangered
species.

Taxonomic issues
Taxonomy is very important for determining what is out there to be conserved. While some groups
of organisms, such as birds and mammals, have received much attention in this respect, the
relationships between taxa in other groups, especially invertebrates and bacteria, are less well
known.
As conservation decisions are very often based on conservation units such as species,
subspecies, ESUs, DPSs, etc., it is very important that the taxonomic status of a population is
correctly assigned. However, these units are difficult to define, as shown in the last lecture. In
addition, subspecies are accorded legal protection in many countries, including South Africa, yet
the subspecies concept is even more subjective and controversial than the species concept. The
difficulty lies in the fact that there are no sharp boundaries between species, subspecies and mere
subpopulations.
Two subspecies can be viewed as two populations that are part of the way towards full
speciation. Sequences of conserved genes, such as sections of the mitochondrial genome, are widely
used to determine species and subspecies status. The sequences are used to build phylogenetic
trees, as previously described. When discrimination between more closely related groups (between
populations) is required, faster-evolving regions of the genome, such as microsatellites, are often used.
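To make the idea of genetic divergence between sequences concrete, the sketch below computes the uncorrected p-distance (the proportion of differing sites) between aligned sequences and applies the Jukes-Cantor correction for multiple substitutions, d = -(3/4) ln(1 - 4p/3). Pairwise distances of this kind are the input to distance-based tree methods such as neighbour-joining. It assumes the sequences are already aligned; the function names and the toy sequences are hypothetical, and real studies would use dedicated phylogenetics software and usually more complex substitution models.

```python
import math
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Jukes-Cantor distances for every pair of named, aligned sequences."""
    return {
        (a, b): jukes_cantor(p_distance(seqs[a], seqs[b]))
        for a, b in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned mtDNA fragments from three populations.
samples = {
    "pop_A": "ACGTACGTACGTACGTACGT",
    "pop_B": "ACGTACGTACGTACGTACCT",
    "pop_C": "ACGAACGTTCGTACCTACGT",
}
for pair, d in distance_matrix(samples).items():
    print(pair, round(d, 4))
```

The corrected distance is always at least as large as the raw p-distance, because some sites will have changed more than once; the correction matters more as divergence increases.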
Incorrect taxonomy
But how large does genetic divergence have to be before we are dealing with “good species”?
Although genetic data exist for many species, the vast majority of species (and subspecies) have
been defined only by taxonomists, often working with small sample sizes (in museums) and limited or
patchy geographic coverage of the species range. For most species, legal status is still based on
taxonomic classifications. Serious problems can arise when management decisions are based on
incorrect taxonomic classification:
- Endangered species may be denied legal protection and conservation resources, e.g. in the case
of the tuatara. It was originally not known that the tuatara was the only surviving species of a
whole reptilian order.
- Resources may be wasted on conserving populations of common species or hybrids, e.g. the Florida
panther or the red wolf.
- Population augmentation programs that transfer organisms between populations thought to be the
same species (or subspecies) may result in unwanted hybridization. An example is the extralimital
introduction of Zululand black rhino into Zambia and Malawi, across the Zambezi River, which
genetic studies showed to be a barrier to gene flow.
- Populations that could be used to improve the fitness of inbred populations may be overlooked,
e.g. the dusky seaside sparrow.
Wildlife forensics
The utility of genetic markers for taxonomic purposes has also been exploited for a number of
forensic applications, such as tracking rare or elusive animals or identifying species from
fragments of tissue. The RhoDIS database of African rhinoceros, held at the University of Pretoria,
is a good example. Its aim is to build a genetic database of all living rhino in Africa. Using these
microsatellite profiles, rhino products such as confiscated horns can be identified down to the
individual level. This evidence has often been used to convict poachers in courts of law.
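Identification down to the individual level rests on comparing a sample's multilocus genotype with stored profiles and asking how likely a chance match would be. The sketch below shows the textbook product rule: under Hardy-Weinberg equilibrium a genotype has frequency p^2 (homozygote) or 2pq (heterozygote), and with loci assumed to be independent the per-locus frequencies are multiplied. It is not RhoDIS's actual software; the locus names, allele sizes, frequencies and database entries are all hypothetical, and real casework applies further corrections, for example for population substructure.

```python
from math import prod

Genotype = tuple[int, int]       # two allele sizes (bp) at one microsatellite locus
Profile = dict[str, Genotype]    # locus name -> genotype


def genotype_frequency(allele_freqs: dict[int, float], genotype: Genotype) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq for heterozygotes."""
    a, b = genotype
    p, q = allele_freqs[a], allele_freqs[b]
    return p * p if a == b else 2 * p * q


def random_match_probability(profile: Profile, freqs: dict[str, dict[int, float]]) -> float:
    """Product rule: multiply genotype frequencies across (assumed independent) loci."""
    return prod(genotype_frequency(freqs[locus], gt) for locus, gt in profile.items())


def same_profile(p1: Profile, p2: Profile) -> bool:
    """True if both profiles have identical (unordered) genotypes at the same loci."""
    return p1.keys() == p2.keys() and all(sorted(p1[k]) == sorted(p2[k]) for k in p1)


def find_matches(query: Profile, database: dict[str, Profile]) -> list[str]:
    """IDs of all stored individuals whose profile matches the query exactly."""
    return [ind for ind, prof in database.items() if same_profile(query, prof)]


# Hypothetical allele frequencies for three microsatellite loci.
freqs = {
    "LocusA": {150: 0.2, 154: 0.5, 158: 0.3},
    "LocusB": {201: 0.1, 205: 0.6, 209: 0.3},
    "LocusC": {98: 0.4, 102: 0.4, 106: 0.2},
}
# Hypothetical reference profiles of known rhinos.
database = {
    "RH-001": {"LocusA": (154, 158), "LocusB": (201, 205), "LocusC": (98, 98)},
    "RH-002": {"LocusA": (150, 154), "LocusB": (205, 205), "LocusC": (102, 106)},
}
# Hypothetical profile typed from a confiscated horn.
horn = {"LocusA": (154, 150), "LocusB": (205, 205), "LocusC": (106, 102)}

print("matching individuals:", find_matches(horn, database))
print("random match probability:", random_match_probability(horn, freqs))
```

With only three loci the match probability here is about 1 in 87, far too high to be convincing in court; profiles built from many more loci make the product very small, which is what gives a match its evidential weight.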
Forensic applications are possible primarily because of our ability to amplify tiny amounts of DNA
using the polymerase chain reaction (PCR). The small amount of DNA contained in hair shed by animals,
or even in faeces (though this is a smelly and messy business), is often sufficient. The tissue
samples used in forensic applications are often highly degraded and contain minuscule amounts of
DNA. For this reason, mitochondrial genes are often amplified from such samples, because each cell
contains many more copies of mtDNA than of nuclear DNA. The Pyrenean brown bear and the hairy-nosed
wombat are cases where hair and/or faecal samples have been used for tracking purposes. The
identification of protected whale species in commercially sold food items is another example where
genetic tools have been used successfully for conservation purposes.
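Species identification from a sample, as in the whale-meat surveys, works by comparing a DNA sequence from the sample against reference sequences of known species. Real studies align the query with tools such as BLAST against curated reference databases; the simplified sketch below instead assumes the query and references are already aligned to the same length, scores percent identity, and reports the best hit only if it clears a cut-off. The species names, sequences and the 95% threshold are hypothetical.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percentage of identical sites between two pre-aligned sequences."""
    if not query or len(query) != len(reference):
        raise ValueError("this sketch assumes non-empty, pre-aligned sequences of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def identify(query: str, references: dict[str, str], threshold: float = 95.0):
    """Return (best species, identity), or (None, identity) if below the threshold."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]


# Hypothetical reference fragments for two protected species.
references = {
    "species_X": "ACGTTGCAACGTTGCAACGT",
    "species_Y": "ACGTAGCAACCTTGGAACGA",
}
# Hypothetical fragment sequenced from a market sample.
market_sample = "ACGTTGCAACGTTGCAACGA"
print(identify(market_sample, references))
```

Returning None below the threshold reflects a sensible precaution: a sample that is not close to any reference may belong to a species missing from the reference set, and should not be forced onto the nearest one.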

Conservation Genomics
It is becoming increasingly clear that identifying species or populations for conservation
prioritisation depends on the balance between neutral and adaptive divergence. Populations that
are more distinct, in terms of both adaptive and neutral variation, receive higher priority for
conservation effort.
Conservation genomics incorporates the latest technological advances of the genomics revolution.
Genomic approaches, which have recently transformed all fields of biology, can offer important
insights into a number of challenges faced by conservation biology, such as the identification of
functionally important genomic variation and an improved understanding of the mechanisms behind
key conservation genetic processes such as inbreeding depression.
Recently, new technical developments have made it possible to ask and answer new questions. The
invention of next-generation sequencing (NGS) techniques enables the collection of genome-wide
information on genetic variation. NGS also facilitates genomic studies of non-model species that
lack reference genome and transcriptome data. This is revolutionising the field of conservation
genetics in the following ways:
1. Applying NGS techniques gives estimates of genetic variation across the entire genome, instead
of estimates based on a limited set of markers.
2. Information on variation at thousands of single nucleotide polymorphism (SNP) markers allows a
population genomic approach, which enables signals of selection and adaptation to be identified.
SNP markers associated with selection can be investigated in small populations, which may allow
the balance between genetic drift and selection to be evaluated.
3. NGS also allows the study of gene expression, not only of sequence variation. Transcriptomic
studies will aid in identifying genes of adaptive importance and will help considerably in
investigating the mechanisms of processes that are important in a conservation genetic context
(such as inbreeding depression and local adaptation).
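The use of SNPs to detect selection can be illustrated with a simple genome scan. One common summary of differentiation at a SNP is F_ST = (H_T - H_S)/H_T, where H_T is the expected heterozygosity of the pooled populations and H_S the mean expected heterozygosity within them; SNPs with unusually high F_ST relative to the rest of the genome are candidate targets of selection. The sketch below assumes two equally weighted populations, biallelic SNPs with known allele frequencies, and a hypothetical "top 5%" cut-off; published scans use methods that account for demographic history and sampling error, so this only illustrates the idea.

```python
def fst(p1: float, p2: float) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # pooled expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def outlier_snps(freqs: dict[str, tuple[float, float]], top_fraction: float = 0.05) -> list[str]:
    """Names of the SNPs in the top `top_fraction` of the F_ST distribution."""
    scores = {snp: fst(p1, p2) for snp, (p1, p2) in freqs.items()}
    n_keep = max(1, round(len(scores) * top_fraction))
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


# Hypothetical data: 19 weakly differentiated SNPs plus one strongly differentiated SNP.
allele_freqs = {f"snp{i}": (0.5, 0.5 + 0.01 * i) for i in range(19)}
allele_freqs["snp_candidate"] = (0.05, 0.9)
print(outlier_snps(allele_freqs))
```

In a small population, strong drift alone can also produce high F_ST at individual SNPs, which is why the balance between drift and selection has to be evaluated with care before an outlier is interpreted as adaptive.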

Future challenges
The integration of genomic approaches into conservation genetics is still in its infancy, and both
technical and methodological developments are only now emerging in the literature. The expansion of
conservation genetics towards conservation genomics requires attention to certain issues:
• much of the advance is technology-driven and expensive
• the necessary training in data analysis still has to be developed
• genomics knowledge and facilities are very unevenly distributed across countries


