
Death of the Central Dogma and beyond

The biotech industry was launched on Francis Crick's infamous Central Dogma of molecular
biology, the scientific myth that organisms are hardwired in their genes, and hence, by
moving genes across species separated by billions of years of evolution, new genetically
modified organisms could be created to serve our every need.
http://www.i-sis.org.uk/isisnews/sis24.php#from

The Central Dogma has been thoroughly exploded by scientific findings accumulating since
the mid-1970s, and especially so after the human and other genomes have been sequenced
(see Living with the Fluid Genome, by Mae-Wan Ho).
We bring you the latest surprises that tell you why our health and environmental policies
based on genetic engineering and genomics are misguided; and more importantly, why the
new genetics demands a thoroughly ecological approach to life.

"GM crops are a dead end, invest in non-GM sustainable agriculture right
now"
The Independent Science Panel (ISP) (see SiS 18) took its campaign for a GM-free
sustainable world to the European Parliament on 20 October 2004. One hundred and twenty
registered for the special briefing including 27 who crossed the channel with the scientists
from the UK. The event made a big impression and the participants could not stop
congratulating us afterwards. We thank all our sponsors and supporters for making it such a
success. Cordis News, the official EU news service for science and technology, reported the
event the very next day with the title "Politicians, professors and protestors target sustainable
non-GM agriculture". Further media coverage was still coming in five days later.
The ISP message is crucial as GM battles are raging across the world. The high point of the
briefing was the talk by Sue Edwards, Director of the Institute for Sustainable Development,
who helped convince the Ethiopian government to adopt an organic composting, water and
soil conservation package as its main strategy for combating land degradation and poverty
throughout the country (see SiS 23). It brought home the proven successes of low-input,
health-enhancing agricultural practices that should be adopted all over the world.
Sustainable agriculture is particularly important under climate change, when oil and water,
on which industrial agriculture and, even more so, GM agriculture are heavily dependent, are
both running out. Industrial agriculture uses up to seven times as much energy per tonne of
food as organic agriculture; it also turns organic soil, which is a carbon sink, into a carbon
source, and generates other greenhouse gases that exacerbate global warming. In order to
feed the world, we must invest in sustainable, non-GM agriculture across the globe right now,
which will also ameliorate the worst consequences of climate change.
At the same time, important changes have to be made in international agencies and
institutions, which have hitherto supported the dominant model of industrial agriculture as
well as policies that work against poor countries, where farmers are also desperately in need
of secure land tenure.

Biological effects of EMFs still in search of a mechanism


More and more biological effects of electromagnetic fields are documented at weaker and
weaker field intensities, suggesting that the current exposure standards which are aimed at
preventing outright heating of tissues may be up to 10 million fold too high, if we are to
really protect the public. Researchers are finding long-lasting brain damage in rats exposed to
mobile phones, as well as a range of health problems among people living near the mobile
phone masts.
Still, the regulators profess themselves powerless to lower the exposure limits because of the
lack of plausible mechanisms, within conventional mainstream science, that could explain
how fields with such minute energies could have any biological effects. Leukaemia, DNA
damage in brain cells and other electromagnetic field effects cannot be explained unless
scientists communicate and collaborate across the disciplines, which they are currently unable
to do, partly due to the lack of interdisciplinary education, partly due to the existing funding
structure in research and the general culture of mainstream science that overwhelmingly
discriminates against innovative people and ideas (see also SiS 17). Will our government take
the radical steps needed in scientific research funding and in science education to improve
both the quality of our science and its ability to protect the public?

Water, the medium of life


Entire biochemistry and cell biology textbooks will have to be rewritten to put water at the
centre of living activities. It is indeed water inside cells and in the extracellular matrix that's
stage-managing the continuing drama of life. Enjoy and marvel!
Article first published 22/11/0

Life after the Central Dogma


The biotech industry was launched on the scientific myth that organisms are hardwired in
their genes, a myth thoroughly exploded by scientific findings accumulating since the
mid-1970s and especially so since genome sequences have been accumulating (see Living with the
Fluid Genome, by Mae-Wan Ho).
We bring you the latest surprises that tell you why our health and environmental policies
based on genetic engineering and genomics are completely misguided; and more importantly,
why the new genetics demands a thoroughly ecological approach.

Death of the Central Dogma
Caring Mothers Reduce Response to Stress for Life
Subverting the Genetic Text
To Mutate or Not to Mutate
Are Ultra-conserved Elements Indispensable?
How to Keep in Concert

Death of the Central Dogma


It is amazing how much scientific and religious fundamentalism have in common. The late
Francis Crick won the Nobel Prize jointly with James Watson and Maurice Wilkins for
working out the structure of DNA; and rather like the new 'Potentate' of biology, issued the
"Central Dogma" to the faithful, which decreed that genetic information flows linearly from
DNA to RNA to protein, and never in reverse. That was just another way of saying that
organisms are hardwired in their genetic makeup, and that the environment has little if any
influence on the structure and function of the genes.
The Central Dogma goes hand in glove with the other dogma of biology, the neo-Darwinian
theory of evolution by natural selection, which says that the genetic material mutates at
random, and individuals which happen to have good genes leave more offspring, just as
individuals with bad genes are weeded out. The neo-Darwinian theory is beloved of the status
quo, because it endows the rich and powerful with a certain mystique, as those who have won
the race in the struggle for survival of the fittest by being in possession of good genes (= good
breeding); while the poor and dispossessed have only their bad genes to blame.
Since the mid-1970s, if not before, molecular geneticists studying the genetic material have
been turning up evidence that increasingly contradicts the Central Dogma. There is an
immense amount of necessary cross talk between genes and the environment in the life of the
organism, which not only changes the function of the genes but also the structure of the genes
and genomes. By the early 1980s, the new genetics of the "fluid genome" had emerged.
But apart from a few heretics like Barry Commoner and myself, no one dared to say a word
against the Central Dogma or the neo-Darwinian theory of evolution.
Things may have changed within the past two years, thanks to the good sense and good
management of the public gene sequencing consortium in insisting on depositing gene sequences
in a single public database, freely available to all researchers.
This database is not much use for business and drug discovery; that much is clear, as one
'bioinformatics' company after another that tried to hoard the data has gone out of business. But,
collected in one freely accessible central database, it is very good for research that exposes the
poverty of the genetic determinism ideology that led to the creation of the database in the
first place.

The evidence against the Central Dogma has piled up to such an extent that rumblings of
"challenging the dogma" and "a new theory is needed to replace the central dogma" can even
be heard in the mainstream scientific journals. Even so, Dr. Ewan Birney, who gave the Royal
Society's inaugural Francis Crick Lecture in December 2003, still paid elaborate homage to
the Central Dogma, with arrows pointing strictly one-way from DNA to RNA to protein,
leaving out all the many more arrows that point in reverse.
What are the latest surprises that the fluid and flexible genome has in store? One area is the
importance and pervasiveness of epigenetics, specifically, chemical markings on the DNA
and proteins binding to the DNA in the chromosomes that determine patterns of gene
expression, or which bits of the genetic text are actually read. That is overwhelmingly
determined by experience. In an earlier issue (SiS 20), we showed that the mother's diet and
stress can affect patterns of gene expression in the embryo and foetus, which determine the
individual's health prospects much later in life.
Now, researchers are finding genes that are marked for life in rat pups, strictly by how their
mothers care for them during their first week of life after birth (see "Caring mothers reduce
response to stress for life", this series). It leaves one in no doubt that the environment is
giving the instructions on which genes to turn on.
Only a few years ago, people were referring to the 98% or more of the genome that doesn't
code for proteins as "junk DNA". Not any more. The genome has a definite 'architecture' that
holds up beneath the fluidity. There is a high degree of non-randomness in the parts of the
genome that undergo change. While some parts are hypermutable, certain families of
sequences are 'homogenized' to be nearly identical (see "Keeping in concert", this series),
while still others are 'ultraconservative' in that they have remained absolutely unchanged in
hundreds of millions of years of evolution ("Are ultraconserved elements indispensable?" this
series). And when cells get into a tight corner metabolically speaking, there may even be
genes that mutate to get them out of it ("To mutate or not to mutate", this series).
Most of all, there is a big treasure trove within the apparent junkyard of the genome. Many
sequences that don't code for proteins are involved in regulating development and gene
expression. Many of the surprises are associated with findings that indicate most of the action
is not in proteins, but in the numerous species of RNA 'interfering' at all levels of the 'readout'
of genetic information: with the DNA, with other RNA species, and with proteins (see "RNA
subverting the genetic text", this series).
All of this goes against the very grain of the Central Dogma that posits linear, mechanistic
control. Instead, layers upon layers of chaotic complexity are coordinated, it seems, by mutual
agreement, in an incredibly elaborate, exquisite dance of life that dances itself freely and
spontaneously into being.
It is not so much that we need a new theory to replace the central dogma; it is more important
than that. We need a new way of knowing and being organisms that will prevent us from
mistaking organisms for instruments and machines. That's the real challenge.
Article first published 03/09/04

Caring Mothers Reduce Response to Stress for Life

How a rat responds to stress depends on whether its mother cared for it properly as a pup,
which marks its genes for life. Dr. Mae-Wan Ho reports
Maternal effects in the spotlight

Maternal effects on the development of offspring are well known. But they are thought to be
due to nutritional and physiological factors affecting the foetus in the womb; and within the
past few years, geneticists have discovered that diet and stress can profoundly change the
pattern of gene expression in the offspring, affecting their health prospects as adults (see Diet
trumping genes, SiS 20).
A team of researchers from the Douglas Hospital Research Centre and McGill University in
Montreal, Canada, and the Molecular Medicine Centre, in Edinburgh University Western
General Hospital in the UK, now report a remarkable experiment [1] in which the behaviour of
the mother nursing her pups not only affects the pups' response to stress as adults, but is also
correlated with changes in gene expression states in brain cells that persist into adult life.
Such changes are referred to as epigenetic as they do not involve alterations in the base
sequence of DNA in the genome, only their off and on states; but they can persist in the brain
cells and are passed on to all the daughter cells.
Caring mothers reduce stress response of pups

In the nest, the mother rat licks and grooms her pups, and while nursing, arches her back to
do so. Some mothers (high performers) tend to do these more frequently
than others (low performers). As adults, the offspring of high performers are less fearful and
show more modest responses to stress in the hypothalamus-pituitary-adrenal (HPA)
neuroendocrine pathway.
Cross-fostering studies showed that the biological offspring of low performers reared by
high performers resemble the offspring of high performers, and vice versa.
Maternal behaviour, therefore, alters the development of the HPA responses to stress. The
magnitude of the HPA response is a function of the corticotropin-releasing factor (CRF)
secreted by the hypothalamus, which activates the pituitary-adrenal system. This is modulated
by glucocorticoid, which feeds back to inhibit CRF synthesis and secretion, thus dampening
the HPA responses to stress. The adult offspring of high- versus low-performer mothers show
increased glucocorticoid receptor expression in the hippocampus, and enhanced sensitivity to
glucocorticoid feedback. If this difference is eliminated, so is the difference in HPA responses
to stress.
Maternal care and gene expression

Previous studies indicate that the maternal behaviour of licking and grooming, and arching her
back to do so while nursing, increased the expression of glucocorticoid receptor (GR),
accompanied by, among other things, an increased expression of a special transcription factor,
NGFI-A, which binds to the promoter of the GR gene to increase its transcription and
expression. But how could this be transmitted from the neonate to the adult?
The answer is: through the structure of chromatin (complex of protein and DNA in the
chromosomes), and the methylation of DNA. DNA methylation is a stable chemical
modification of the cytosine in the cytosine-guanine (CpG) dinucleotides, often associated
with stable variations in gene transcription. Under-methylation of CpG dinucleotides is
associated with active transcription. The researchers decided to look at the methylation state
of the GR promoter around the binding region of the NGFI-A transcription factor in the
hippocampus of adult offspring from high and low performers.
Sure enough, they found highly significant differences in methylation, with low methylation
in offspring from high-performing mothers and high methylation in offspring from
low-performing mothers, corresponding to high and low expression respectively of the GR.
Cross-fostering results in methylation patterns associated with the adoptive mother,
consistent with the change in the adult offspring's responses to stress. Moreover, these
epigenetic differences due to maternal behaviour during the first week of life persisted into
adulthood.
A clean slate at birth

Amazingly, the pups of both high and low-performing mothers start out life with the GR
promoter in the same state. Just before birth, the entire region of the GR promoter was
unmethylated in both groups; and on day one after birth, methylation was found in the region
in both groups to the same extent.
The changes in methylation pattern then develop within the first week according to the
behaviour of the mother, and thereafter remain for the rest of their lives. This finding is
consistent with earlier studies showing that the first week of postnatal life is a critical period
for the effects of early experiences on hippocampus GR expression.
The hippocampus is the emotion centre of the brain, and is believed to be responsible for
transferring memory to the rest of the brain. It is vulnerable to stress and richly supplied with
receptors for the sex hormones [2, 3].
Additional markings of the gene

Next, the researchers looked at the structure of chromatin around the GR gene, as chromatin
structure determines whether a gene is transcribed or not. Chemical modification of the
histones (major chromatin proteins) by adding an acetyl group is a well-established marker for
active chromatin around transcribed genes, which makes it accessible for the transcription
enzyme complex. Again, they found highly significant changes in acetylation between the two
groups of pups. There was greater acetylation and threefold greater binding of the NGFI-A
transcription factor to the GR promoter in the adult offspring of high- compared with
low-performing mothers.
Marked for life?

Now, a critical question is, are these gene-marking changes reversible? Is the adult doomed to
conditioning by the mother's behaviour towards it as a pup? The general belief is that one is
marked for life, because the DNA methylation pattern is irreversible. However, recent data
from in vitro experiments suggest that under certain circumstances, it is possible to
demethylate DNA by increasing histone acetylation through trichostatin A (TSA), a chemical
inhibitor of the deacetylating enzyme. The researchers, rather crudely, infused the adult brain
with TSA by applying the solution into the ventricle (space inside the brain), and obtained a
more than 3-fold increase in binding of the NGFI-A protein to the GR promoter in the adult
offspring of low-performers, and as expected, no change in the adult offspring of
high-performers. At the same time, correlated changes in the DNA methylation pattern of the
GR promoter were found in the TSA-treated adults reared by low-performing mothers, but not
in those reared by high-performing mothers. In other words, those epigenetic changes were
reversed.
The next question is, is the reversal of epigenetic changes associated with a reversal in HPA
responses to stress? The answer, incredibly, is yes. The TSA treatment, crude as it was,
appeared to significantly decrease plasma corticosterone in the offspring of low-performers in
response to stress.
This is all grist to the mill of the fluid and adaptive, adaptable genome [4] that makes
nonsense of the Central Dogma.
Article first published 07/09/04

References
1. Weaver ICG, Cervoni N, Champagne FA, D'Alessio AC, Sharma S, Seckl JR, Dymov S, Szyf M
and Meaney MJ. Epigenetic programming by maternal behavior. Nature Neuroscience 2004,
7, 847-54.
2. PsychEducation.org http://www.psycheducation.org/index.html
3. hyperdictionary http://www.hyperdictionary.com/medical/hippocampus
4. Ho MW. Living with the Fluid Genome, ISIS & TWN, London & Penang, 2003

Subverting the Genetic Text


Dr. Mae-Wan Ho exposes the hidden intrigues in the vast RNA underworld where layers of
interference and machinations subvert the chain of command from DNA to RNA to protein.
Updating and re-interpreting the sacred text

According to the Central Dogma, DNA, the genetic text, is read out into RNA and RNA is
translated into protein. RNA is rather like the scribe copying and translating the sacred text to
direct the faithful.
But geneticists are now uncovering a vast underworld of heresy to the Central Dogma where
RNA agents decide which bits of text to copy, which copies get destroyed, which bits
to delete and splice together, which copies are to be transformed into a totally different message
and finally, which resulting message, which may bear little resemblance to the original text,
gets translated into protein. RNAs even get to decide which parts of the sacred text to rewrite
or corrupt.
The whole RNA underworld also resembles an enormous espionage network in which genetic
information is stolen, or gets re-routed as it is transmitted, or transformed, corrupted,
destroyed, and in some cases, returned to the source file in a totally different form.
And this underworld is big, really big. The protein-coding sequence is only about 1.5% of the
human genome. Yet, around 97-98% of the transcriptional readout of the human genome is
non-protein-coding RNA. This estimate is based on the fact that intronic RNA makes up 95%
of the primary protein-coding transcripts on average, and there are large numbers of
non-coding RNA transcripts, which may represent at least half of all transcripts. Most of the
miRNAs (microRNA, see below), for example, are derived from (intergenic) regions between
genes; and almost half of all transcripts from the mouse genome are non-coding RNAs. A
similar estimate applies to the human genome [1].
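The 97-98% figure can be roughly reproduced from the two numbers quoted above. The sketch below is only a back-of-envelope check, not the calculation in [1]: it assumes that exactly half of all transcripts are non-coding and that every transcript contributes the same amount of RNA, and the variable names are purely illustrative.

```python
# Back-of-envelope check of the non-coding share of transcriptional output.
# Assumptions (not from the cited study): half of all transcripts are
# non-coding, and every transcript contributes the same amount of RNA.
noncoding_transcript_share = 0.5   # "at least half of all transcripts"
intron_share_of_coding = 0.95      # introns in primary protein-coding transcripts

coding_transcript_share = 1.0 - noncoding_transcript_share
noncoding_output = (noncoding_transcript_share
                    + coding_transcript_share * intron_share_of_coding)

print(f"Non-protein-coding share of output: {noncoding_output:.1%}")
```

Under these assumptions the share comes to 97.5%, which sits inside the quoted range; a larger non-coding share of transcripts would push it higher.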
The inescapable conclusion is that the job of mediating between DNA and protein is really the
centre stage of molecular life. And who gives orders to the multitudes of RNA agents? In a
sense it is everyone and no one, because the system works by perfect intercommunication. It
is not the DNA, but rather, the particular environment in which the RNA agents find
themselves.
For the organism (organization) to survive, it needs to turn over the DNA text continuously,
adapting to the realities of its environment. In the process, it keeps certain texts invariant (see
"Are ultra-conserved elements indispensable?" this series), while changing others rapidly in
non-random ways (see "To mutate or not to mutate", this series). It also needs to keep
referring to texts that are relevant, modifying them, or updating the interpretation in keeping
with the times (see "Keeping in concert" this series).
RNA interference

RNA interference (RNAi) was first discovered in the nematode worm, C. elegans, in the
1990s. Researchers noticed that injecting either sense RNA (the sequence that gets read and
translated into protein) or antisense RNA (the complementary sequence, which does not code
for protein) into the worm led to specific silencing of the gene involved. It was later found
that the phenomenon was actually caused by double-stranded RNA (dsRNA) contaminating
the sense or antisense RNA. RNAi now refers to all gene-silencing induced by dsRNA.
This includes a host of other phenomena discovered at around the same time [2, 3]. For
example, a gene could be silenced, or co-suppressed, simply by introducing an extra copy
into the genome as a transgene, and transgenes themselves may be silenced either at or after
transcription. The coat protein gene of a virus transferred into a plant may protect the plant
from the virus, by silencing the virus genes.
All these phenomena are interlinked through special pathways of RNA processing that are
only just being defined (see Fig. 1). Abnormal single stranded RNA (ssRNA) is turned into a
double stranded RNA (dsRNA) by an RNA-dependent RNA polymerase enzyme (RDRP).
The dsRNA is then chopped up into small pieces or microRNA (miRNA) by the enzyme
Dicer. The same enzyme also processes certain hairpin RNA (hpRNA) and related
pre-microRNA (pre-miRNA) into miRNA. The miRNA is further processed into single-stranded
RNA that's incorporated into a multiprotein complex called RNA-induced silencing complex
(RISC). At this point, the single-stranded RNA fragment binds to the complementary part of the
messenger RNA and either causes the breakdown of the mRNA or prevents its translation into
protein.
Remember that all this depends on complementary base pairing, just as in DNA, so these
mechanisms could potentially exist for each and every one of the now estimated 24 500 genes
in the genome.
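The role of complementary base pairing in target recognition can be sketched in a few lines. This is a toy illustration only: the sequences are made up, the function names are hypothetical, and it looks for perfect pairing, whereas real recognition tolerates mismatches and depends on the proteins of the RISC.

```python
# Toy illustration of target recognition by complementary base pairing.
# Sequences are invented for the example; only perfect pairing is checked.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the strand that pairs with `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(guide: str, mrna: str) -> list[int]:
    """Return 0-based positions in `mrna` that pair perfectly with `guide`."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]

guide = "AUGGCUACGUUAGCCAUGCAA"  # made-up 21-nt single-stranded fragment
mrna = "GGCA" + reverse_complement(guide) + "AAUGC"  # made-up mRNA with one site

print(find_target_sites(guide, mrna))  # [4]
```

Because the search needs nothing but the sequence of the guide strand, the same procedure applies in principle to any gene, which is the point made in the text.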

Figure 1. RNA interference pathways


It turns out that dsRNA is not only involved in signalling the breakdown or inactivation of
specific mRNA to prevent the expression of the protein coded, it is also involved in triggering
an antiviral response in mammals. And this is a major obstacle to achieving RNAi in mammals,
which might be useful in silencing specific genes in gene therapy.
Double-stranded RNAs longer than 30 nt (nucleotide) activate an antiviral response that
includes the production of interferon, resulting in the non-specific breakdown of RNA
transcripts and a general shutdown of protein synthesis. In order to overcome this obstacle,
synthetic 21-nt miRNAs have been used. These are long enough to induce gene-specific
suppression and short enough to evade host interferon response. However, recent work has
shown that under certain conditions, even such small miRNAs can activate the interferon
system. One activating signal for the interferon response appears to be the triphosphate group
at the 5' end of the miRNA synthesized by a phage polymerase [4]. In addition, there are
other problems, such as avoiding interfering with non-target sequences [5], especially as
perfect base-pairing is not required, and matches of as few as 11 consecutive nucleotides can
give non-target effects.
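A crude screen for the non-target problem just described might look for the longest run of consecutive nucleotides in a guide strand that can pair with an unintended transcript. The sketch below is an assumption-laden illustration: the function names and sequences are invented, only perfect consecutive pairing is counted, and the threshold of 11 is simply the figure quoted in the text.

```python
# Toy off-target screen: longest run of consecutive guide nucleotides that
# can pair with some stretch of a transcript. Sequences are made up.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def longest_paired_run(guide: str, transcript: str) -> int:
    """Longest contiguous stretch of `guide` perfectly complementary to `transcript`."""
    # The guide pairs with its own reverse complement, so the task reduces to
    # finding the longest substring shared by that sequence and the transcript.
    site = "".join(COMPLEMENT[b] for b in reversed(guide))
    best = 0
    prev = [0] * (len(transcript) + 1)
    for s in site:
        curr = [0] * (len(transcript) + 1)
        for j, t in enumerate(transcript, start=1):
            if s == t:
                curr[j] = prev[j - 1] + 1  # extend the run ending just before
                best = max(best, curr[j])
        prev = curr
    return best

def is_possible_off_target(guide: str, transcript: str, min_run: int = 11) -> bool:
    """Flag a transcript if at least `min_run` consecutive nucleotides can pair."""
    return longest_paired_run(guide, transcript) >= min_run

guide = "AUGGCUACGUUAGCCAUGCAA"      # made-up 21-nt guide
unrelated = "CCCUGGCUAACGUAGGGG"     # made-up transcript sharing a 12-nt site

print(longest_paired_run(guide, unrelated), is_possible_off_target(guide, unrelated))
```

A real screen would have to run against every transcript in the cell and allow for imperfect pairing, which is why non-target effects are hard to rule out.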
RNA-directed DNA read-out
The dsRNA involved in RNA interference can selectively silence genes at the read-out or
transcription stage [6]; dsRNA species homologous to promoters are involved in crippling the
promoter by methylation (adding methyl (-CH3) groups) in the region of sequence overlap, so
no transcription can occur. In other cases, a dsRNA resulting from a bi-directional
transcription of a repeat element leads to methylation of a nearby histone protein H3 in
chromatin, which, too, results in gene silencing.
Transcriptional gene silencing can potentially be initiated by the dsRNA formed from pairs of
transcriptional units arranged in a tail-to-tail orientation (sense-antisense transcription units,
SATs). In humans, SATs account for most overlapping transcriptional units (70%). A recent
survey estimated that there are 1 600 human SATs (or 3 200 transcription units). When both
transcriptional units are active, formation of dsRNA occurs by default, leading to
modification of the histone protein and gene silencing. This mechanism is involved in
imprinting: the marking of genes in chromosomes to determine whether they are expressed in
cell clones. Expression of the gene only occurs when the antisense promoter is methylated and
inactive.
Recently, a new kind of trans-acting (acting across to different parts of the genome) RNA was
identified in mouse [7]. B2 RNA originates from a short interspersed repetitive element
(SINE) present in more than 100 000 copies in the genome of multicellular plants and animals.
They were previously thought to be molecular parasites with no function. However, the levels
of B2 and related RNAs have been found to increase up to 100-fold in response to
environmental stresses such as heat shock. And B2 RNA is required for the concomitant
inhibition of RNA
polymerase II during heat shock, by interacting directly with the enzyme, preventing it from
working. RNA polymerase II is involved in the transcription of all protein-coding RNA. So an
inhibition of RNA polymerase II will decrease the synthesis of many proteins.
A special kind of RNA-directed DNA read-out is accomplished via RNA riboswitches to
switch genes off in response to the concentration of a metabolite in the cell, without the need
for a protein repressor (see Box).
Riboswitch and other RNA regulators

A new molecular switch involves an RNA molecule with enzyme activity, a ribozyme, which
can self-destruct by self-cleavage [8]. This self-cleavage is accelerated 1 000-fold in the
presence of a small sugar molecule, glucosamine-6-phosphate, which is generated by the
enzyme protein encoded by a portion of the mRNA downstream from the ribozyme sequence.
So, this simple gene regulatory circuit involves the mRNA being translated into the enzyme,
which makes the product, glucosamine-6-phosphate. As the product accumulates, it binds to
the special catalytic element in the mRNA, causing it to self-destruct. The region of the
mRNA that can confer this regulatory activity is roughly 75 nucleotides long. When placed
upstream of an unrelated reporter gene, it also shuts down its expression, showing that this
active RNA element is transplantable.
A particular group of ribozymes forms a pocket that binds guanosine monophosphate, one of the
four building blocks of RNA. A specific region of the RNA from the Human
Immunodeficiency Virus (HIV) binds a derivative of the amino acid arginine. Short (<100
nucleotide) RNA aptamers (DNA or RNA molecules that bind other molecules) have been
identified that specifically bind everything, from hydrophobic (water-hating) amino acids to
small organic molecules and metal ions. An RNA aptamer can even distinguish the plant
alkaloid theophylline from the closely related molecule caffeine.
Aptamers found within some natural mRNAs bind small molecules as part of their
gene-regulatory feedback circuits. In the E. coli bacterium, coenzyme B12 binds directly to, and
thereby represses translation of, the mRNA coding for the protein that transports its precursor,
cobalamin. In Bacillus species, the synthesis of thiamine and riboflavin involves discrete
genetic units or operons, controlled by direct binding of thiamine pyrophosphate and flavin
mononucleotide to leader sequences of the corresponding mRNAs, resulting in the premature
termination of transcription.
Several research groups had previously engineered artificial riboswitches that accomplish
exactly the same task, that is, induce ribozyme-mediated cleavage of the RNA on binding
small molecules, before these were discovered in nature.
RNA splicing

It is estimated that 64% of the genes in the human genome are interrupted [9]; i.e., the coding
regions exist in short stretches (exons) interrupted by long non-coding stretches (introns).
After the entire sequence is transcribed into RNA, the non-coding stretches are spliced out,
leaving the coding sequence. However, different exons can be spliced together, and the
borders between the exons and introns can themselves be shifted. Alternative splicing
multiplies the number of different proteins that can be obtained from a single gene. This is a
case of extensive cutting and pasting of the genetic text to suit the occasion.
The fruitfly gene Dscam (homologue of the Down syndrome cell adhesion molecule) codes
for a cell-surface protein essential for the development of the fruitfly's brain. It has so many
exons that a total of 38 016 possible alternative splice forms could be generated. Geneticists
from the Whitehead Institute for Biomedical Research, Cambridge, Massachusetts in the
United States analysed the splice forms expressed by different cell types and by individual
cells, and found that the choice of splice variants is regulated both spatially and temporally
[10].
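The figure of 38 016 follows from simple multiplication once the number of alternatives in each variable exon cluster is known. The cluster sizes used below are the commonly cited ones for fruitfly Dscam; they are not given in this article and are included here as an assumption.

```python
from math import prod

# Commonly cited numbers of mutually exclusive alternatives in the four
# variable exon clusters of fruitfly Dscam (an assumption; the article
# itself only states the total).
alternatives_per_cluster = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is chosen from each cluster, so the choices multiply.
splice_forms = prod(alternatives_per_cluster.values())
print(splice_forms)  # 38016
```

An individual cell expressing 14 to 50 of these forms is therefore sampling a very small corner of what the single gene can encode.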
Different subtypes of photoreceptor cells express broad yet distinctive spectra of Dscam
splice forms. Individual photoreceptor cells express about 14-50 splice forms chosen from the
spectrum of thousands that is distinctive of its cell type. Thus, the repertoire of each cell is different
from those of its neighbours.
The complexity does not end there. Not only are different splice variants obtained from the
same primary transcript, trans-splicing between different primary transcripts can also take
place [11], multiplying the combinatorial possibilities of proteins available.

There's increasing evidence that genomic variants in both coding and non-coding sequences in
genes can have unexpected deleterious effects on the splicing of gene transcripts [12]. Even
synonymous base substitutions (those that do not change the amino acid sequence of the
encoded protein) and sequence changes within the introns can affect splicing and cause
diseases.
RNA-directed rewriting of RNA

Some nucleotides are deleted during splicing and others changed by editing. Around 41 to
60% of mouse multi-exon genes generate alternatively spliced transcripts; the frequency of
edited transcripts is unknown. These processes generate new sequences not found in the gene.
Trypanosomes show the importance of RNA rewriting. Their survival depends on editing
defective mitochondrial transcripts using trans-encoded RNA sequences to guide insertion
and deletion of uridine bases. The rewriting of RNA restores the correct reading frame,
allowing the production of functional gene products. RNA guides are also used to direct
rewriting of RNA during editing and splicing of pre-mRNA. In some cases, editing creates
splice sites and in others splicing prevents editing.
Rewriting of RNA is associated with a high turnover of transcripts. Of all the RNA
transcribed in the human nucleus, only about 5% enters the cytoplasm. Quality control
mechanisms dispose of incompletely or improperly processed messages encoding flawed
proteins.
RNA-directed rewriting of DNA

Genomes can be rewritten using reverse transcription to record elements of successful
ribotypes (combinations of RNAs). Around 45% of the human genome is derived from
retrotransposition. RNA-directed rewriting of DNA also has an essential role in maintaining
genome stability. Telomerase is a reverse transcriptase that uses an RNA guide to rewrite the
ends of chromosomes (telomeres) and prevent their loss, which is important for maintaining
the stability of the genome.
Coordination of information

In each ribotype, only specific transcripts are produced and particular mRNAs translated.
These outcomes are achieved by coRNAs that coordinate the action of highly conserved
pathways. An RNA product from one processing event may regulate a downstream event,
making the second outcome contingent on the first. For example, a miRNA encoded in an
intron would only be expressed when the host gene is transcribed. CoRNA may facilitate
coordination of pathways by interacting with sequence motifs shared by a number of targets.
Evolution of rule sets requires creation of new coRNAs, possibly by duplication and mutation.
New coRNAs would result in assembly of new regulatory complexes on conserved DNA
elements, and hence in new patterns of gene expression during development.
Replication of ribotypes

Both genetic modification, involving changes in DNA, and epigenetic modifications, such as
DNA methylation and histone acetylation, can be inherited. For example, imprinting is
determined by the parent of origin of a chromosome, which means that at some point maternal
and paternal chromosomes are marked so that they can be distinguished during embryonic
development. Methylation may undergo variable erasure during primordial germ cell
development, producing epigenetic mosaic individuals. The persistence of such epigenetic
marks is relevant to the origin of complex diseases. Here, the susceptibility of offspring to
disease can depend on whether there is maternal or paternal history of disease as well as
ethnicity.
Transmission of ribotypes also occurs more directly. The embryo receives RNA from the
mother that is important in specifying cell fate. The foetus is also exposed to the maternal
environment, which can influence the foetal phenotype. For example, pregnant female mice
fed a diet rich in methyl donors have litters with fewer yellow-coloured agouti Avy offspring,
reflecting enhanced silencing of the retroviral promoter in this allele (see "Diet trumping
genes", SiS 20). In other cases, integration of signals received from maternal hormones may
trigger epigenetic modifications that alter long-term phenotypic development by modulating
RNA co-regulatory networks. Low birth weight, for example, has been shown to correlate
with lifetime risk of cardiovascular disease and diabetes mellitus.
Recently, it has been demonstrated that the plasma of pregnant women contains circulating
mRNA originating from the foetus [13], which is rapidly cleared after delivery. This raises the
question of whether coRNAs secreted by various somatic tissues are also used to transmit
information from mother to foetus, a serious case of the inheritance of acquired characteristics
not coded in the genome.
Article first published 09/09/04

References
1. Semon M and Duret L. Evidence that functional transcription units cover at least half of the
human genome. TRENDS in Genetics (in press, 2004).
2. Kusaba M. RNA interference in crop plants. Current Opinion in Biotechnology 2004, 15,
139-43.
3. Novina CD and Sharp PA. The RNAi revolution. Nature 2004, 430, 161-4.
4. Samuel CE. Knockdown by RNAi - proceed with caution. Nature Biotechnology (News and
Views) 2004, 22, 280-2.
5. Caplen NJ. Gene therapy progress and prospects. Downregulating gene expression: the
impact of RNA interference. Gene Therapy 2004, 11, 1241-8.
6. Herbert A. The four Rs of RNA-directed evolution. Nature genetics 2004, 36, 19-25.
7. Wassarman KM. Nature Structural & Molecular Biology 2004, 11, 803-4
8. Cech TR. RNA finds a simpler way. Nature (news and views) 2004, 428, 263-4.
9. EASED: Extended Alternatively Spliced EST Database. http://eased.bioinf.mdcberlin.de/statistics.html
10. Neves G, Zucker J, Daly M and Chess A. Stochastic yet biased expression of multiple Dscam
splice variants by individual cells. Nature Genetics 2004,
http://www.nature.com/naturegenetics
11. Dorn R, Reuter G and Loewendorf A. Transgene analysis proves mRNA trans-splicing at the
complex mod(mdg4) locus in Drosophila. Proc Natl Acad Sci USA 2001, 98, 9724-9.
12. Pagani F and Baralle FE. Genomic variants in exons and introns: identifying the splicing
spoilers. Nature Reviews Genetics 2004, 5, 389-96.
13. Ng EKO, Tsui NBY, Lau TK, Leung TN, Chiu RWK, Panesar NS, Lit LCW, Chan K-W and Lo YMD.
mRNA of placental (and hence foetal) origin is readily detectable in maternal plasma. PNAS
2003, 100, 4748-53.

To Mutate or Not to Mutate


Contrary to views widely held not so long ago, genes do not as a rule mutate at random, and
cells may choose what, or at least, when to mutate. Dr. Mae-Wan Ho reports

Non-random 'adaptive' mutations?


The backbone of modern genetics and the neo-Darwinian theory of evolution by natural
selection is that gene mutations occur at random, independently of the environment in which
the organisms find themselves. Those mutations that happen to be 'adaptive' to the
environment are 'selected', while those that are deleterious are weeded out.
The idea that genes do not mutate at random, but 'adaptively', as though 'directed' by the
environment in which the organisms find themselves, is so heretical that most biologists
simply dismiss it out of hand; or try their utmost to explain away the observations that give
life to the idea.
Microbiologist Max Delbrück first used the term 'adaptive mutations' in 1946 [1] to refer to
mutations formed in response to an environment in which the mutations are selected. The
term was adopted more than 40 years later by a research team investigating gene
amplification in rat cells [2]. They distinguished mutations that pre-exist at the time a
cell is exposed to a selective environment from those 'adaptive' mutations formed after
exposure to the environment.
Other workers have followed the same definition [3, 4]. These 'adaptive' mutations arise in
non-growing or slowly growing cells after the cells were exposed to conditions that favour the
mutants, preferentially, though not exclusively, in those genes that could allow growth if
mutated. Unselected mutations also accumulated in most studies, to varying degrees, so the
mutations are not strictly 'directed'. Instead, the cells appear to activate a number of different
mechanisms that target mutations to genes, the end result of which is to enable them to grow,
which they otherwise would not be able to do.

The archetypal experiment


John Cairns and Patricia Foster created an E. coli strain defective in the lac gene that leaves
the cells unable to grow on lactose [3]. They plated out the bacteria on a minimal medium
with lactose, and looked for mutants that reverted to normal. As the cells used up the small
amount of nutrient they stopped growing. But after some time, mutants began to appear that
could grow on lactose. However, the mutations are not strictly directed to the gene in which
mutations could be advantageous, as unselected mutations also accumulated. In fact, the
mechanisms look like "inducible genetic chaos" according to a reviewer [4].
The defective lac gene in the E. coli strain was in fact a frameshift mutant, in which a small
deletion or addition of a nucleotide shifted the whole reading frame of the gene, so it became
translated into a totally different enzyme that has little or no ability to break down lactose.
This defective lac gene was carried in an F' plasmid involved in bacterial conjugation. Two
types of adaptive genetic change are now known to occur in the lac frameshift system: point
mutations involving changes in base sequence of the DNA, and gene amplification involving
the generation of multiple copies of the defective gene so that large amounts of defective
enzyme can still function to metabolise enough lactose to allow the cells to grow.
The point mutation mechanisms are highly diverse, and include DNA breakage,
recombinational break repair, genome-wide hypermutation in a subpopulation of cells that gives
rise to some or all of the adaptive mutants, and a special inducible mutation-generating DNA
polymerase (polIV or DinB) that has homologues in all three domains of life. There are now
many bacterial and yeast assay systems in which adaptive and stationary-phase mutations
have been reported, but the mechanisms are largely unknown.
Some of the mechanisms that underlie adaptive genetic change bear similarities to genetic
instability in yeast and in some cancers and to somatic hypermutation in the immune system.
They might also be important in bacterial evolution to antibiotic resistance, and the evolution
of phase-variable pathogens, which evade the host immune system by frequent variation of
their surface components.
In the experiment, Lac+ mutants that existed before exposure to the lactose plates form visible
colonies by about two days. The colonies that emerge after two days fall into two classes. Most
of the Lac+ colonies (~160 per 10^8 cells at 10 days) are adaptive point mutants, which arise by a
recombination-dependent mechanism and carry compensatory frameshift mutations. On
later days (from about day 4), an increasing fraction (up to ~35 out of a total of ~160 on day 10) of the
colonies are not point mutants but amplifications (20-50 direct repeats) of a 7-40 kb region of
DNA that contains the lac frameshift gene, which provide sufficient gene activity to allow
growth on lactose medium. The number of E. coli cells does not increase during the first five
days.

A profusion of mechanisms
There are many ways to generate adaptive mutations.

Interestingly, adaptive point mutations in the lac system require homologous recombination
proteins of the E. coli RecBCD double-strand break-repair system, which is widely involved in
gene conversion and recombination (see "How to keep in concert", this series). Double-strand
ends could be generated during DNA replication by a number of different mechanisms.
The adaptive Lac+ point mutations that revert a frameshift allele are nearly all -1 deletions
(deletion of a single nucleotide) in small mononucleotide repeats, whereas the pre-existing
(non-adaptive) Lac+ reversions are heterogeneous. Mononucleotide repeat instability is thought
to reflect DNA polymerase errors, which is consistent with the requirement of a special
error-prone DNA polymerase (polIV) for adaptive mutations.
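A toy sketch may help make the reading-frame argument concrete. The sequence below is hypothetical (it is not the lac gene) and the helper name `codons` is ours. It shows a +1 insertion in a run of As scrambling every downstream triplet, and a -1 deletion in the same mononucleotide run restoring the frame:

```python
def codons(seq):
    """Split a sequence into complete triplets, reading from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy gene containing a run of six As (a mononucleotide repeat).
original = "ATGGCTAAAAAAGGTCTGTGA"

# +1 frameshift: one extra A is inserted into the run.
shifted = original[:6] + "A" + original[6:]

# Compensating -1 deletion: one A is removed from the same run.
reverted = shifted[:6] + shifted[7:]

print(codons(original))  # triplets of the intact reading frame
print(codons(shifted))   # every triplet after the run is different
print(reverted == original)
```

In this toy case the deletion restores the original sequence exactly; in a real gene a compensating deletion a few bases away would restore the frame while leaving a short stretch of altered codons.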
The 'SOS response' is the bacterium's response to DNA damage or to the inhibition of DNA
replication. It involves de-repression of at least 42 genes that carry out DNA repair,
recombination, mutation and translesion DNA synthesis (synthesis across non-repaired or
damaged DNA), and that prevent cell division.
Global hypermutation is thought to occur in a subpopulation of the cells. This is because the
frequencies of unselected mutations are about two orders of magnitude higher among Lac+
mutants than in the main population of Lac- starved cells. These results mean that stationary-phase
mutations in this system are not directed exclusively to the lac gene, and both adaptive
and neutral mutations are formed. Some or all of the adaptive mutants arise in a subpopulation
that is hypermutable relative to the main population.
The subpopulation of cells that are transiently mutable is estimated to be between 10^-3 and 10^-4
of all cells. Even so, the frequency of mutation per unit length of DNA in the genome is markedly
uneven, with definite hotspots and coldspots, perhaps depending on the proximity to the
double-strand breaks (DSBs) that are generated in the DNA.
Gene amplification is 'adaptive' in the sense that it only occurs in response to the selective
environment. Cells carrying the amplification are not hypermutated in unselected genes, and
neither the SOS response nor polIV is required. Dependence on homologous recombination is
implied in that adaptive Lac+ colonies do not appear in the absence of RecA and RecBCD
enzyme, and RuvAB and C recombination proteins.

Similar findings in bacteria isolated from the wild


Until 2003, the phenomenon of adaptive mutations had been observed only in laboratory
strains. But researchers from the University of Paris, France, and the National University of
Mexico (UNAM) reported similar stress-inducible mutagenesis in stationary-phase bacterial
colonies grown from strains culled from the wild [5]. This provides evidence that most natural
isolates of E. coli from diverse habitats worldwide increase their mutation rates in response to
the stress of starvation.
A total of 787 E. coli isolates were collected from habitats including air, water and sediments,
and the guts of a variety of host organisms. Colonies formed during the exponential growth
phase were subjected to starvation during a prolonged stationary phase, and the production of
mutants was monitored in the starved aging colonies. The vast majority of colonies showed an
increased number of mutants. In a sample of colonies, the authors were able to link the
increased mutagenesis to starvation and oxidative stress by showing that either additional sugar
or anaerobic incubation could block the increased mutagenesis.

The bacteria were highly variable in their inducible mutator activity. The frequency of
mutations conferring resistance to rifampicin (RifR) was measured on day 1 (D1) and day 7 (D7).
For all strains, the median frequencies of RifR mutations were 5.8 x 10^-9 on day 1 and
4.03 x 10^-8 on day 7, a 7-fold increase, while the median number of colony-forming units
increased 1.2-fold. In comparison, the E. coli K12 MG1655 lab strain showed a 5.5-fold
increase in the frequency of RifR and a 1.7-fold increase in colony-forming units. Constitutive
mutator strains having D1 mutation frequencies >10-fold or >100-fold higher than the
median D1 frequency of all the strains represented 3.3% and 1.4% of isolates respectively.
The D7/D1 mutation frequency ratio showed that 45% of strains had more than a 10-fold, and
13% more than a 100-fold, increase in mutagenesis over 7 days. Interestingly, constitutive
mutagenesis and MAC (mutagenesis in aging cells) showed a negative correlation.
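The fold increases quoted here are plain ratios of mutation frequencies. A minimal sketch of the arithmetic, using the two median values from the text; the function names and the example strain values are hypothetical, and the 10-fold and 100-fold cut-offs are the ones the text uses for the D7/D1 ratio:

```python
def fold_change(before, after):
    """Ratio of a later frequency to an earlier one."""
    return after / before


def classify_mac(d1, d7):
    """Bin a strain by its D7/D1 mutation frequency ratio."""
    ratio = fold_change(d1, d7)
    if ratio > 100:
        return ">100-fold"
    if ratio > 10:
        return ">10-fold"
    return "<=10-fold"


median_d1 = 5.8e-9   # median RifR frequency on day 1 (from the text)
median_d7 = 4.03e-8  # median RifR frequency on day 7 (from the text)

print(round(fold_change(median_d1, median_d7), 1))  # 6.9, reported as 7-fold
print(classify_mac(1e-9, 5e-8))                     # hypothetical strain
```

Note that the median strain falls below the 10-fold cut-off, even though 45% of strains exceed it.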
The MAC was genome wide in a large fraction of natural isolates. There was no significant
correlation between MAC and phylogeny. The host's nutrition might explain some of the
variation of MAC. For example, bacteria from the guts of omnivorous species like human
beings have weaker stress-inducible mutator activities than those from carnivores.
The mechanisms for generating mutations looked even more diverse than in the laboratory
strains [6].

Wider significance of adaptive mutations


Amplification is an important manifestation of chromosomal instability prevalent in many
human cancers, and DSBs in DNA are also involved. Induction of mammalian amplification
by selective agents is correlated with the ability of those agents to produce chromosomal
breaks.
The adaptive point mutation mechanism at lac might be relevant to microbial evolution,
particularly of pathogenic bacteria. Many phase variable pathogens have simple repeated
sequences that flank genes that they regulate by frameshift mutation.
These 'contingency genes' used under stress provide phase variations that allow evasion of
the immune system. Two such pathogens, Neisseria meningitidis and N. gonorrhoeae, have one or
more genes homologous to dinB. For many pathogenic bacteria, antibiotic resistance is also
achieved by point mutation mechanisms and could be induced adaptively. Even antibiotics
that cause lethality can be merely bacteriostatic at lower concentrations, such that
stress-promoted mutation mechanisms might be significant in the development of resistance in
clinical environments.
In multicellular eukaryotes, parallels between adaptive mutation and cancer have been noted,
the key being that acquisition of mutations in a growth-limited state (stress) allows cells to
proliferate.
Humans have three E. coli polIV homologues of unknown function, in the
DinB/UmuDC/Rad30/Rev1 superfamily of DNA polymerases, as well as a homologue known
to carry out translesion synthesis (the tumour suppressor protein XP-V). DinB1 or pol κ, a true
DinB orthologue, is found in germline and lymphoid cells. More and more geneticists now
think that mutation is regulated [7], or at any rate, provoked, and highly non-random.

Indeed, in one study on 12 long-term E. coli lines, 36 genes were chosen at random, and 500 bp
regions were sequenced in four clones from each line and their ancestors [8]. Several mutations
were found in a few lines that evolved mutator phenotypes, but no mutations were found in
any of the 8 lines that retained functional DNA repair throughout the 20 000-generation
experiment. This confirms the low level of 'spontaneous' or unprovoked mutation.
Article first published 15/09/04

References
1. Delbrück M. Cold Spring Harbor Symp. Quant. Biol. 1946, 11, 154 (cited by Rosenberg SM, 2001).
2. Tlsty TD, Margolin BH and Lum K. Differences in the rates of gene amplification in
nontumorigenic and tumorigenic cell lines as measured by Luria-Delbrück fluctuation analysis.
Proc. Natl Acad. Sci. USA 1989, 86, 9441-5.
3. Cairns J and Foster PL. Adaptive reversion of a frameshift mutation in Escherichia coli.
Genetics 1991, 128, 695-701.
4. Rosenberg SM. Evolving responsively: adaptive mutation. Nature Reviews Genetics 2001, 2,
504-15.
5. Bjedov I, Tenaillon O, Gerard B, Souza V, Denamur E, Radman M, Taddei F and Matic I.
Stress-induced Mutagenesis in Bacteria. Science 2003, 300, 1404-7.
6. Rosenberg SM and Hastings PJ. Modulating mutations rates in the wild. Science 2003, 300,
1382-3.
7. Drake JW. Spontaneous mutation. Ann. Rev. Genet. 1991, 26, 126-46.
8. Elena SF and Lenski RE. Evolution experiments with microorganisms: the dynamics and
genetic bases of adaptation. Nature Reviews Genetics 2003, 4, 457-68.

Are Ultra-conserved Elements Indispensable?
Geneticists identified elements in the genome that are 'ultra-conserved', and thought they
must be indispensable for survival. Not so. Dr. Mae-Wan Ho reports

The "molecular clock" of mutational changes


Until recently, most geneticists believed that the DNA in the genome is subject to random
mutations, most of which are neutral - neither good nor bad for the organism - so the result is
a slow and steady change in DNA sequences in the genome in the course of evolution. This is
the basis of the "molecular clock" hypothesis, which enables one to estimate, from the
changes in DNA, the time in the past at which certain evolutionary events happened: for
example, when the first human immunodeficiency virus (HIV-1) split off from the
monkey virus (SIV), or, much, much further back in evolution, when the line that led to the
human species split off from the one leading to the chimpanzee.
The molecular clock is known not to be perfect, because different genes tend to change at
different rates, though the rates were thought not to be that dissimilar. So it was always assumed that,
averaged over the whole genome, the molecular clock would give relatively accurate results,
particularly as it seemed, until quite recently, that the genome was full of "junk DNA" of unknown
function.

"Ultraconservative elements"
Many surprises lay in store as genome sequences accumulated and, thankfully, were deposited
in one public database, so useful comparisons could be made. It turns out that not only are
there vast hidden treasures among the "junk DNA", but there is also evidence of highly non-random
changes among different stretches of the DNA, some of which change in concert, some of
which change at random, and others hardly at all.
There are 481 segments in the human genome longer than 200 bp that are 100% identical with
rat and mouse genomes. Nearly all are also conserved in the chicken (467/481) and dog
(477/481) genomes, with an average of 95.7% and 99.2% identity, respectively. Many are
also significantly conserved in fish (324/481 at an average of 76.8% identity).
Very few of these elements could be traced back to jelly fish, Drosophila or the nematode
worm.
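The definition used here (segments longer than 200 bp that are 100% identical across species) amounts to scanning an alignment for long runs of identical columns. A minimal sketch, assuming the sequences are already aligned to equal length; the function name and the toy sequences are hypothetical, and a threshold of 5 stands in for 200 so that the example stays readable:

```python
def conserved_runs(aligned, min_len):
    """Return half-open (start, end) intervals of at least min_len columns
    in which every aligned sequence carries the same base (gaps excluded)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    runs, run_start = [], None
    for i in range(length + 1):  # one extra step closes a run at the end
        identical = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    return runs


# Hypothetical toy alignment of three species.
human = "ACGTACGTTTGACA"
mouse = "ACGTACGTATGACA"
rat = "ACGTACGTTTGACC"

print(conserved_runs([human, mouse, rat], min_len=5))  # [(0, 8)]
```

Lowering the threshold or the required identity, as the article does further on, can only lengthen the list of elements found.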
These "ultraconserved" elements are widely distributed in the genome, occurring on all
chromosomes with the exception of the Y chromosome and chromosome 21. They most often
overlap exons in genes involved in RNA processing or in their introns; or near genes involved
in regulation of transcription and development.
Of the 481 ultraconserved elements, 111 overlap the mRNA of a known human protein
coding gene, including the UTR (untranslated region) and are partly exonic (belonging to
protein coding sequences); 256 show no match to expressed mRNA and are therefore
nonexonic (non-protein coding); while the remaining 114 are possibly exonic. One hundred of
the non-exonic elements are located in introns (non-coding intervening sequences) of known
genes and the rest are intergenic (between genes). The non-exonic elements, both intronic and
intergenic, tend to congregate in clusters near transcription factors and developmental genes,
whereas the exonic and possibly exonic elements are more randomly distributed along the
chromosomes.
There are 93 known genes that overlap with exonic ultraconserved elements; these are called
type I genes. The 255 genes that are near the non-exonic elements are type II genes. Type I
genes tend to be RNA binding or involved in regulation of splicing. In contrast, type II genes
are involved in regulation of transcription and DNA binding, and are enriched for DNA
binding motifs such as the homeobox.
Nonexonic ultraconserved elements are often found in "gene deserts" that extend more than a
megabase. Of the non-exonic elements, there are 140 that are more than 10 kb away from any
known gene, and 88 that are more than 100 kb away.
The set of 156 annotated genes that flank intergenic ultraconserved elements is significantly
enriched for developmental genes, and in particular, genes involved in early development,
suggesting that many of the associated ultraconserved elements may be distal enhancers of
these early developmental genes.
Non-exonic elements that lie in introns are also often associated with developmental genes.
Many elements in the ultraconservative set of 481 are considerably longer than 200bp. The
longest elements (779bp, 770bp and 731 bp) all lie in the last three introns in the 3' portion of
the DNA polymerase alpha catalytic subunit on chromosome X, along with other shorter
ultraconserved elements.
If the criterion is relaxed to "highly conserved" sequences with 99% identity (instead of 100%
identity), then there are 1 974 elements, of lengths up to 1 087 bp, in the human genome.
There are also 5 000 sequences of more than 100bp in length that are 100% identical in the
human, rat and mouse genomes. These appear to be essential for development in mammals
and other vertebrates.
Tens of thousands more are found at lower cutoffs.
Thus, as much as 5% of the genome is more conserved than expected from neutral mutations
occurring at random.

Ultra-conserved elements are indispensable


Researchers from the University of California, Santa Cruz, in the United States and the University
of Queensland, Brisbane, Australia, suggest these sequences have been under negative "purifying"
selection for more than 300 million years, some for at least 400 million years; or else they
have very low mutation rates, or they are subject to perfect repair. It means they must be 'vital'
for survival.
The rate at which these sequences change in evolution is 20-fold lower than that of the rest of the
genome, including the protein coding regions.

The ultraconserved elements show almost no natural variation in the human population. Only
6 out of 106 767 bp examined are at validated SNPs, whereas 119 are expected.
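The scale of this shortfall is easy to work out from the three numbers given. A minimal sketch of the arithmetic (the variable names are ours; the comparison with the 20-fold figure quoted for evolutionary rates is only an observation, not a claim made by the source):

```python
bases_examined = 106_767  # bp of ultraconserved sequence examined
observed_snps = 6         # validated SNPs found
expected_snps = 119       # SNPs expected at the genome-wide rate

observed_rate = observed_snps / bases_examined
expected_rate = expected_snps / bases_examined
depletion = expected_snps / observed_snps

print(f"observed: {observed_rate:.2e} SNPs per bp")
print(f"expected: {expected_rate:.2e} SNPs per bp")
print(f"depletion: {depletion:.1f}-fold")  # 19.8-fold
```

The shortfall of about 20-fold is of the same order as the reduced rate of evolutionary change reported for these elements.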

Surprise, surprise
But researchers revealed that mice with big chunks of such ultraconserved sequences deleted
get on very well without them.
Edward Rubin's team at the Lawrence Berkeley National Laboratory in California deleted two
huge regions of DNA from mice, containing nearly 1 000 highly conserved sequences shared
between humans and mice. One region was 1.6 million DNA bases long, the other over
800 000 bases long. The researchers expected the mice to show big problems as the result of
the deletions.
But the mutant mice were no different from normal mice in every respect: growth, metabolic
functions, lifespan and overall development. "We were quite amazed," said Rubin, who
presented the findings at a meeting of the Cold Spring Harbor Laboratory in New York earlier
this year.
"It may say as much about our inability to detect any phenotypes as it says about the function
of this region," said David Haussler of the University of California, Santa Cruz, whose team
described the "ultra-conserved regions" in mammals. "What's most mysterious is that we don't
know any molecular mechanism that would demand conservation like this."
Article first published 16/09/04

Sources
Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS and Haussler D.
Ultraconserved elements in the human genome. Sciencexpress, www.sciencexpress.org, 6 May
2004, page 1, 10.1126/science.1098119
"Life goes on without 'vital' DNA", Sylvia Pagán, New Scientist, 3 June 2004,
www.Newscientist.com

How to Keep in Concert


One of the biggest puzzles of the fluid genome is why multiple copies of a gene scattered
throughout the genome can be kept so nearly identical, which may be good for the organism.
But the mechanism responsible has its downside in converting healthy genes to defective ones
when cells are stressed. Dr. Mae-Wan Ho reports

The mystery of perfect copies


Multigene families are families of genes that serve the same function and are almost
identical copies of one another. Multigene families exist in the genome of all living
organisms, and are present either in blocks of repeats, or in single copies dispersed throughout
the genome.
One question that has preoccupied geneticists right from the first is how the multiple copies of
a gene sequence remain so uniform within a species, to a degree out of all proportion to
expectations based on the rate of random mutations that strike most other parts of the species'
genome, and far more uniform than copies of the same gene sequence compared between species.
The members of multigene families seem to evolve in concert within a species.
The ribosomal RNA (rRNA) genes - required for protein synthesis in the ribosomes within the
cell - are the best-studied examples of concerted evolution in eukaryotes (organisms
including human beings, whose genomes are enclosed within a nucleus); and gene conversion
has been proposed as a mechanism, especially for genes that are dispersed throughout the
genome. Gene conversion is a process whereby the sequence of one gene converts that of
another in the genome, so the end result is a closer resemblance between them.
Large numbers of rRNA genes are present in eukaryotes, typically more than 100, and in some
cases more than 1 000; and the size of the repeated unit also tends to be very big. This makes
precise analysis very difficult. The repeat unit of human rRNA genes, for example, is about
43kb; and there are five blocks of tandem repeats of about 100, each on a different
chromosome.

A closer look
Microbiologist Liao Daiqing of the University of Sherbrooke in Quebec, Canada, compared
the sequences of multiple rRNA genes within the genomes of 12 bacteria that have multiple
copies of the rRNA genes. The genes for the three rRNA molecules (23S, 16S and 5S) found
in the ribosome are typically linked together and transcribed as a single unit, called an operon,
in prokaryotes. The lengths of these three rRNA genes are ~2 900 bp (23S), ~1 500 bp (16S) and
~120 bp (5S), and their sizes as well as sequences are well conserved between different
prokaryotic species. The multiple rRNA operons (multi-gene units under the same
transcription control) are generally dispersed throughout the prokaryotic genome. Liao
analysed the rRNA genes and their immediate flanking sequences in 19 completely sequenced
genomes, but seven of the genomes surveyed contain only one copy of each rRNA gene.
He found striking sequence homogeneity of each individual rRNA gene family within a
species, in contrast to the divergence of gene sequences between species.

Within a genome, evidence of gene conversion was found throughout the entire length of each
individual rRNA gene and its immediate flanking regions. Individual conversion events,
however, convert only a short sequence tract, and the conversion partner can be any gene
within the gene family in the genome. He confirmed that gene sequences undergo much
slower divergence than their flanking sequences, and any homogeneous flanking regions that
exist may have been incidental co-conversion with the gene sequence.
The average divergence (difference) among the seven 16S rRNA genes present in E. coli is
0.0055 per site, whereas the average divergence between the 16S rRNA genes in E. coli and its
close relative H. influenzae is 0.1325, or 24 times greater. The same applies to the 23S and 5S
rRNA genes. No sequence heterogeneity was detected for multiple copies of 23S, 16S or 5S
in Aquifex aeolicus, Chlamydia trachomatis, Haemophilus influenzae, Helicobacter pylori,
Methanobacterium thermoautotrophicum and Synechocystis PCC6803. Five of these six
species have only two rRNA operons, whereas there are six operons in H. influenzae. There
are 10 and 7 rRNA operons in B. subtilis and E. coli, but the rRNA genes in these two species
also display remarkable sequence homogeneity.
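Divergence per site can be pictured as the fraction of aligned positions at which two copies differ, averaged over all pairs of copies. A minimal sketch under that assumption (the original analysis may well have used a corrected distance measure; the function names and toy sequences are hypothetical):

```python
from itertools import combinations


def divergence_per_site(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_divergence(seqs):
    """Average divergence over every pair of gene copies."""
    pairs = list(combinations(seqs, 2))
    return sum(divergence_per_site(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy gene copies within one genome.
copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
print(round(mean_pairwise_divergence(copies), 4))  # 0.0667

# Ratio of the between-species to within-species figures quoted in the text.
print(round(0.1325 / 0.0055))  # 24
```

The same two functions, applied once within a genome and once between genomes, give the two numbers being compared.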
Obvious sequence heterogeneity was found for the intergenic spacer sequences between 16S
and 23S genes in B. subtilis, E. coli, H. influenzae and T. pallidum. This is mainly due to the
presence or absence of tRNA (transfer RNA) genes or the presence of different tRNA genes
in this intergenic region. The contrast of homogeneity in the gene sequences to heterogeneity
in the intergenic spacers implies that concerted evolution does not reflect gross replacement of
one operon with another; rather it is a gradual, region-by-region homogenisation process.
Individual conversion tracts appear to be short, apparently less than 500bp, similar to those
observed in other organisms.
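The region-by-region picture can be illustrated with a toy model in which a single conversion event copies a short tract from one gene copy into another and leaves everything outside the tract untouched. This is only a sketch of the idea, not of the RecBCD mechanism; the function names, sequences and tract coordinates are hypothetical:

```python
def convert_tract(donor, recipient, start, length):
    """Copy a short tract from donor into recipient at the same coordinates."""
    end = start + length
    return recipient[:start] + donor[start:end] + recipient[end:]


def differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


donor = "AAAACCCCGGGGTTTT"
recipient = "AAATCCCAGGGCTTTA"

converted = convert_tract(donor, recipient, start=4, length=8)

print(differences(donor, recipient))  # 4 differences before conversion
print(differences(donor, converted))  # 2 remain, both outside the tract
```

Repeated short events of this kind would homogenise the gene region by region, while differences in the flanks persist unless a tract happens to cover them.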

How genes may convert


Several mechanisms can lead to sequence conversion. The first is via reverse transcription, by
reverse transcriptase (RT), of an rRNA sequence into complementary rDNA, which is then inserted
in place of other rRNA genes in the genome. The second mechanism involves recombination between different
rRNA genes during DNA replication, so they end up with the same sequence or more similar
sequences. The third mechanism involves the invasion of one gene by the single stranded
DNA of another gene to form a hybrid duplex, followed by DNA repair to remove the
mismatch.
The first two mechanisms are considered unlikely in prokaryotes. Although RT-mediated
gene conversion appears to occur in the eukaryote yeast, RT activity cannot be detected in
many different types of cells including E. coli. Unequal reciprocal recombination can in
principle account for homogenisation of tandemly repeated genes. However, that could not
satisfactorily explain the remarkable heterogeneity of sequences flanking the rRNA genes.
Furthermore, ectopic recombination between repetitive sequences in different parts of the
genome can result in sequence deletion, inversion or translocation and such drastic genomic
changes lead to genome instability.
So that leaves gene conversion via heteroduplex formation, probably mediated by the
complex bacterial enzyme RecBCD that controls recombination at particular Chi
(pronounced "Kye") recombination hotspots, with the sequence GCTGGTGG (see box). The
Chi element is one of the most abundant repeated sequences in the E. coli genome. Chi-like
sequences are frequently found within the 16S and 23S rRNA genes and their vicinities. For
example, the sequence stretch GCTGGCGG near the 5′ end of the 16S rRNA gene differs
from Chi by only one nucleotide, and this change does not appear to affect its function. This
Chi-like sequence is conserved in all bacterial 16S rRNA genes. Although the RecBCD/Chi system
may not operate in all species, similar recombination machinery may be responsible.
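The idea of a "Chi-like" sequence, one that differs from the Chi octamer by a single nucleotide, can be sketched as a simple scan. This is only an illustration: the function names and the test string are hypothetical, only the forward strand is searched, and a one-mismatch threshold is an assumption rather than a definition given in the text.

```python
CHI = "GCTGGTGG"  # canonical E. coli Chi octamer quoted in the text

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_chi_like(seq: str, max_mismatches: int = 1):
    """Return (position, octamer, mismatches) for every window of the
    forward strand within max_mismatches of Chi."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(CHI) + 1):
        window = seq[i:i + len(CHI)]
        d = hamming(window, CHI)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits

# Hypothetical test string containing the 16S Chi-like stretch and Chi itself.
toy_sequence = "AAAGCTGGCGGTTTTGCTGGTGGAA"
hits = find_chi_like(toy_sequence)
for position, octamer, mismatches in hits:
    print(position, octamer, mismatches)
```

A real search would also scan the reverse complement, since the effect of Chi depends on its orientation.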

A universal gene converter?


The E. coli RecBCD enzyme is a multifunctional protein complex (330 kDa) containing three
subunits, the products of the recB, recC, and recD genes. This enzyme displays four distinct
activities: nuclease, helicase, ATPase, and site-specific recognition of the DNA regulatory
sequence, Chi. RecBCD enzyme is responsible for the seemingly disparate functions of
DNA degradation and repair of the bacterial chromosome. The former function is achieved by
the combined action of its helicase and nuclease activities, whereas a recombinationally
activated form accomplishes the latter, after RecBCD interacts with Chi. RecBCD is a
principal component of the main pathway for homologous genetic recombination in E. coli.
Structural or functional analogs of the RecBCD enzyme are present in many bacteria.
The nuclease activity protects the cell from invasion by viral DNA, although bacteriophages
(bacterial viruses) that infect E. coli have developed strategies to overcome the nuclease
activity by producing proteins that bind to RecBCD to inhibit its activity or that cap the
ends of the genome, to prevent RecBCD from entering it.
Chi is a DNA sequence of eight nucleotides (5′-GCTGGTGG-3′) that increases the
frequency of recombination in its vicinity by 5 to 10 fold over background levels. It was
originally discovered as a mutation in phage λ that protected its genome from degradation by
RecBCD enzyme. The effect of Chi is highly oriented, with the region of enhanced
recombination extending downstream of the 5′ end of the Chi sequence, decreasing by a
factor of two for every 2.2 to 3.2 kilobases, returning to background levels 10 kilobases
downstream. This stimulation requires the activity of the RecBCD enzyme, and occurs
only if the enzyme approaches Chi from the 3′ side.
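The distance figures above can be checked with a little arithmetic. The sketch below assumes a simple exponential model in which the stimulation halves at a fixed interval; that model is a reading of "decreasing by a factor of two for every 2.2 to 3.2 kilobases", not something the text states explicitly, and the function name is hypothetical.

```python
def remaining_fraction(distance_kb: float, halving_distance_kb: float) -> float:
    """Fraction of the initial Chi stimulation left at a given distance,
    assuming it halves once every halving_distance_kb."""
    return 0.5 ** (distance_kb / halving_distance_kb)

# Halving distances quoted in the text, evaluated 10 kb downstream of Chi.
fractions_at_10kb = {h: remaining_fraction(10.0, h) for h in (2.2, 3.2)}

for halving_kb, fraction in fractions_at_10kb.items():
    print(f"halving every {halving_kb} kb: {fraction:.1%} of the initial stimulation remains at 10 kb")
```

Under this assumption only about 4% to 11% of the initial stimulation remains at 10 kilobases, which fits the statement that recombination has returned to roughly background levels by that point.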
In one model, RecBCD enzyme enters a dsDNA (double-stranded DNA) end and unwinds the duplex, while
preferentially degrading the strand corresponding to the 3′-terminus at the point of entry.
Single-strand binding (SSB) protein binds the single-stranded DNA (ssDNA) produced. When
RecBCD encounters a Chi sequence, however, the 3′ to 5′ nuclease activity is attenuated and
a weaker 5′ to 3′ nuclease activity is activated on the opposite strand. Following the
interaction with Chi, degradation of the strand corresponding to the 3′ end is attenuated at
least 500-fold. This attenuation of nuclease activity is manifest until the enzyme dissociates
from the DNA, explaining the elevated recombination frequency downstream of Chi sites.
RecBCD enzyme facilitates the loading of RecA protein onto the ssDNA produced by the
continued translocation and unwinding of the DNA molecule beyond the Chi site. The RecA
protein-coated ssDNA filament then invades a homologous DNA molecule, and converts it by
a DNA repair mechanism that removes the mismatch in the invaded copy.
This mechanism is believed to be responsible for 80% of recombination events following
conjugation. Any dsDNA break is similarly repaired by recombination through RecBCD. Repair
is facilitated by the abundant presence of Chi sites, occurring approximately once every 4.6
kilobases in the E. coli genome, with 75.5% of the sites oriented toward the origin of
replication.
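To give a feel for how abundant "once every 4.6 kilobases" is, the sketch below compares it with the spacing expected for a random eight-nucleotide sequence. It assumes equal frequencies of the four bases and counts one strand only. Both are simplifications, and the text does not say whether its figure counts one strand or both.

```python
CHI = "GCTGGTGG"

# Expected spacing of a specific octamer in random DNA with equal base frequencies.
expected_spacing_bp = 4 ** len(CHI)

# Observed spacing of Chi sites quoted in the text (about 4.6 kilobases).
observed_spacing_bp = 4600

enrichment = expected_spacing_bp / observed_spacing_bp

print(f"expected spacing by chance: {expected_spacing_bp} bp")
print(f"observed spacing: {observed_spacing_bp} bp")
print(f"Chi is roughly {enrichment:.0f} times more frequent than expected by chance")
```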

Gene conversion in health & disease


Evidence for gene conversion via heteroduplex formation has emerged in other bacteria and in
yeast. Analysis of the RNU2 gene in various human populations reveals that repeats within an
individual tandemly repeated array are more homogeneous than repeats in different arrays,
while the intergenic flanking regions are not homogeneous, suggesting that gene conversion,
rather than unequal crossing-over, is involved.
Chi-like sequences have been found in many eukaryotic genomes and are suspected to be
involved in gene conversion events, for example within the MHC (Major Histocompatibility
Complex), a complex of around 100 genes in vertebrates. These genes include those for the extremely
polymorphic (variable) cell surface proteins called HLA in humans and H-2 in mice, which provide
immunological markers for self, and are involved in immune response against nonself,
including transplants.
Alec Jeffreys and Celia May at Leicester University examined human sperm for evidence of
gene conversion. The formation of germ cells (egg and sperm) during meiosis is the usual
point in the life-cycle of higher organisms when chromosomes pair up, cross over and
exchange parts, thereby shuffling the genes they inherit from each parent. But it appears that
instead of an equal exchange of parts at the crossover points, there is an unequal conversion
of one allele by the other.
Jeffreys and May first concentrated on a recombination hotspot DNA3 located in the MHC,
which is surrounded by single nucleotide polymorphisms (SNPs), with many men heterozygous
for multiple SNPs. They found evidence of gene conversion at a rate of 1.3-3.4 x 10^-3 per sperm,
which was two to three times higher than the rate of crossover. All conversions involved the transfer
of short stretches of DNA (300 bp to 1091 bp). Conversion rates declined rapidly with distance
and defined a very steep gradient extending in each direction from the centre of the hotspot.
Another crossover hotspot DMB2 in the MHC was much less active than DNA3, but the
pattern of gene conversion was very similar. A third crossover hotspot is the gene SHOX in
the pseudo-autosomal pairing region PAR1 on the sex chromosomes. The crossover rate is
much higher (3.7 x 10^-3 per sperm), although the density of SNPs is low. Again, there is
evidence of gene conversion involving short tracts of DNA.
The mean length of conversion tracts probably lies in the range of 55-290bp. They estimate
that somewhere between 80% and 94% of recombinations at hotspot DNA3 are gene
conversions rather than reciprocal cross-overs.
Similar results have been observed earlier in mice. The number of crossovers during meiosis
is tightly regulated to one to two per pair of chromosomes in mice, and their distribution is not
random: there are recombination hot and cold regions. Researchers in the Institute of Human
Genetics, Montpellier, France, found a high frequency of gene conversion in the region of
highest crossover density. They found 16 gene conversion events among 6 000 molecules of
sperm DNA, corresponding to a frequency of 2.7 x 10^-3. Most of the gene conversion events
involved tracts of less than 540 bp.
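The mouse frequency follows directly from the counts given. The short sketch below reproduces it and adds a rough uncertainty, assuming that conversion events are rare and independent so that the count behaves like a Poisson variable. That assumption and the resulting error estimate are illustrative and do not come from the text.

```python
import math

events = 16        # gene conversion events reported in the text
molecules = 6000   # sperm DNA molecules screened

frequency = events / molecules

# Rough standard error under a Poisson assumption (illustrative only).
std_error = math.sqrt(events) / molecules

print(f"conversion frequency: {frequency:.1e} per molecule")
print(f"approximate standard error: {std_error:.1e}")
```

With only 16 events the estimate is fairly coarse, so comparisons with the human per-sperm rates should be read as order-of-magnitude agreement.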
Gene conversion is increasingly implicated in human disease, in which the disease-causing
mutations appear to be copied from a closely related pseudogene (a mutated gene that is no
longer functional) in the genome. Cases attributed to Chi sequences include the T4 cationic
trypsinogen gene associated with pancreatitis, the β-crystallin gene CRYBB2 in a dominant
form of cataracts, the CYP21B gene responsible for steroid 21-hydroxylase deficiency and
congenital adrenal hyperplasia, and the von Willebrand factor gene in von Willebrand
disease (VWD), the commonest inherited
bleeding disorder. Such pathological gene conversions may be linked to stress, and resemble
the controversial phenomenon of directed mutations found in stressed and starving bacterial
cells (see "To mutate or not to mutate", this series).
Avoiding stress may be much more important for health than inheriting good genes.
Article first published 20/09/04

Sources
1. Liao D. Gene conversion drives within genic sequences: concerted evolution of ribosomal
RNA genes in Bacteria and Archaea. J Mol Evol 2000, 51, 305-17.
2. Arnold DA and Kowalczykowski SC. RecBCD helicase/nuclease. Encyclopaedia of Life
Sciences, Macmillan, 1998.
3. Martinsohn J Th, Sousa AB, Guethlein LA and Howard JC. The gene conversion hypothesis
of MHC evolution: a review. Immunogenetics 1999, 50, 168-200.
4. Dorak MT. Common terms in evolutionary biology and genetics.
http://dorakmt.tripod.com/mhc/glossary.html
5. Jeffreys AJ and May CA. Intense and highly localized gene conversion activity in human
meiotic crossover hot spots. Nature Genetics 2004, 36, 151-6.
6. Guillon H and de Massy B. An initiation site for meiotic crossing-over and gene
conversion in the mouse. Nature Genetics 2002, 32, 296-9.
7. Küppers R and Dalla-Favera R. Mechanisms of chromosomal translocation in B cell
lymphomas. Oncogene 2001, 20, 5580-94.
8. Chen J-M, Raguenes O, Ferec C, Deprez PH and Verellen-Dumoulin C. A CGC>CAT gene
conversion-like event resulting in the R122H mutation in the cationic trypsinogen gene and
its implication in the genotyping of pancreatitis. J Med Genet 2000, 37
(http://jmedgenet.com/cgi/content/full/37/11/e36)
9. Virinder Sarhadi V, Reis A, Jung M, Singh D, Sperling K, Singh JR and Bürger J. A unique
form of autosomal dominant cataract explained by gene conversion between β-crystallin B2
and its pseudogene. J Med Genet 2001, 38, 392-6.
10. Amor M, Parker KL, Globerman H, New MI and White PC. Mutation in the CYP21B gene
(Ile172-Asn) causes steroid 21-hydroxylase deficiency. Proc Natl Acad Sci USA 1988, 85,
1600-4.
11. Surdhar GK, Enayat MS, Lawson S, Williams MD and Hill FGH. Homozygous gene
conversion in von Willebrand factor gene as a cause of type 3 von Willebrand disease and
predisposition to inhibitor development. Blood 2001, 98, 248-50.
