Chapter 21

Genomes and Their Evolution


Lecture Outline
Overview: Reading the Leaves from the Tree of Life
The advent of techniques for rapid, complete genome sequencing enabled scientists to sequence the human genome by 2003 and the genome of the chimp, Pan troglodytes, by 2005. Scientists can now ask what differences in genetic information account for the distinct characteristics of humans and chimps. Researchers have also completed genome sequences for Escherichia coli and numerous other prokaryotes, Saccharomyces cerevisiae (brewer's yeast), Caenorhabditis elegans (nematode), Drosophila melanogaster (fruit fly), Mus musculus (mouse), Macaca mulatta (macaque), and others. Fragments of DNA have been sequenced from extinct species, including the woolly mammoth.

Comparing the genomes of more distantly related animals should reveal the sets of genes that control group-defining characteristics. Comparing the genomes of plants and prokaryotes provides information about the long evolutionary history of shared ancient genes and their products.

With the genomes of many species fully sequenced, scientists can study whole sets of genes and their interactions, an approach called genomics. The sequencing efforts that contribute to this approach generate enormous volumes of data. The need to deal with this information has spawned the field of bioinformatics, the application of computational methods to the storage and analysis of biological data.

Concept 21.1 New approaches have accelerated the pace of genome sequencing


The Human Genome Project (HGP) began in 1990, with the goal of sequencing the human genome.

The Human Genome Project used a three-stage approach to mapping the human genome. The starting point for the HGP was an incomplete picture of the organization of many genomes. Geneticists had karyotypes for many species, showing the number and banding pattern of chromosomes. The locations of some genes had been identified by fluorescence in situ hybridization (FISH), in which fluorescently labeled probes hybridize to an immobilized array of chromosomes.

Lecture Outline for Campbell/Reece Biology, 8th Edition, Pearson Education, Inc.

The initial stage in the three-stage approach to mapping the human genome was to construct a linkage map of several thousand genetic markers spaced throughout each of the chromosomes. The order of the markers and the relative distances between them on such a map are based on recombination frequencies. The markers can be genes or any other identifiable sequences in the DNA, such as restriction fragment length polymorphisms (RFLPs) or short tandem repeats (STRs). By 1992, researchers had compiled a human genetic map with about 5,000 markers, enabling them to locate genes by testing for genetic linkage to known markers.

The second stage was the physical mapping of the human genome. In a physical map, the distances between markers are expressed by some physical measure, usually the number of base pairs along the DNA. A physical map is made by cutting the DNA of each chromosome into a number of restriction fragments and then determining the original order of the fragments in the chromosomal DNA. The key is to make fragments that overlap, to identify the overlaps, and then to assign fragments to a sequential order that corresponds to their order in a chromosome. Supplies of the DNA fragments used for physical mapping are prepared by DNA cloning. The first cloning vector is often a yeast artificial chromosome (YAC), which can carry inserted fragments a million base pairs long, or a bacterial artificial chromosome (BAC), which carries inserts of 100,000 to 300,000 base pairs. After these long fragments are ordered, each fragment is cut into smaller pieces, which are cloned in plasmids or phages, ordered in turn, and finally sequenced.

The third stage in mapping a genome was to determine the complete nucleotide sequence of each chromosome. The sequencing of all 3.2 billion base pairs in a haploid set of human chromosomes presented a formidable challenge. This challenge was met by sequencing machines, using the dideoxy chain-termination method.
The development of technology for faster sequencing has accelerated the rate of sequencing dramatically, from 1,000 base pairs a day in the 1980s to 1,000 base pairs per second in 2000. Methods that can analyze biological materials very rapidly and produce enormous volumes of data are said to be "high-throughput"; sequencing machines are an example of high-throughput devices.
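To give a feel for what this speed-up means, the following minimal sketch works through the arithmetic using only the figures quoted in this outline (3.2 billion base pairs, 1,000 base pairs per day, 1,000 base pairs per second). It assumes, purely for illustration, a single machine running continuously with no repeated coverage; the names are hypothetical.

```python
# Back-of-the-envelope arithmetic, not a model of any real sequencing project.
GENOME_BP = 3_200_000_000          # haploid human genome size quoted in the text

SECONDS_PER_DAY = 60 * 60 * 24
rate_1980s = 1_000                 # base pairs per day (1980s figure)
rate_2000 = 1_000 * SECONDS_PER_DAY  # 1,000 bp per second, expressed per day

def days_to_sequence(genome_bp, bp_per_day):
    """Days needed to read every base once at a constant rate."""
    return genome_bp / bp_per_day

days_1980s = days_to_sequence(GENOME_BP, rate_1980s)
days_2000 = days_to_sequence(GENOME_BP, rate_2000)

print(f"1980s rate: about {days_1980s / 365.25:,.0f} years")
print(f"2000 rate:  about {days_2000:.0f} days")
print(f"speed-up:   {rate_2000 // rate_1980s:,}-fold")
```

Under these assumptions, a single pass over the genome shrinks from thousands of years to a little over a month.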

The whole-genome shotgun method was adopted in the 1990s. In 1992, molecular biologist J. Craig Venter proposed that the sequencing of whole genomes should start directly with the sequencing of random DNA fragments, skipping the genetic mapping and physical mapping stages. Powerful computer programs would then assemble the resulting very large number of overlapping short sequences into a single continuous sequence. In 1995, Venter and his colleagues reported the first complete genome sequence of an organism, the bacterium Haemophilus influenzae. In May 1998, Venter set up a company, Celera Genomics, and declared his intention to complete the human genome sequence. In March 2000, Celera Genomics completed the genome sequence of D. melanogaster. In April 2003, two years before the original target date of the Human Genome Project, the human genome sequence was announced jointly by Celera and the public consortium. Celera had kept its data private, whereas the data of the public consortium were made available to all. Celera's accomplishment relied heavily on the consortium's maps and sequence data. Nevertheless, Venter argues for the efficiency and economy of Celera's methods. Both approaches led to the rapid completion of genome sequencing for a number of species.

Today, the whole-genome shotgun method is widely used. The DNA fragments are cloned into three different vectors, each of which takes a defined size of insert. The computer uses the known distance between the ends of the inserted DNA, along with other information, to assemble the sequences. A recent study comparing the whole-genome shotgun method with the three-stage approach found that the whole-genome shotgun method can miss some duplicated sequences, thus underestimating the size of the genome and missing some genes in those regions. The hybrid approach that ended up being used for the human genome, with the more rapid shotgun sequencing augmented by some mapping of clones, may be the most useful. Some gaps still remained with the publication of the detailed sequence of the final chromosome in 2006. Because of the presence of repetitive DNA, certain parts of the chromosomes of multicellular organisms resist detailed mapping by the usual methods.
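The idea of assembling overlapping fragments by computer can be illustrated with a toy greedy assembler. This is only a sketch under simplifying assumptions: error-free reads, a single strand, and no use of the known distances between insert ends that real assemblers rely on. The function names and the three reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three hypothetical overlapping reads from a 15-base sequence.
toy_reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(toy_reads))
```

Because merging is driven only by sequence overlap, two copies of a duplicated region look identical to this procedure and tend to be collapsed into one, which is the kind of underestimate described above.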

Concept 21.2 Scientists use bioinformatics to analyze genomes and their functions


The goals of the Human Genome Project included establishing databases and refining analytical software, both of which are centralized and readily accessible on the Internet. Bioinformatics resources are available to researchers worldwide, speeding up the dissemination of information.

Centralized resources are available for analyzing genome sequences. In the United States, the National Library of Medicine and the National Institutes of Health jointly created the National Center for Biotechnology Information (NCBI), which maintains a website with extensive bioinformatics resources. Similar websites have been established by the European Molecular Biology Laboratory and the DNA Data Bank of Japan. Smaller websites maintained by individual labs or groups of labs provide databases and software designed for narrower purposes, such as studying genetic and genomic changes in one particular type of cancer. The NCBI database of sequences is called GenBank. As of June 2007, GenBank contained the sequences of 73 million fragments of genomic DNA, totaling 77 billion base pairs.

The amount of data in GenBank is estimated to double every 18 months.

BLAST, a software program available on the NCBI website, allows visitors to compare a DNA sequence to every sequence in GenBank, in order to locate similar regions. Another program allows the comparison of predicted protein sequences. A third program searches protein sequences for common stretches of amino acids (domains) and generates a three-dimensional model of the domain. There is even a software program that can compare a collection of sequences of nucleic acids or polypeptides, and diagram them in the form of an evolutionary tree based on the sequence relationships! The NCBI website maintains a database of all three-dimensional protein structures that have been determined.

Protein-coding genes can be identified within DNA sequences. How can geneticists recognize protein-coding genes from DNA sequences and determine their function? Software is used to scan DNA sequences for transcriptional and translational start and stop signals, for RNA-splicing sites, and for other signs of protein-coding genes. Software also looks for certain short sequences that correspond to sequences present in known mRNAs. Thousands of such sequences, called expressed sequence tags, or ESTs, have been collected from cDNA sequences and are cataloged in computer databases.

The identities of about half of the human genes were known before the Human Genome Project began. Clues about the identities of previously unknown genes come from comparing the sequences of gene candidates with those of known genes from other organisms. Due to redundancy in the genetic code, the DNA sequence may vary more than the protein sequence does. Scientists compare the predicted amino acid sequence of a protein with that of other proteins. Sometimes a newly identified sequence matches, at least partially, the sequence of a gene or protein whose function is well known. If part of a new gene matches a known gene that encodes an important signaling pathway protein such as a protein kinase, then the new gene may, too.

Some sequences are entirely unlike anything ever seen before. This was true for about a third of the genes of E. coli when its genome was sequenced. In these genes, function was deduced through a combination of biochemical and functional studies. The biochemical approach aims to determine the three-dimensional structure of the protein as well as other attributes such as binding sites for other molecules. Functional studies disable the gene to see what effect that has on the phenotype.
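As a rough illustration of scanning for translational start and stop signals, the sketch below looks for open reading frames (an ATG followed in the same frame by a stop codon) on one strand. It is a deliberately simplified stand-in, not how any particular gene-finding program works: it ignores the reverse strand, promoters, and RNA-splicing sites, and the function name, threshold, and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each forward-strand ORF.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop, including ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # resume search after the stop
    return orfs

# Hypothetical sequence: ATG GCT TGG AAA TAG embedded in flanking bases.
sequence = "CCATGGCTTGGAAATAGCC"
for start, end, frame in find_orfs(sequence):
    print(frame, sequence[start:end])
```

In eukaryotic genomes, introns interrupt coding sequences, which is one reason the text notes that software must also look for splice sites and matches to known mRNAs.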

Genes and their products can be understood at the systems level. Genomics is a rich source of new insights into fundamental questions about genome organization, regulation of gene expression, growth and development, and evolution. The success in sequencing genomes and studying entire sets of genes has encouraged scientists to attempt similar systematic study of the full protein sets (proteomes) encoded by genomes, an approach called proteomics. Biologists have begun to compile catalogs of genes and proteins, listings of all the "parts" that contribute to the operation of cells, tissues, and organisms. Using these catalogs, researchers have shifted their attention from the individual parts to their functional integration in biological systems.

One basic application of the systems biology approach is to define gene circuits and protein interaction networks. To map a protein interaction network in D. melanogaster, researchers started with more than 10,000 predicted RNA transcripts. Researchers used molecular techniques to test interactions between the whole or partial protein products of these transcripts. Using statistical tests to select the interactions for which the data were strongest, researchers ended up with roughly 4,700 proteins that appeared to participate in more than 4,000 interactions.

The Cancer Genome Atlas is another example of systems biology in which a large group of interacting genes and gene products is analyzed together. The National Cancer Institute and the National Institutes of Health aim to understand how changes in biological systems lead to cancer. In a three-year pilot project running from 2007 to 2010, researchers are analyzing three types of cancer (lung, ovarian, and glioblastoma of the brain) by comparing gene sequences and patterns of gene expression in cancer cells with those in normal cells. A set of approximately 2,000 genes from the cancer cells will be sequenced during the progression of the disease, to monitor changes due to mutations and rearrangements. The GeneChip is a microarray containing most of the known human genes.
The GeneChip is being used to analyze gene expression patterns in patients suffering from various cancers and other diseases, with the eventual aim of tailoring their treatment to their unique genetic makeup and the specifics of their cancers. Ultimately, all of us may carry with our medical records a catalog of our DNA sequence, a sort of genetic bar code, with regions highlighted that predispose us to specific diseases.

Concept 21.3 Genomes vary in size, number of genes, and gene density


By the summer of 2007, the sequencing of more than 600 genomes had been completed and the sequencing of more than 2,100 genomes was in progress. Of the completely sequenced group, about 500 are genomes of bacteria and 45 are archaeal genomes. Among the 65 eukaryotic species in the group are vertebrates, invertebrates, and plants. The accumulated genome sequences contain a wealth of information that we are just beginning to mine.

Comparing bacteria, archaea, and eukaryotes shows a general progression from smaller to larger genomes. Most bacterial genomes have between 1 and 6 million base pairs (Mb); the genome of E. coli, for instance, has 4.6 Mb. Genomes of archaea are generally within the size range of bacterial genomes.
Eukaryotic genomes tend to be larger: The genome of the single-celled yeast S. cerevisiae has about 13 Mb, whereas most multicellular animals and plants have genomes with at least 100 Mb. There are 180 Mb in the fruit fly genome, whereas human genomes have 3,200 Mb. A comparison of genome sizes among eukaryotes does not show any systematic relationship between genome size and phenotype. The genome of Fritillaria assyriaca, a flowering plant in the lily family, contains 120 billion base pairs (120,000 Mb), about 40 times the size of the human genome. A single-celled amoeba, Amoeba dubia, has a genome with 670 billion bases. (It has not yet been sequenced.) The cricket genome has 11 times as many base pairs as D. melanogaster. There is a wide range of genome sizes within the groups of protozoans, insects, amphibians, and plants and less of a range within mammals and reptiles.

Bacteria and archaea have fewer genes than eukaryotes. Free-living bacteria and archaea have 1,500 to 7,500 genes, whereas the number of genes in eukaryotes ranges from about 5,000 for unicellular fungi to at least 40,000 for multicellular eukaryotes. Within eukaryotes, the number of genes in a species is often lower than expected, considering the size of the genome. The genome of the nematode C. elegans has 100 Mb and contains roughly 20,000 genes. The D. melanogaster genome, in contrast, is almost twice as big (180 Mb) but has about two-thirds the number of genes: only 13,700 genes.

At the outset of the Human Genome Project, biologists expected to identify between 50,000 and 100,000 genes based on the number of known human proteins. As the project progressed, the estimate was revised downward several times, and, as of 2007, the most reliable count is 20,488. This lower number, similar to the number of genes in the nematode C. elegans, has surprised biologists. How do humans (and other vertebrates) get by with no more genes than a nematode? Vertebrate genomes use extensive alternative splicing of RNA transcripts. This process generates more than one functional protein from a single gene. Nearly all human genes contain multiple exons, and an estimated 75% of these multi-exon genes are spliced in at least two different ways. If each alternatively spliced human gene on average specifies three different polypeptides, then the total number of different human polypeptides is about 75,000. Additional polypeptide diversity can result from post-translational modifications.

Gene densities vary. Gene density is the number of genes present in a given length of DNA. Generally, eukaryotes have larger genomes but lower gene density than prokaryotes. Humans have hundreds or thousands of times as many base pairs in their genome as most bacteria, but only 5 to 15 times as many genes; thus, the gene density is lower. Even unicellular eukaryotes, such as yeasts, have fewer genes per million base pairs than bacteria and archaea. Among the genomes that have been sequenced completely, mammals such as humans seem to have the lowest gene density.

In bacterial genomes, most of the DNA consists of genes for protein, tRNA, or rRNA. Nontranscribed regulatory sequences, such as promoters, make up only a small amount of the DNA. Bacterial genes lack introns. Most eukaryotic DNA does not code for protein and is not transcribed into functional RNA molecules (such as tRNAs). Eukaryotic DNA includes more complex regulatory sequences. In fact, humans have 10,000 times as much noncoding DNA as bacteria. Some of the DNA in multicellular eukaryotes is present as introns within genes. Introns account for most of the difference in average length between human genes (27,000 base pairs) and bacterial genes (1,000 base pairs).
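Gene density as defined above is simply gene count divided by genome length. The short sketch below computes it in genes per Mb for the three eukaryotes whose genome size and gene count are both quoted in this outline; the dictionary and function names are illustrative only.

```python
# (genome size in Mb, number of genes), as quoted in this outline.
genomes = {
    "C. elegans": (100, 20_000),
    "D. melanogaster": (180, 13_700),
    "H. sapiens": (3_200, 20_488),
}

def gene_density(size_mb, gene_count):
    """Genes per million base pairs."""
    return gene_count / size_mb

# List species from highest to lowest gene density.
for species, (size_mb, genes) in sorted(
    genomes.items(), key=lambda item: gene_density(*item[1]), reverse=True
):
    print(f"{species:16s} {gene_density(size_mb, genes):7.1f} genes/Mb")
```

With these figures the human genome comes out lowest by a wide margin, consistent with the statement that mammals seem to have the lowest gene density among fully sequenced genomes.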

Concept 21.4 Multicellular eukaryotes have much noncoding DNA and many multigene families
The coding regions of protein-coding genes and the genes for RNA products such as rRNA, tRNA, and miRNA make up only a small portion of the genomes of most multicellular eukaryotes. Most of a eukaryotic genome consists of DNA sequences that don't code for proteins or produce known RNAs; this noncoding DNA has often been described in the past as "junk DNA." Far from junk, however, this DNA plays important roles in the cell, explaining why it has persisted in diverse genomes over many hundreds of generations. Comparisons of the genomes of humans, rats, and mice revealed almost 500 regions of noncoding DNA that were identical in sequence in all three species. These sequences are more highly conserved than protein-coding genes in these species, supporting the view that the noncoding regions play important roles.

Only 1.5% of the human genome codes for proteins or produces rRNAs and tRNAs. Gene-related regulatory sequences and introns account for 24% of the human genome. The rest, located between functional genes, includes unique noncoding DNA such as gene fragments and pseudogenes, nonfunctional former genes that have accumulated mutations over a long time. Most intergenic DNA is repetitive DNA, sequences that are present in multiple copies in the genome. Three-quarters of this repetitive DNA (44% of the entire human genome) is made up of units called transposable elements and sequences related to them.

Transposable elements can move from one location to another within the genome. Both prokaryotes and eukaryotes have stretches of DNA that can move from one location to another within the genome. These stretches are known as transposable genetic elements, or simply transposable elements.

During transposition, a transposable element moves from one site in a cell's DNA to another by a recombination process. Transposable elements were called "jumping genes," but in fact, they never completely detach from the cell's DNA. The original and new DNA sites are brought together by DNA bending.

The first evidence for transposable elements came from American geneticist Barbara McClintock's breeding experiments with Indian corn (maize) in the 1940s and 1950s. As she tracked corn plants through multiple generations, McClintock found color changes in corn that could be explained only by the existence of movable genetic elements. The elements moved into genes for kernel color, disrupting the genes so that they could no longer produce color. McClintock's discovery was met with great skepticism. Her work was validated many years later, however, when transposable elements were found in bacteria and microbial geneticists learned more about the molecular basis of transposition.

Eukaryotic transposable elements are of two types: transposons and retrotransposons. Transposons move within a genome by means of a DNA intermediate by a "cut-and-paste" mechanism, which removes the element from the original site, or by a "copy-and-paste" mechanism, which leaves a copy behind. Most transposable elements are retrotransposons, which move by means of an RNA intermediate. Retrotransposons always leave a copy at the original site during transposition because they are initially transcribed into an RNA intermediate. To insert at another site, the RNA intermediate is first converted back to DNA by reverse transcriptase, an enzyme encoded in the retrotransposon itself. A cellular enzyme catalyzes insertion of the reverse-transcribed DNA at a new site.

Retroviruses, which use reverse transcriptase to produce their DNA, may have evolved from retrotransposons.

Multiple copies of transposable elements and sequences related to them are scattered throughout eukaryotic genomes.

Transposable elements and related sequences are usually hundreds to thousands of base pairs long, and the dispersed "copies" are similar but not identical. These sequences make up 25 to 50% of most mammalian genomes and even higher percentages of the genomes of amphibians and many plants. In humans and other primates, a large portion of transposable element-related DNA consists of a family of similar sequences called Alu elements. These sequences account for approximately 10% of the human genome. Alu elements are about 300 nucleotides long, much shorter than most functional transposable elements, and they do not code for any protein. Many Alu elements are transcribed into RNA molecules, although their cellular function, if any, is unknown. An even larger percentage (17%) of the human genome is made up of a type of retrotransposon called LINE-1, or L1. These sequences are about 6,500 base pairs long and have a slow rate of transposition. Sequences within L1 block RNA polymerase, which is necessary for transposition.
L1 sequences have been found within the introns of nearly 80% of the human genes analyzed, suggesting that L1 may help regulate gene expression. Researchers have proposed that L1 retrotransposons may affect the genome differently in separate developing neurons, thus contributing to the great diversity of neuronal cell types. Although many transposable elements encode proteins, these proteins do not carry out normal cellular functions.

Some repetitive DNA is not related to transposable elements. Repetitive DNA that is not related to transposable elements probably arose by mistakes during DNA replication or recombination. Such DNA accounts for about 15% of the human genome. Another 5% of the human genome consists of duplications of long stretches of DNA, ranging from 10,000 to 300,000 base pairs. The large segments have been copied from one chromosomal location to another site on the same or a different chromosome.

Simple-sequence DNA contains many copies of tandemly repeated short sequences. Repeated units may contain as many as 500 nucleotides but often contain fewer than 15. When the unit contains 2 to 5 nucleotides, the series of repeats is called a short tandem repeat, or STR. The number of copies of the repeated unit can vary from site to site within a given genome. The repeat number can also vary from person to person, producing the variation used for genetic profiling by STR analysis. Altogether, simple-sequence DNA makes up 3% of the human genome.

Simple-sequence DNA has an intrinsically different density from the rest of the cell's DNA. If genomic DNA is cut into pieces and centrifuged at high speed, segments of different density migrate to different positions in the centrifuge tube. Repetitive DNA isolated in this way was originally called satellite DNA because it appeared as a "satellite" band in the centrifuge tube, separate from the rest of the DNA. This term is often used interchangeably with simple-sequence DNA.

Much of a genome's simple-sequence DNA is located at chromosomal telomeres and centromeres, suggesting that this DNA plays a structural role for chromosomes. The DNA at centromeres is essential for the separation of chromatids in cell division. Centromeric DNA, along with simple-sequence DNA located elsewhere, may also help organize the chromatin within the interphase nucleus.
The simple-sequence DNA located at telomeres, at the tips of chromosomes, prevents genes from being lost as the DNA shortens with each round of replication. Telomeric DNA also binds proteins that protect the ends of a chromosome from degradation and from joining to other chromosomes.
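The person-to-person variation in STR repeat number mentioned in this section can be made concrete with a small counting sketch. It assumes the repeat unit is already known and simply finds the longest uninterrupted run of that unit; the function name and both "alleles" are invented for illustration and do not correspond to any real genetic marker.

```python
import re

def longest_str_run(dna, unit):
    """Largest number of back-to-back copies of `unit` found in `dna`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(dna)]
    return max(runs, default=0)

# Two hypothetical alleles at the same site, differing only in repeat number.
allele_a = "TTC" + "GATA" * 7 + "CCG"
allele_b = "TTC" + "GATA" * 10 + "CCG"

print(longest_str_run(allele_a, "GATA"), longest_str_run(allele_b, "GATA"))
```

In genetic profiling, differences like these at several STR sites are compared between samples; actual laboratory methods measure fragment lengths rather than reading the sequence as text.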

Gene-related DNA makes up about 25% of the human genome. DNA sequences that code for proteins or produce tRNA or rRNA make up 1.5% of the human genome. If introns and regulatory sequences associated with genes are included, the total amount of gene-related DNA, coding and noncoding, constitutes about 25% of the human genome.

Many eukaryotic genes are present as unique sequences, with only one copy per haploid set of chromosomes. In most eukaryotic genomes, such solitary genes make up less than half the total transcribed DNA. The rest of the transcribed DNA occurs in multigene families, collections of two or more identical or very similar genes.

In multigene families consisting of identical DNA sequences, the sequences are usually clustered tandemly. Except for genes that code for histone proteins, these families have RNAs as final products. The genes for the three largest rRNA molecules are a family of identical DNA sequences. These rRNA molecules are transcribed from a single transcription unit, repeated tandemly hundreds to thousands of times in one or several clusters in the genome of a multicellular eukaryote. The many copies of this rRNA transcription unit help cells to quickly make the millions of ribosomes needed for active protein synthesis. The primary transcript is cleaved to yield the three rRNA molecules. These molecules are then combined with proteins and one other kind of rRNA (5S rRNA) to form ribosomal subunits.

The classic examples of multigene families of nonidentical genes are two related families of genes that encode globins, a group of proteins that include the α and β polypeptide subunits of hemoglobin. One family, located on chromosome 16 in humans, encodes various forms of α-globin; the other, on chromosome 11, encodes forms of β-globin. The different forms of each globin subunit are expressed at different times in development, allowing hemoglobin to function effectively in the changing environment of the developing animal. In humans, the embryonic and fetal forms of hemoglobin have a higher affinity for oxygen than the adult forms, thus ensuring the efficient transfer of oxygen from mother to fetus.

Concept 21.5 Duplication, rearrangement, and mutation of DNA contribute to genome evolution


The earliest forms of life likely had a minimal number of genes, including only those necessary for survival and reproduction. The size of genomes has increased over evolutionary time, with the extra genetic material providing raw material for gene diversification. An accident in meiosis can result in one or more extra sets of chromosomes, a condition known as polyploidy. In rare cases, the polyploid condition can facilitate the evolution of genes. In a polyploid organism, one set of genes can provide essential functions for the organism. The genes in the extra set may diverge by accumulating mutations. These variations may persist if the organism carrying them survives and reproduces.
In this way, genes with novel functions may evolve.

The accumulation of mutations may lead to the branching off of a new species, as happens often in plants. Scientists can compare the chromosomal organizations of many different species to make inferences about the evolutionary processes shaping chromosomes and possibly leading to speciation. Researchers performed a computer analysis of DNA sequences to reconstruct the evolutionary history of chromosomal rearrangements in eight mammalian species. The researchers found many duplications and inversions of large portions of chromosomes. The rate of these events seems to have accelerated about 100 million years ago, around the time large dinosaurs became extinct and the number of mammalian species increased rapidly.

Such chromosomal rearrangements are thought to contribute to the generation of new species. Although two individuals with different arrangements could still mate and produce offspring, the offspring would have two nonequivalent sets of chromosomes, making meiosis inefficient or even impossible. Due to chromosome rearrangements, the two populations could not successfully mate with each other, a step on their way to becoming two separate species. After the ancestors of humans and chimpanzees diverged as species, the fusion of two ancestral chromosomes in the human line led to different haploid numbers for humans (n = 23) and chimpanzees (n = 24).

Another pattern with medical relevance was noted: The chromosomal breakage points associated with the rearrangements were not randomly distributed; specific sites were used over and over again. A number of these recombination "hotspots" correspond to locations of chromosomal rearrangements within the human genome that are associated with congenital diseases.

Errors during meiosis can lead to the duplication of smaller chromosomal regions, including segments that are about the length of individual genes.
Unequal crossing over during prophase I can result in one chromosome with a deletion and another with a duplication of a particular gene. Transposable elements in the genome can provide sites where nonsister chromatids can cross over, even when their homologous gene sequences are not correctly aligned. Slippage during DNA replication can result in the deletion or duplication of DNA regions. Such errors can lead to regions of repeats, such as simple-sequence DNA. Evidence that unequal crossing over and template slippage during DNA replication lead to duplication of genes is found in the existence of multigene families.

Duplication events have led to the evolution of genes with related functions, such as the α-globin and β-globin gene families. A comparison of gene sequences within a multigene family indicates that they all evolved from one common ancestral globin gene, which was duplicated and diverged about 450 to 500 million years ago. Each of these genes was later duplicated several times, and the copies then diverged from each other in sequence, yielding the current family members. The ancestral globin gene also gave rise to the oxygen-binding muscle protein myoglobin and to the plant protein leghemoglobin. The latter two proteins function as monomers, and their genes are included in a "globin superfamily."

After the duplication events, the differences between the genes in the globin family arose from mutations that accumulated in the gene copies over many generations. The necessary function provided by an α-globin protein was fulfilled by one gene, while other copies of the α-globin gene accumulated random mutations. Some mutations may have altered the function of the protein product in ways that were beneficial to the organism without changing its oxygen-carrying function. The similarity in the amino acid sequences of the various α-globin and β-globin proteins supports this model of gene duplication and mutation. The existence of several pseudogenes among the functional globin genes provides additional evidence for this model. Random mutations accumulating over time in the pseudogenes have destroyed their function.

In other gene families, one copy of a duplicated gene can undergo alterations that lead to a completely new function for the protein product. The genes for lysozyme and α-lactalbumin are good examples. Lysozyme is an enzyme that helps prevent infection by hydrolyzing bacterial cell walls; α-lactalbumin is a nonenzymatic protein that plays a role in mammalian milk production. Both genes are found in mammals, but only lysozyme is found in birds. The two proteins are similar in their amino acid sequences and three-dimensional structures. Findings suggest that at some time after the bird and mammalian lineages had separated, the lysozyme gene underwent a duplication event in the mammalian lineage but not in the avian lineage. Subsequently, one copy of the duplicated lysozyme gene evolved into a gene encoding α-lactalbumin, a protein with a completely different function.

Rearrangement of existing DNA sequences within genes has also contributed to genome evolution. The presence of introns in eukaryotic genes may have promoted the evolution of new and potentially useful proteins by facilitating the duplication or repositioning of exons in the genome.
) particular e*on ithin a gene could be duplicated on one chromosome and deleted from the homologous chromosome. The gene ith the duplicated e*on ould code for a protein ith a second copy of the encoded domain. This change in the protein$s structure could augment its function by increasing its stability or altering its ability to bind a particular ligand. ) number of protein,coding genes have multiple copies of related e*ons, hich presumably arose by duplication and then diverged. The gene coding for collagen is a good e*ample. +ollagen is a structural protein ith a highly repetitive amino acid sequence, hich is reflected in the repetitive pattern of e*ons in the collagen gene.
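The effect of exon duplication on a protein's domain structure can be illustrated with a small sketch. This is a toy model rather than a biological simulation: the `Exon` class, the exon names, and the domain labels are all hypothetical, and the only assumption is that each exon encodes exactly one protein domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    name: str    # hypothetical exon label
    domain: str  # protein domain this exon is assumed to encode


def duplicate_exon(gene, index):
    """Return a new gene in which the exon at `index` appears twice in a row."""
    return gene[:index + 1] + [gene[index]] + gene[index + 1:]


def encoded_domains(gene):
    """List the protein domains encoded by the gene's exons, in order."""
    return [exon.domain for exon in gene]


# A made-up three-exon gene.
gene = [Exon("E1", "signal"), Exon("E2", "binding"), Exon("E3", "catalytic")]

# Duplicating the second exon gives a protein with two copies of its domain.
duplicated = duplicate_exon(gene, 1)

print(encoded_domains(gene))
print(encoded_domains(duplicated))
```

In this model the duplicated gene encodes a protein with a second copy of the "binding" domain, mirroring the repeated exons seen in genes such as the one for collagen.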

Lecture Outline for Campbell/Reece Biology, 8th Edition, Pearson Education, Inc.

The mixing and matching of different exons within or between genes owing to errors in meiotic recombination is called exon shuffling and could lead to new proteins with novel combinations of functions. The gene for tissue plasminogen activator (TPA), an extracellular protein that helps control blood clotting, has four domains of three types, each encoded by an exon; one exon is present in two copies. Because each type of exon is also found in other proteins, the gene for TPA is thought to have arisen by several instances of exon shuffling and duplication.

The persistence of transposable elements as a large percentage of eukaryotic genomes suggests that they play an important role in shaping a genome over evolutionary time. Transposable elements can contribute to the evolution of the genome by promoting recombination, disrupting cellular genes or control elements, and carrying entire genes or individual exons to new locations.

The presence of transposable elements with similar sequence scattered throughout the genome allows recombination to take place between different chromosomes with homologous regions. Most of these alterations are likely detrimental, causing chromosomal translocations and other changes in the genome that may be lethal to the organism. Over the course of evolutionary time, however, an occasional recombination may be advantageous.

The movement of transposable elements around the genome can have direct consequences. If a transposable element "jumps" into the middle of a coding sequence of a protein-coding gene, it may prevent the normal functioning of that gene. If a transposable element inserts within a regulatory sequence, it may increase or decrease protein production.

During transposition, a transposable element may transfer genes to a new position on the genome. This process probably accounts for the location of the α-globin and β-globin gene families on different human chromosomes. A similar mechanism may insert an exon from one gene into another gene. If the inserted exon is retained in the RNA transcript during RNA splicing, the protein that is synthesized will have an additional domain, which may confer a new function.

Transposable elements can lead to new coding sequences when an Alu element hops into introns to create a weak alternative splice site in the RNA transcript. Splicing usually occurs at the regular splice sites, producing the original protein. Occasionally, splicing occurs at the new weak site. In this way, alternative genetic combinations can be "tried out" while the function of the original gene product is retained.

These processes produce either no effect or harmful effects in most individual cases. Over long periods of time, however, the generation of genetic diversity provides more raw material for natural selection to work on during evolution. The accumulation of changes in the genome of each species provides a record of its evolutionary history. Comparing the genomes of different species enables scientists to identify genomic changes and has increased our understanding of how genomes evolve.

Concept 21.6 Comparing genome sequences provides clues to evolution and development


Comparisons of genome sequences from different species tell about the evolutionary history of life.

The more similar in sequence the genes and genomes of two species, the more closely related those species are in their evolutionary history. Comparing the genomes of closely related species provides information about recent evolutionary events; comparing the genomes of distantly related species sheds light on ancient evolutionary history.

Analyzing highly conserved genes in distantly related species can help clarify evolutionary relationships among species that diverged long ago. Comparisons of the complete genome sequences of bacteria, archaea, and eukaryotes strongly support the theory that these groups are the three fundamental domains of life. Genes that evolved a very long time ago can still be surprisingly similar in disparate species. Several protein-coding genes in yeast are so similar to some human disease genes that researchers deduced the functions of the disease genes by studying their yeast counterparts.

The genomes of closely related species are likely to be organized similarly because of their relatively recent divergence. The fully sequenced genome of one species can thus be used as a scaffold for assembly of genomic sequences from a closely related species, accelerating mapping of the second genome. For instance, using the human genome sequence as a guide, researchers were able to sequence the mouse genome very quickly.

The genetic differences between closely related species can be correlated with phenotypic differences. Researchers have compared the human genome with the genomes of the chimpanzee, mouse, rat, and other mammals. Identifying the genes shared by these species but not by nonmammals provides clues about what it takes to make a mammal. Identifying the genes shared by chimpanzees and humans but not by rodents gives information about primates.
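The idea that sequence similarity tracks evolutionary relatedness can be made concrete with a minimal sketch. The code below computes the percentage of positions at which two already-aligned, equal-length sequences differ. The sequences and species labels are invented for illustration, and the measure counts single-base substitutions only; it ignores insertions and deletions, which real genome comparisons must also handle.

```python
from itertools import combinations


def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Made-up aligned sequences for three hypothetical species.
sequences = {
    "species_A": "ATGCCGTAAC",
    "species_B": "ATGCCGTAAT",
    "species_C": "ATGACGTTAT",
}

# Compare every pair; a smaller difference suggests a more recent divergence.
for (name_1, seq_1), (name_2, seq_2) in combinations(sequences.items(), 2):
    diff = percent_difference(seq_1, seq_2)
    print(f"{name_1} vs {name_2}: {diff:.1f}% different")
```

Under this toy measure, species_A and species_B would be judged the most closely related pair, because their sequences differ at the fewest positions.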
Comparing the human genome with that of the chimpanzee helps answer the question: What genomic information makes a human or a chimpanzee? In single-base substitutions, chimp and human genomes differ by only 1.2%. Longer stretches of DNA show a 2.7% difference due to insertions or deletions of larger regions in the genome of one or the other species. Many of the insertions are duplications or other repetitive DNA. A third of the human duplications are not present in the chimpanzee genome. Some of these duplications contain regions associated with human diseases. There are more Alu elements in the human genome than in the chimpanzee genome, and the latter contains many copies of a retroviral provirus not present in humans.

Biologists have identified a number of genes that are apparently evolving faster in humans
than in either the chimpanzee or the mouse. These include genes involved in defense against malaria and tuberculosis and at least one gene regulating brain size. The genes evolving the fastest in humans are those that code for transcription factors. Transcription factors regulate gene expression and thus play a key role in orchestrating the overall genetic program.

One transcription factor whose gene shows evidence of rapid change in the human lineage is called FOXP2. Several lines of evidence suggest that this gene functions in vocalization in vertebrates. Mutations in this gene can produce severe speech and language impairment in humans. The FOXP2 gene is expressed in the brains of zebra finches and canaries at the time when these songbirds are learning their songs. In a "knock-out" experiment, Joseph Buxbaum and colleagues disrupted the FOXP2 gene in mice and analyzed the resulting phenotype. Homozygous mutant mice had malformed brains and failed to emit normal ultrasonic vocalizations, while mice with one faulty copy of the gene had significant problems with vocalization.

Researchers are exploring whether differences between the human and chimpanzee FOXP2 proteins account for the ability of humans, but not chimpanzees, to communicate by speech. There are only two amino acid differences between the human and chimpanzee FOXP2 proteins; the effect of these differences on the function of the human protein is not yet known.

Analysis of genomes is increasing our understanding of genetic variation in humans. Because the history of the human species is so short (probably about 200,000 years), the amount of DNA variation among humans is small compared to that of many other species. Much human genetic diversity seems to be in the form of single nucleotide polymorphisms (SNPs), usually detected by DNA sequencing. In the human genome, SNPs occur on average about once in 100 to 300 base pairs. Scientists have identified the location of several million SNP sites in the human genome.
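As a rough worked example, the stated SNP density can be turned into an expected number of SNP sites. The genome length used here (about 3 billion base pairs) is an assumption supplied for illustration and is not taken from this section; the function name is likewise hypothetical.

```python
# Assumption: approximate length of the haploid human genome, in base pairs.
GENOME_LENGTH_BP = 3_000_000_000


def expected_snp_sites(genome_length_bp, bp_per_snp):
    """Estimate SNP sites given an average spacing of one SNP per `bp_per_snp` bases."""
    return genome_length_bp // bp_per_snp


# One SNP per 300 bp gives the lower estimate; one per 100 bp gives the upper.
low = expected_snp_sites(GENOME_LENGTH_BP, 300)
high = expected_snp_sites(GENOME_LENGTH_BP, 100)

print(f"{low:,} to {high:,} expected SNP sites")
```

Under that assumption the estimate is on the order of ten to thirty million sites, so the several million SNPs already located would be a subset of the expected total.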
Other variations, including inversions, deletions, and duplications, seem to occur without ill effect on the individual carrying them. These variations will be useful genetic markers for studying human evolution, the differences between human populations, and the migratory routes of human populations throughout history. Polymorphisms in human DNA will also be valuable markers for identifying genes that cause diseases or affect our health in more subtle ways. Analysis of the differences in individual genomes is likely to change the practice of medicine in the 21st century.

Comparative studies of the genetic programs that direct embryonic development clarify the mechanisms that generated the great diversity of life.

Biologists in the field of evolutionary developmental biology, or evo-devo, compare the developmental processes of multicellular organisms with the goal of understanding how these processes have evolved and how changes in them can modify existing organismal features or
lead to new ones. The genomes of related species with strikingly different forms may have only minor differences in gene sequence or regulation.

For example, homeotic genes in Drosophila specify the identity of body segments in the fruit fly. Molecular analysis of the homeotic genes in Drosophila has shown that they all include a 180-nucleotide sequence called a homeobox, which specifies a 60-amino-acid homeodomain in the encoded proteins. An identical or very similar nucleotide sequence has been discovered in the homeotic genes of many invertebrates and vertebrates. The sequences are so similar between humans and fruit flies, in fact, that one researcher has whimsically referred to flies as "little people with wings." The resemblance even extends to the organization of these genes: The vertebrate genes homologous to the homeotic genes of fruit flies have kept their chromosomal arrangement.

Homeobox-containing sequences have been found in regulatory genes of much more distantly related eukaryotes, including plants and yeasts, and even in prokaryotes. Clearly, the homeobox DNA sequence evolved very early in the history of life and was sufficiently valuable to organisms to have been conserved in animals and plants virtually unchanged for hundreds of millions of years.

Homeotic genes in animals were named Hox genes, short for homeobox-containing genes, because homeotic genes were the first genes found to have this sequence. Other homeobox-containing genes were later found that do not act as homeotic genes and do not directly control the identity of body parts. Most of these genes are associated with development, suggesting their ancient and fundamental importance in that process. In Drosophila, for example, homeoboxes are present not only in the homeotic genes but also in the egg-polarity gene bicoid, in several segmentation genes, and in a master regulatory gene for eye development.
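The relationship between the 180-nucleotide homeobox and the 60-amino-acid homeodomain follows from the triplet genetic code: each codon of three nucleotides specifies one amino acid, so 180 / 3 = 60. A minimal sketch of that arithmetic follows. The sequence is a placeholder of the right length, not a real homeobox, and the function name is hypothetical.

```python
CODON_LENGTH = 3       # nucleotides per codon in the triplet genetic code
HOMEOBOX_LENGTH = 180  # nucleotides, as stated in the text


def split_into_codons(dna):
    """Split a coding sequence into consecutive three-nucleotide codons."""
    if len(dna) % CODON_LENGTH != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [dna[i:i + CODON_LENGTH] for i in range(0, len(dna), CODON_LENGTH)]


# Placeholder sequence of homeobox length (NOT a real homeobox sequence).
placeholder_homeobox = "ACG" * (HOMEOBOX_LENGTH // CODON_LENGTH)

codons = split_into_codons(placeholder_homeobox)

# One amino acid per codon gives the length of the encoded homeodomain.
print(len(codons))
```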
The homeobox-encoded homeodomain is the part of a protein that binds to DNA when the protein functions as a transcriptional regulator. However, the shape of the homeodomain allows it to bind to any DNA segment; by itself it cannot select a specific sequence. Other, more variable domains in a homeodomain-containing protein determine which genes the protein regulates. Interaction of these latter domains with still other transcription factors helps a homeodomain-containing protein recognize specific enhancers in the DNA.

Proteins with homeodomains probably regulate development by coordinating the transcription of batteries of developmental genes, switching them on or off. In Drosophila embryos, different combinations of homeobox genes are active in different parts of the embryo. Selective expression of regulatory genes, varying over time and space, is central to pattern formation.

Many other genes involved in development are highly conserved from species to species. These include numerous genes that encode components of signaling pathways.

How can the same genes be involved in the development of animals whose forms are so different? In some cases, small changes in the regulatory sequences of particular genes cause changes in gene expression patterns that can lead to major changes in body form. The differing patterns of expression of the Hox genes along the body axis in insects and crustaceans explain the variation in the number of leg-bearing segments among these segmented animals. The same Hox gene product may have different effects in different species, turning on new genes or turning on the same genes at higher or lower concentrations. Similar genes direct distinct developmental processes in specific organisms, resulting in different body shapes. Several Hox genes are expressed in the embryonic and larval stages of the sea urchin, a nonsegmented animal that has a body plan quite different from those of insects and mice.

Similarities in the molecular mechanisms of development in plants and animals reflect their shared cellular origin.

The last common ancestor of animals and plants was a single-celled eukaryote that lived hundreds of millions of years ago, so the processes of development must have evolved independently in the two multicellular lineages of organisms. Plants evolved with rigid cell walls and do not show the morphogenetic movements of cells and tissues that are so important in animals. Morphogenesis in plants relies primarily on differing planes of cell division and on selective cell enlargement.

Despite the differences between animals and plants, there are similarities in their molecular mechanisms of development, which are legacies of their shared cellular origin. In both animals and plants, development relies on a cascade of transcriptional regulators turning on or turning off genes in a finely tuned series. Research on a small flowering plant in the mustard family, Arabidopsis thaliana, has shown that establishing the radial pattern of flower parts, like setting up the head-to-tail axis in Drosophila, uses a cascade of transcription factors.

The genes that direct these processes differ considerably in plants and animals. Many of the master regulatory switches in Drosophila are homeobox-containing Hox genes, whereas the switches in Arabidopsis belong to a completely different family of genes, called the Mads-box genes. Although homeobox-containing genes can be found in plants and Mads-box genes in animals, the genes do not perform the same major roles in development in both groups.

Thus, molecular evidence supports the supposition that developmental programs evolved separately in animals and plants.
