
14/02/03

LECTURE PRESENTATIONS for BROCK BIOLOGY OF MICROORGANISMS, THIRTEENTH EDITION
Michael T. Madigan, John M. Martinko, David A. Stahl, David P. Clark

Chapter 12

Microbial Genomes
Lectures by John Zamora, Middle Tennessee State University
© 2012 Pearson Education, Inc.

12.1 Introduction to Genomics

- Genome
  - Entire complement of genetic information
  - Includes genes, regulatory sequences, and noncoding DNA
- Genomics
  - Discipline of mapping, sequencing, analyzing, and comparing genomes

12.1 Introduction to Genomics

- More than 2,000 prokaryotic genomes sequenced or in progress
- RNA virus MS2
  - First genome sequenced, in 1976
  - 3,569 bp
- Haemophilus influenzae
  - First cellular genome sequenced, in 1995
  - 1,830,137 bp

12.2 Sequencing and Annotating Genomes

- Sequencing: determining the precise order of nucleotides in a DNA or RNA molecule
- Sanger dideoxy method
  - Invented by Nobel Prize winner Fred Sanger
  - Dideoxy analogs of dNTPs used in conjunction with dNTPs (Figure 12.1)
  - Analog prevents further extension of DNA chain (Figure 12.2a)
  - Bases are labeled with radioactivity
  - Gel electrophoresis is then performed on products (Figure 12.2b)

Figure 12.1: A dideoxy analog compared with a normal deoxynucleotide in a growing DNA chain (direction of chain growth indicated). The dideoxy analog is missing the 3′-OH; with no free 3′-OH, replication stops at that point.

Figure 12.2a: Sanger dideoxy sequencing reactions.

- Start with the DNA strand to be sequenced and a radioactive DNA primer
- Add DNA polymerase and a mixture of all four deoxyribonucleotide triphosphates; separate into four reaction tubes
- Add a small amount of only one dideoxynucleotide triphosphate (ddGTP, ddATP, ddTTP, or ddCTP) to each tube and allow the reaction to proceed
- Each tube yields its own set of reaction products (ddGTP, ddATP, ddTTP, ddCTP)

Figure 12.2b: Reaction products are separated by electrophoresis on a gel and identified by autoradiography. The sequence reads from the bottom of the gel (bands 1 to 7) as A G C T A A G, so the sequence of the unknown strand is 3′ T C G A T T C 5′.
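The step from the gel read to the unknown strand in Figure 12.2b is plain base pairing, and can be sketched in a few lines. This is a minimal illustration, not part of the original slides; the function name `template_from_read` is made up for the example.

```python
# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def template_from_read(read_5_to_3: str) -> str:
    """Return the template strand, written 3'->5', that pairs
    position by position with a newly synthesized read (5'->3')."""
    return "".join(COMPLEMENT[base] for base in read_5_to_3.upper())

# Read taken from the bottom of the gel in Figure 12.2b.
print(template_from_read("AGCTAAG"))  # TCGATTC
```

Because the read is written 5′ to 3′ and the result 3′ to 5′, no reversal is needed; only the complement is taken.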

12.2 Sequencing and Annotating Genomes

- Large-scale sequencing projects have led to automated DNA sequencing systems
  - Based on Sanger method
  - Radioactivity replaced by fluorescent dye (Figure 12.2c)

Figure 12.2c


12.2 Sequencing and Annotating Genomes

- Virtually all genomic sequencing projects use shotgun sequencing
  - Entire genome is cloned and resultant clones are sequenced
  - Much of the sequencing is redundant
  - Generally 7- to 10-fold coverage
- Computer algorithms are used to find overlapping (replicate) sequences and assemble them
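The two ideas on the shotgun-sequencing slide, fold coverage and assembly from overlapping reads, can be sketched as follows. This is an illustrative sketch only: the 700-bp read length is an assumed value, the function names are hypothetical, and real assemblers use far more elaborate methods than this single suffix-to-prefix merge.

```python
import math

def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Number of reads so that total sequenced bases = coverage x genome size."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:] if k else None

# Haemophilus influenzae genome (1,830,137 bp) at 8-fold coverage,
# assuming 700-bp reads (the read length is an assumption, not from the slides).
print(reads_needed(1_830_137, 700, 8))
# Two toy reads that share the bases "TAAG".
print(merge_reads("AGCTAAG", "TAAGGCT"))
```

The redundancy noted on the slide is what makes the merge step possible: each region is covered by several reads, so neighboring reads share sequence that an algorithm can detect.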

12.2 Sequencing and Annotating Genomes

- Occasionally assembly is not possible
- Closure can be pursued using PCR to target areas of the genome
- Closed vs. draft genome
  - A closed genome requires more manual effort
  - More expensive
  - More information

12.2 Sequencing and Annotating Genomes

- Annotation: converting raw sequence data into a list of genes present in the genome
- Majority of genes encode proteins
- Functional ORF: an open reading frame that encodes a protein
  - Computer algorithms used to search for ORFs
    - Look for start/stop codons and Shine-Dalgarno sequences
  - ORFs can be compared to ORFs in other genomes
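The ORF search described on the annotation slide can be sketched as a scan for a start codon followed in frame by a stop codon. This is a simplified sketch under stated assumptions: it scans only the three forward reading frames, does not check for a Shine-Dalgarno sequence, and the function name and `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2, starts=frozenset({"ATG"})):
    """Return (start, end, sequence) for each forward-strand ORF.

    An ORF here runs from the first start codon seen in a frame to the
    next in-frame stop codon; `end` is exclusive and includes the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in starts:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                   # look for the next ORF
    return orfs

# Toy sequence with one complete ORF in the third reading frame.
print(find_orfs("CCATGAAATGGTAACC"))
```

Passing `starts=frozenset({"ATG", "GTG"})` would also accept GTG as an initiation codon, which some bacteria use frequently.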

12.2 Sequencing and Annotating Genomes

- Inaccuracies in some annotations are problematic
  - As many as 10% of annotated genes are incorrectly annotated

12.3 Bioinformatic Analyses and Gene Distributions

- Bioinformatics
  - Science that applies powerful computational tools to DNA and protein sequences
  - Used to analyze, store, and access the sequences for comparative purposes

12.3 Bioinformatic Analyses and Gene Distributions

- Correlation between genome size and ORFs (Figure 12.3)
- On average a prokaryotic gene is 1,000 bp long
  - ~1,000 genes per megabase (1 Mbp = 1,000,000 bp)
  - As genome size increases, gene content proportionally increases
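The rule of thumb on this slide (about 1,000 bp per gene, hence about 1,000 genes per megabase) amounts to a one-line estimate. The sketch below is only that rule applied to genome sizes quoted elsewhere in these slides; the function name is hypothetical, and the result is a rough estimate rather than an actual gene count.

```python
def estimated_gene_count(genome_size_bp: int, mean_gene_length_bp: int = 1_000) -> int:
    """Rough gene estimate: genome size divided by the average gene length."""
    return round(genome_size_bp / mean_gene_length_bp)

# Genome sizes as quoted in the slides.
for name, size_bp in [
    ("Haemophilus influenzae", 1_830_137),
    ("Mycobacterium tuberculosis", 4_411_529),
    ("Sorangium cellulosum", 12_300_000),
]:
    print(f"{name}: ~{estimated_gene_count(size_bp):,} genes")
```

For M. tuberculosis the estimate comes out somewhat above the roughly 4,000 genes reported for its genome, a reminder that the 1,000-bp average is an approximation.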

Figure 12.3: Total ORFs in genome (0 to 9,000) plotted against genome size (0 to 10 megabases).

12.3 Bioinformatic Analyses and Gene Distributions

- Prokaryotic genomes range in size from those of large viruses to those of eukaryotic microbes
- Unlike prokaryotic genomes, eukaryotic genomes contain a large fraction of noncoding DNA

12.3 Bioinformatic Analyses and Gene Distributions

- Smallest cellular genomes belong to parasitic or endosymbiotic prokaryotes
  - Obligate parasites range from 490 kbp (Nanoarchaeum equitans) to 4,400 kbp (Mycobacterium tuberculosis)
  - Endosymbionts can be smaller (e.g., 160-kbp genome of Carsonella ruddii)
  - Estimates suggest the minimum number of genes for a viable cell is 250 to 300

12.3 Bioinformatic Analyses and Gene Distributions

- Largest prokaryotic genomes comparable to those of some eukaryotes
  - Sorangium cellulosum (Bacteria)
    - Largest prokaryotic genome to date at 12.3 Mbp
  - Largest archaeal genomes tend to be smaller (~5 Mbp)

12.3 Bioinformatic Analyses and Gene Distributions

- Complement of genes in a particular organism defines its biology, but genomes are also molded by an organism's lifestyle

12.3 Bioinformatic Analyses and Gene Distributions

- Many genes can be identified by sequence similarity to genes found in other organisms (comparative analysis)
- Comparative analyses allow for predictions of metabolic pathways and transport systems
- Example: Thermotoga maritima (Figure 12.4)

Figure 12.4: Overview of metabolic pathways and transport systems of Thermotoga maritima predicted from its genome. Labeled elements include glycolysis, the pentose phosphate and Entner-Doudoroff pathways, peptide and sugar ABC transport systems, ATP synthase, ion and nutrient transporters, 33 flagellar and motor genes, chemotaxis genes (cheA/B/C/D/R/W/Y), and 7 MCPs.

12.3 Bioinformatic Analyses and Gene Distributions

- Gene Distribution in Prokaryotes
  - Metabolic genes typically most abundant class
  - DNA replication and transcription genes make up minor fraction of genome
  - Nontranslated RNA genes are typically prevalent
    - Include rRNA, tRNA, small regulatory RNAs

12.3 Bioinformatic Analyses and Gene Distributions

- The number of genes whose role can be clearly identified in a given genome is 70% or less of the total ORFs detected
- Hypothetical proteins: uncharacterized ORFs; proteins that likely exist but whose function is presently unknown
  - Likely encoded by nonessential genes
  - In E. coli, many predicted to encode regulatory or redundant proteins

12.3 Bioinformatic Analyses and Gene Distributions

- Percentage of an organism's genes devoted to a specific cell function is to some degree a function of genome size (Figure 12.5)

Figure 12.5: Relative percent of ORFs devoted to DNA replication, translation, transcription, signal transduction, and energy generation, plotted against total ORFs in genome (2,000 to 10,000).

12.3 Bioinformatic Analyses and Gene Distributions

- Gene Distribution in Bacteria and Archaea
  - Archaea typically devote a higher percentage of their genomes to energy and coenzyme production than do Bacteria
  - Archaea contain fewer genes for carbohydrate metabolism or cytoplasmic membrane functions than do Bacteria

Figure 12.6: Percent of genes (0 to 14) in Bacteria and Archaea by functional category: carbohydrate metabolism, cell membrane, coenzyme metabolism, energy production, unknown function, and general prediction.

12.6 Metagenomics

- Metagenome
  - The total gene content of the organisms present in an environment
- Several environments have been surveyed by large-scale metagenome projects
  - Examples: acid mine runoff waters, deep-sea sediments, fertile soils

12.10 Gene Families, Duplications, and Deletions

- Homologous: related in sequence to an extent that implies common genetic ancestry
- Gene families: groups of gene homologs (Figure 12.15)
- Paralogs: genes within an organism whose similarity to one or more genes in the same organism is the result of gene duplication
- Orthologs: genes found in one organism that are similar to those in another organism but differ because of speciation
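The two definitions above can be turned into a small classifier over gene labels such as A1, A2, B1, and B2 from Figure 12.15. This is a teaching sketch only: the `Gene` record and its `copy` field (which duplicate a gene descends from) are hypothetical constructs, and real ortholog and paralog assignment depends on phylogenetic analysis rather than labels.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    species: str  # organism in which the gene is found
    copy: str     # duplicate it descends from ("A" or "B"); hypothetical field

def relationship(g1: Gene, g2: Gene) -> str:
    """Classify two homologous genes using the slide definitions."""
    if g1.species == g2.species and g1.copy != g2.copy:
        return "paralogs"    # same organism, related by gene duplication
    if g1.species != g2.species and g1.copy == g2.copy:
        return "orthologs"   # different organisms, diverged by speciation
    return "not covered by the slide definitions"

a1 = Gene("A1", "Species 1", "A")
a2 = Gene("A2", "Species 2", "A")
b1 = Gene("B1", "Species 1", "B")

print(relationship(a1, b1))  # paralogs
print(relationship(a1, a2))  # orthologs
```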

Figure 12.15: An ancestral gene in an ancestral species undergoes gene duplication to give Gene A and Gene B (paralogs). After divergence of species, Species 1 carries A1 and B1 and Species 2 carries A2 and B2. A1 and A2 are orthologs, as are B1 and B2; A1 and B1 are paralogs, as are A2 and B2.

Genomics in TB Research


Cole, S. T., et al. (1998). Nature, 393(6685), 537–544. doi:10.1038/31159

The outer circle shows the scale in Mb, with 0 representing the origin of replication. The first ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue, others are pink) and the direct repeat region (pink cube); the second ring inwards shows the coding sequence by strand (clockwise, dark green; anticlockwise, light green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12 REP family, dark pink; prophage, blue); the fourth ring shows the positions of the PPE family members (green); the fifth ring shows the PE family members (purple, excluding PGRS); and the sixth ring shows the positions of the PGRS sequences (dark red). The histogram (centre) represents G + C content, with <65% G + C in yellow, and >65% G + C in red. The figure was generated with software from DNASTAR.

M. tuberculosis Genome
- The genome comprises:
  - 4,411,529 base pairs
  - ~4,000 genes
  - a very high GC content that is reflected in the biased amino-acid content of the proteins
- M. tuberculosis differs radically from other bacteria in that:
  - A very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis.
  - It encodes two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

Biased Amino-acid Content of the Proteins

- GTG initiation codons (35%) are used more frequently than in Bacillus subtilis (9%) and E. coli (14%), although ATG (61%) is the most common translational start.
- Statistically significant preference for the amino acids Ala, Gly, Pro, Arg and Trp, which are all encoded by G + C-rich codons
- A comparative reduction in the use of amino acids encoded by A + T-rich codons such as Asn, Ile, Lys, Phe and Tyr

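The link between GC content and amino-acid bias can be checked directly against the standard genetic code. The sketch below computes the mean G + C fraction of the codons for each amino acid named on the slide; the codon lists are the standard code, while the function names and the grouping into `PREFERRED` and `REDUCED` are illustrative choices.

```python
# Standard-code codons for the amino acids named on the slide.
CODONS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Trp": ["TGG"],
    "Asn": ["AAT", "AAC"],
    "Ile": ["ATT", "ATC", "ATA"],
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Tyr": ["TAT", "TAC"],
}
PREFERRED = ["Ala", "Gly", "Pro", "Arg", "Trp"]   # favored in M. tuberculosis
REDUCED = ["Asn", "Ile", "Lys", "Phe", "Tyr"]     # used less often

def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def mean_codon_gc(amino_acid: str) -> float:
    """Average G + C fraction over all codons of one amino acid."""
    codons = CODONS[amino_acid]
    return sum(gc_fraction(c) for c in codons) / len(codons)

for aa in PREFERRED + REDUCED:
    print(f"{aa}: {mean_codon_gc(aa):.2f}")
```

Every amino acid in the preferred group has a higher mean codon G + C fraction than any amino acid in the reduced group, which is the pattern a GC-rich genome would be expected to produce.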
Lipogenesis and Lipolysis

- ~250 distinct enzymes involved in fatty acid metabolism in M. tuberculosis compared with only 50 in E. coli
- 36 acyl-CoA synthases
- Two discrete types of fatty acid biosynthesis system, fatty acid synthase (FAS) I and FAS II
- Polyketides
- Siderophores

Immunological aspects and pathogenicity

- ~90 lipoproteins
- Two copies of secA
- About 10% of the coding capacity of the genome is devoted to two large unrelated families of acidic, glycine-rich proteins, the PE and PPE families, whose genes are clustered
  - Are they the principal source of antigenic variation in what is otherwise a genetically and antigenically homogeneous bacterium?
  - Might these glycine-rich proteins interfere with immune responses by inhibiting antigen processing?

Potential Antibiotic Resistance Mechanisms

- Hydrolytic or drug-modifying enzymes such as beta-lactamases and aminoglycoside acetyl transferases
- Many potential drug efflux systems, such as 14 members of the major facilitator family and numerous ABC transporters

Cole, S. T., et al. (2001). Massive gene decay in the leprosy bacillus. Nature, 409(6823), 1007–1011. doi:10.1038/35059006

What is Leprosy?
- Chronic human neurological disease
- Results from infection with the obligate intracellular pathogen Mycobacterium leprae
- M. leprae is a close relative of the tubercle bacillus.
- M. leprae has the longest doubling time of all known bacteria (~14 days).
- M. leprae has thwarted every effort at culture in the laboratory.

Features of M. leprae Genome

- The 3.27 Mb genome sequence comes from an armadillo-derived Indian isolate of the leprosy bacillus.
- It is substantially smaller than that of M. tuberculosis (4.41 Mb).
- It reveals an extreme case of reductive evolution.
- Less than half of the genome contains functional genes, but pseudogenes with intact counterparts in M. tuberculosis abound.
- Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences.
- Gene deletion and decay have eliminated many important metabolic activities, including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

Pseudogenes


1, small-molecule catabolism; 2, energy metabolism; 3, central intermediary metabolism; 4, amino-acid biosynthesis; 5, nucleoside and nucleotide biosynthesis and metabolism; 6, biosynthesis of cofactors, prosthetic groups and carriers; 7, lipid biosynthesis; 8, polyketide and non-ribosomal peptide synthesis; 9, proteins performing regulatory functions; and so on.


It is striking that elimination of pseudogenes by deletion lags far behind gene inactivation. Why?

Garnier, T., et al. (2003). Proceedings of the National Academy of Sciences of the United States of America, 100(13), 7877–7882. doi:10.1073/pnas.1130426100


The genome sequence of M. bovis is >99.95% identical to that of M. tuberculosis, but deletion of genetic information has led to a reduced genome size.

Brosch, R., et al. (2002). Proceedings of the National Academy of Sciences of the United States of America, 99(6), 3684–3689. doi:10.1073/pnas.052548299


Proposed origin
M. tuberculosis derived from M. bovis (M. bovis → M. tuberculosis)

Or was it?
Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

RD distribution in M. tuberculosis

- M. canettii: RD12
- M. tuberculosis: TbD1
- M. africanum: RD9, RD3
- M. microti: RD9, RD7, RD8, RD10, RD5, RD3
- M. bovis: RD9, RD7, RD8, RD10, RD4, RD5, RD12, RD13, RD11
- BCG: RD9, RD7, RD8, RD10, RD4, RD5, RD12, RD13, RD1, RD2

RD3 = φRv1; RD11 = φRv2

Cole, Institut Pasteur, Downloaded from www.pasteur.fr.
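The per-species RD lists on this slide can be compared with simple set operations. The sketch below transcribes the lists as they appear on the slide and assumes each listed region of difference (RD) is absent from that species; that reading, the species name expansions, and the function names are assumptions of this example.

```python
# Regions listed per species on the slide (read here as absent regions).
RD_BY_SPECIES = {
    "M. canettii": {"RD12"},
    "M. tuberculosis": {"TbD1"},
    "M. africanum": {"RD9", "RD3"},
    "M. microti": {"RD9", "RD7", "RD8", "RD10", "RD5", "RD3"},
    "M. bovis": {"RD9", "RD7", "RD8", "RD10", "RD4", "RD5",
                 "RD12", "RD13", "RD11"},
    "BCG": {"RD9", "RD7", "RD8", "RD10", "RD4", "RD5",
            "RD12", "RD13", "RD1", "RD2"},
}

def shared_regions(species):
    """Regions listed for every species in `species`."""
    return set.intersection(*(RD_BY_SPECIES[s] for s in species))

def unique_regions(name):
    """Regions listed for `name` and for no other species."""
    others = set().union(*(v for k, v in RD_BY_SPECIES.items() if k != name))
    return RD_BY_SPECIES[name] - others

print(shared_regions(["M. africanum", "M. microti", "M. bovis", "BCG"]))
print(sorted(unique_regions("BCG")))
print(sorted(RD_BY_SPECIES["M. bovis"] - RD_BY_SPECIES["M. tuberculosis"]))
```

Under the stated assumption, every region listed for M. bovis is still present in M. tuberculosis. Since deleted DNA is not readily regained, this is the kind of observation that argues against M. tuberculosis having descended from M. bovis.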


Evolutionary scenario

Figure: proposed evolutionary pathway of the M. tuberculosis complex from a common ancestor, marked by successive deletions and point mutations.

- Lineages shown: M. canettii; M. tuberculosis (ancestral and modern); M. africanum; M. microti; seal, oryx, and goat isolates; classical M. bovis; BCG Tokyo; BCG Pasteur
- Deletions marked: RDcan, RD9, TbD1, RD7, RD8, RD10, RDmic, RDseal, RD12, RD13, RD4, RD1, RD2, RD14
- Point mutations marked: katG 463 CTG→CGG; gyrA 95 AGC→ACC; mmpL6 551 AAC→AAG; oxyR 285 G→A; pncA 57 CAC→GAC
- M. canettii is annotated with numerous sequence polymorphisms

Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

Evolution of the M. tb complex

M. bovis → M. tuberculosis (crossed out)

Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

Evolution of the M. tb complex

Progenitor bacillus → M. bovis and M. tuberculosis

Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

- More than 3 billion individuals have been immunized with bacillus Calmette-Guérin (BCG), an attenuated derivative of M. bovis.
- BCG is part of the WHO's Expanded Program on Immunization because of its proven efficacy at preventing extrapulmonary tuberculosis in children.
- In adults, its efficacy against pulmonary disease is variable, possibly as a result of environmental, operational, demographic, and genetic factors.
- Study: comparative genome and transcriptome analysis of Mycobacterium bovis BCG Pasteur 1173P2.

Major Findings
- The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2.
- DU1 is restricted to BCG Pasteur.
- DU2-I is confined to early BCG vaccines, like BCG Japan.
- DU2-III and DU2-IV occur in the late vaccines.
- The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol.
- Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans.
- The combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.

The Beijing family

- Appears to be more virulent, more transmissible, and associated with multidrug resistance (MDR)

Trends in Microbiology, Vol. 10, No. 1, January 2002, 45-52