
articles

DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae


John F. Heidelberg*, Jonathan A. Eisen*, William C. Nelson*, Rebecca A. Clayton, Michelle L. Gwinn*, Robert J. Dodson*, Daniel H. Haft*, Erin K. Hickey*, Jeremy D. Peterson*, Lowell Umayam*, Steven R. Gill*, Karen E. Nelson*, Timothy D. Read*, Hervé Tettelin*, Delwood Richardson*, Maria D. Ermolaeva*, Jessica Vamathevan*, Steven Bass*, Haiying Qin*, Ioana Dragoi*, Patrick Sellers*, Lisa McDonald*, Teresa Utterback*, Robert D. Fleishmann*, William C. Nierman*, Owen White*, Steven L. Salzberg*, Hamilton O. Smith*, Rita R. Colwell, John J. Mekalanos, J. Craig Venter* & Claire M. Fraser*
* The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA; Center of Marine Biotechnology, University of Maryland Biotechnology Institute, 701 East Pratt Street, Baltimore, Maryland 21202, USA, and Department of Cell and Molecular Biology, University of Maryland, College Park, Maryland 20742, USA; Harvard Medical School, Department of Microbiology and Molecular Genetics, 200 Longwood Avenue, Boston, Massachusetts 02115, USA


Here we determine the complete genomic sequence of the Gram-negative γ-Proteobacterium Vibrio cholerae El Tor N16961 to be 4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that together encode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication, transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) are located on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genes compared with the large chromosome (42%), and also contains many more genes that appear to have origins other than the γ-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host 'addiction' genes that are typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by an ancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living, environmental organism emerged to become a significant human bacterial pathogen.
Vibrio cholerae is the aetiological agent of cholera, a severe diarrhoeal disease that occurs most frequently in epidemic form1. Cholera has been epidemic in southern Asia for at least 1,000 years, but has also spread worldwide to cause seven pandemics since 1817 (ref. 1). When untreated, cholera is a disease of extraordinarily rapid onset and potentially high lethality. Although clinical management of cholera has advanced over the past 40 years, cholera remains a serious threat in developing countries where sanitation is poor, health care limited, and drinking water unsafe.

Vibrio cholerae as a species includes both pathogenic and nonpathogenic strains that vary in their virulence gene content2. This bacterium contains a wide variety of strains and biotypes, receiving and transferring genes for toxins3, colonization factors4,5, antibiotic resistance6, capsular polysaccharides that provide resistance to chlorine7 and new surface antigens, such as the O139 lipopolysaccharide and O antigen capsule8,9. The lateral or horizontal transfer of these virulence genes by phage3, pathogenicity islands10,11 and other accessory genetic elements12 provides insights into how bacterial pathogens emerge and evolve to become new strains.

Vibrio species represent a significant portion of the culturable heterotrophic bacteria of oceans, coastal waters and estuaries13,14. Environmental studies show that these bacteria strongly influence nutrient cycling in the marine environment. Various species of this genus are also devastating pathogens for finfish, shellfish and mammals. There is still much to be learned about the aquatic ecology and natural history of V. cholerae, including its autochthonous (native) existence in endemic locales during cholera-free, interepidemic periods, which environmental factors, such as climate13,15, aided its re-emergence in Latin America, and which environmental factors are associated with its habitat in cholera endemic regions. For example, V. cholerae, during interepidemic periods, is an inhabitant of brackish and estuarine waters, and in these environments is associated with zooplankton and other aquatic flora and fauna16. The organism also enters a "viable but nonculturable"17 state under certain conditions. Roles for these environmental interactions and this dormant physiological state in the emergence and persistence of pathogenic V. cholerae have been proposed14,17.

Here we report the determination and analysis of the Vibrio cholerae genome sequence. This analysis represents an important step toward the complete molecular description of how this free-living environmental organism emerged to become a human pathogen by horizontal gene transfer.

Present address: Celera Genomics, 45 West Gude Drive, Rockville, Maryland 20850, USA

Genome analysis

The genome of V. cholerae was sequenced by the whole genome random sequencing method18. The genome consists of two circular chromosomes19,20 of 2,961,146 (chromosome 1) and 1,072,314 (chromosome 2) base pairs, with an average G+C content of 47.7% and 46.9%, respectively. There are a total of 3,885 predicted open reading frames (ORFs) and 792 predicted Rho-independent terminators, with 2,770 and 1,115 ORFs and 599 and 193 Rho-independent terminators on chromosomes 1 and 2, respectively (Table 1, Figs 1 and 2). Most genes required for growth and viability are located on chromosome 1, although some genes found only on chromosome 2 are also thought to be essential for normal cell function (for example, dsdA, thrS and the genes encoding ribosomal proteins L20 and L35). Additionally, many intermediaries of metabolic pathways are encoded only on chromosome 2 (Fig. 3).

Table 1 General features of the Vibrio cholerae genome

Feature                                            Chromosome 1    Chromosome 2
Size (bp)                                          2,961,151       1,072,914
Total number of sequences                          36,797          14,367
G+C percentage                                     47.7            46.9
Total number of ORFs                               2,770           1,115
ORF size (bp)                                      952             918
Percentage coding                                  88.6            86.3
Number of rRNA operons (16S-23S-5S)                8               0
Number of tRNA                                     94              4
Number similar to known proteins                   1,614 (58%)     465 (42%)
Number similar to proteins of unknown function*    163 (6%)        66 (6%)
Number of conserved hypothetical proteins          478 (17%)       165 (15%)
Number of hypothetical proteins                    515 (19%)       419 (38%)
Number of Rho-independent terminators              599             193

bp, base pairs. ORFs, open reading frames. * Proteins of unknown function: significant sequence similarity (homology) to a named protein for which there is currently no known function. Conserved hypothetical protein: sequence similarity to a translation of another ORF, but there is currently no experimental evidence that a protein is expressed. Hypothetical protein: no significant sequence similarity to another protein.

NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com

© 2000 Macmillan Magazines Ltd

The replicative origin in chromosome 1 was identified by similarity to the Vibrio harveyi and Escherichia coli origins, co-localization of genes (dnaA, dnaN, recF and gyrA) often found near the origin in prokaryotic genomes, and GC nucleotide skew ((G-C)/(G+C)) analysis21. Based on these, we designated base-pair 1 in an intergenic region that is located in the putative origin of replication. Only the GC skew analysis was useful in identifying a putative origin on chromosome 2.

This genomic sequence of V. cholerae confirmed the presence of a large integron island (a gene capture system) located on chromosome 2 (125.3 kbp)22,23. The V. cholerae integron island contains all copies of the V. cholerae repeat (VCR) sequence and 216 ORFs (Fig. 1). However, most of these ORFs have no homology to other sequences.
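As an illustration of the GC nucleotide skew analysis mentioned above, the following minimal Python sketch computes the skew (G-C)/(G+C) in fixed windows and reports where the cumulative skew is lowest, a position that in many bacterial chromosomes lies near the origin of replication. The function names, the window size and the toy sequence are assumptions made for this example; it is not the software used for the analysis, and it treats the sequence as linear rather than circular.

```python
def gc_skew(seq, window=1000):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without any G or C gets a skew of zero.
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


def cumulative_skew_minimum(seq, window=1000):
    """Return the end coordinate of the window at which the running sum
    of window skews is lowest (a candidate origin position)."""
    total, best_pos, best_val = 0.0, 0, float("inf")
    for start, skew in gc_skew(seq, window):
        total += skew
        if total < best_val:
            best_val, best_pos = total, start + window
    return best_pos


# Toy sequence (hypothetical): a C-rich half followed by a G-rich half.
toy = "ACCT" * 50 + "AGGT" * 50
windows = gc_skew(toy, window=100)
switch = cumulative_skew_minimum(toy, window=100)
```

In the toy sequence the skew changes sign halfway along, and the cumulative minimum falls at that switch point. On a real chromosome a sliding (overlapping) window and a circular treatment of the sequence would normally be preferable.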
Among the recognizable integron island genes are three that encode gene products that may be involved in drug resistance (chloramphenicol acetyltransferase, fosfomycin resistance protein and glutathione transferase), several DNA metabolism enzymes (MutT, transposase, and an integrase), potential virulence genes (haemagglutinin and lipoproteins) and three genes which encode gene products similar to the 'host addiction' proteins (higA, higB and doc), which are used by plasmids to select for their maintenance by host cells.

Figure 1 Linear representation of the V. cholerae chromosomes. The locations of the predicted coding regions, colour-coded by biological role, RNA genes, tRNAs, other RNAs, Rho-independent terminators and Vibrio cholerae repeats (VCRs) are indicated. Arrows represent the direction of transcription for each predicted coding region. Numbers next to the tRNAs represent the number of tRNAs at a locus. Numbers next to GES represent the number of membrane-spanning domains predicted by the Goldman, Engelman and Steitz scale calculated by TopPred for that protein. Gene names are available at the TIGR web site (www.tigr.org) and as Supplementary Information.

Comparative genomics

The two-chromosome structure of V. cholerae allows for comparisons, both between the two chromosomes of this organism and between either of the V. cholerae chromosomes and the chromosomes of other microbial species. There is pronounced asymmetry in the distribution of genes known to be essential for growth and virulence between the two chromosomes. Significantly more genes encoding DNA replication and repair, transcription, translation, cell-wall biosynthesis and a variety of central catabolic and biosynthetic pathways are encoded by chromosome 1. Similarly, most genes known to be essential in bacterial pathogenicity (that is, those encoding the toxin co-regulated pilus, cholera toxin, lipopolysaccharide and the extracellular protein secretion machinery) are also located on chromosome 1. In contrast, chromosome 2 contains a larger fraction (59%) of hypothetical genes and genes of unknown function, compared with chromosome 1 (42%) (Fig. 4). This partitioning of hypothetical proteins on chromosome 2 is highly localized in the integron island (Fig. 2). Chromosome 2 also carries the 3-hydroxy-3-methylglutaryl CoA reductase, a gene apparently acquired from an archaeon (Y. Boucher and W. F. Doolittle, personal communication).

The majority of the V. cholerae genes were very similar to E. coli genes (1,454 ORFs), but 499 (12.8%) of the V. cholerae ORFs showed highest similarity to other V. cholerae genes, suggesting recent duplications (Figs 5 and 6). Most of the duplicated ORFs encode products involved in regulatory functions (59), chemotaxis (50), transport and binding (42), transposition (18), pathogenicity (13) or unknown functions encoded by conserved hypothetical (62) and hypothetical proteins (113). There are 105 duplications with at least one copy of the ORF on each chromosome, indicating that there have been recent crossovers between chromosomes. The extensive duplication of genes involved in scavenging behaviour (chemotaxis and solute transport) suggests the importance of these gene products in V. cholerae biology, notably its ability to inhabit diverse environments. These environments, in turn, may have selected for the duplication and divergence of genes useful for specialized functions. Additionally, whereas El Tor strain N16961 carries only a single copy of the cholera toxin prophage, other V. cholerae strains carry several copies of this element24,25, and strains of the classical biotype have a second copy of the prophage that is localized on chromosome 2 (ref. 20). Thus, virulence genes are presumed to be subject to selective pressure, affecting copy number and chromosomal location.

Several ORFs with apparently identical functions exist on both chromosomes and were probably acquired by lateral gene transfer. For example, glyA (encoding serine hydroxymethyl transferase) is found once on each chromosome, but phylogenetic analysis suggests that the glyA copy on chromosome 1 branches with the α-Proteobacteria, whereas the copy on chromosome 2 branches with the γ-Proteobacteria (see Supplementary Information). The chromosome 2 glyA is flanked by genes encoding transposases, suggesting that this gene was acquired through a transposition event.
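The summary figures quoted in the text can be checked against the counts in Table 1. The short Python sketch below is only a worked arithmetic check; the dictionary layout and variable names are assumptions made for this example. Using raw counts, the share of ORFs without an assigned function (unknown function, conserved hypothetical and hypothetical) comes to about 42% for chromosome 1 and about 58% for chromosome 2, whereas the rounded percentages printed in Table 1 (6% + 15% + 38%) sum to the 59% quoted for chromosome 2.

```python
# ORF counts copied from Table 1 (per chromosome).
TABLE1 = {
    "chromosome 1": {"orfs": 2770, "known": 1614, "unknown_function": 163,
                     "conserved_hypothetical": 478, "hypothetical": 515},
    "chromosome 2": {"orfs": 1115, "known": 465, "unknown_function": 66,
                     "conserved_hypothetical": 165, "hypothetical": 419},
}


def uncharacterized_fraction(row):
    """Fraction of ORFs that are not similar to proteins of known function."""
    unassigned = (row["unknown_function"] + row["conserved_hypothetical"]
                  + row["hypothetical"])
    return unassigned / row["orfs"]


total_orfs = sum(row["orfs"] for row in TABLE1.values())
fractions = {name: uncharacterized_fraction(row) for name, row in TABLE1.items()}

# 499 ORFs are reported as most similar to another V. cholerae ORF.
duplicated_percent = round(100 * 499 / total_orfs, 1)

for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}% of ORFs without assigned function")
print(f"total ORFs: {total_orfs}; recently duplicated: {duplicated_percent}%")
```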

Figure 2 Circular representation of the V. cholerae genome. The two chromosomes, large and small, are depicted. From the outside inward: the first and second circles show predicted protein-coding regions on the plus and minus strand, by role, according to the colour code in Fig. 1 (unknown and hypothetical proteins are in black). The third circle shows recently duplicated genes on the same chromosome (black) and on different chromosomes (green). The fourth circle shows transposon-related (black), phage-related (blue), VCRs (pink) and pathogenesis genes (red). The fifth circle shows regions with significant χ² values for trinucleotide composition in a 2,000-bp window. The sixth circle shows percentage G+C in relation to mean G+C for the chromosome. The seventh and eighth circles are tRNAs and rRNAs, respectively.
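The windowed trinucleotide statistic shown in the fifth circle of Fig. 2 can be illustrated with a small Python sketch. It assumes, as a simplification that the text here does not spell out, that the expected trinucleotide counts for each 2,000-bp window are derived from the trinucleotide frequencies of the whole sequence; the function names and the toy sequence are hypothetical, and no significance threshold is applied.

```python
from collections import Counter
from itertools import product

# All 64 trinucleotides over the DNA alphabet.
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]


def trinuc_counts(seq):
    """Count overlapping trinucleotides, ignoring any containing non-ACGT."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return {t: counts.get(t, 0) for t in TRINUCS}


def window_chi2(seq, window=2000, step=2000):
    """Return (window_start, chi-square) pairs comparing each window's
    trinucleotide counts with those expected from the whole sequence."""
    seq = seq.upper()
    overall = trinuc_counts(seq)
    total = sum(overall.values())
    freqs = {t: overall[t] / total for t in TRINUCS}
    results = []
    for start in range(0, len(seq) - window + 1, step):
        observed = trinuc_counts(seq[start:start + window])
        n = sum(observed.values())
        chi2 = sum((observed[t] - n * freqs[t]) ** 2 / (n * freqs[t])
                   for t in TRINUCS if freqs[t] > 0)
        results.append((start, chi2))
    return results


# Toy sequence (hypothetical): a 2,000-bp poly-A block inside an ACGT repeat.
toy = "ACGT" * 1000 + "AAAA" * 500 + "ACGT" * 1000
scores = window_chi2(toy)
most_atypical_start = max(scores, key=lambda item: item[1])[0]
```

The window covering the inserted block has by far the largest score, which is the kind of signal expected from laterally acquired elements with atypical composition.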
Origin and function of the small chromosome of V. cholerae
Several lines of evidence suggest that chromosome 2 was originally a megaplasmid captured by an ancestral Vibrio species. The phylogenetic analysis of the ParA homologues located near the putative origin of replication of each chromosome shows chromosome 1 ParA tending to group with other chromosomal ParAs, and the ParA from chromosome 2 tending to group with plasmid, phage and megaplasmid ParAs (see Supplementary Information). In general, genes on chromosome 2, with an apparently identical functioning copy on chromosome 1, appear less similar to orthologues present in other γ-Proteobacteria species (see Supplementary Information). Also, chromosome 1 contains all the ribosomal RNA operons and at least one copy of all the transfer RNAs (four tRNAs are found on chromosome 2, but there are duplicates on chromosome 1). In addition, chromosome 2 carries the integron region, an element often found on plasmids26. Finally, the bias in the functional gene content is more easily explained if chromosome 2 was originally a megaplasmid (Fig. 4).

The megaplasmid presumably acquired genes from diverse bacterial species before its capture by the ancestral Vibrio. The relocation of several essential genes from chromosome 1 to the megaplasmid completed the stable capture of this smaller replicon. Apparently this capture of the megaplasmid occurred long enough ago that the trinucleotide composition and percentage G+C content of the two chromosomes have become similar (except for laterally moving elements such as the integron island, bacteriophage genomes, transposons, and so on). The two-chromosome structure is found in other Vibrio species19, suggesting that the gene content of the megaplasmid continues to provide Vibrio with an evolutionary advantage, perhaps within the aquatic ecosystem where Vibrio species are frequently the dominant microorganisms14,16.

It is unclear why chromosome 2 has not been integrated into chromosome 1. Perhaps chromosome 2 plays an important specialized function that provides the evolutionary selective pressure to
Figure 3 Overview of metabolism and transport in V. cholerae. Pathways for energy production and the metabolism of organic compounds, acids and aldehydes are shown. Transporters are grouped by substrate specificity: cations (green), anions (red), carbohydrates (yellow), nucleosides, purines and pyrimidines (purple), amino acids/peptides/amines (dark blue) and other (light blue). Question marks associated with transporters indicate a putative gene, uncertainty in substrate specificity, or direction of transport. Permeases are represented as ovals; ABC transporters are shown as composite figures of ovals, diamonds and circles; porins are represented as three ovals; the large-conductance mechanosensitive channel is shown as a gated cylinder; other cylinders represent outer membrane transporters or receptors; and all other transporters are drawn as rectangles. Export or import of solutes is designated by the direction of the arrow through the transporter. If a precise substrate could not be determined for a transporter, no gene name was assigned and a more general common name reflecting the type of substrate being transported was used. Gene location on the two chromosomes, for both transporters and metabolic steps, is indicated by arrow colour: all genes located on the large chromosome (black); all genes located on the small chromosome (blue); all genes needed for the complete pathway on one chromosome, but a duplicate copy of one or more genes on the other chromosome (purple); required genes on both chromosomes (red); complete pathway on both chromosomes (green). (Complete pathways, except for glycerol, are found on the large chromosome.) Gene numbers on the two chromosomes are in parentheses and follow the colour scheme for gene location. Substrates underlined and capitalized can be used as energy sources. PRPP, phosphoribosyl-pyrophosphate; PEP, phosphoenolpyruvate; PTS, phosphoenolpyruvate-dependent phosphotransferase system; ATP, adenosine triphosphate; ADP, adenosine diphosphate; MCP, methyl-accepting chemotaxis protein; NAG, N-acetylglucosamine; G3P, glycerol-3-phosphate; glyc, glycerol; NMN, nicotinamide mononucleotide. Asterisk, because V. cholerae does not use cellobiose, we expect this PTS system to be involved in chitobiose transport.
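The arrow-colour scheme described in the Fig. 3 legend amounts to a small decision rule. The Python sketch below is one possible reading of that legend, not the procedure used to draw the figure; the function name and the input format (a mapping from each required gene to the set of chromosomes carrying a copy, with 1 for the large and 2 for the small chromosome) are assumptions made for this example.

```python
def arrow_colour(gene_locations):
    """Classify a pathway or transporter by where its required genes lie.

    gene_locations maps gene name -> set of chromosomes (1 = large,
    2 = small) on which a copy of that gene is found.
    """
    if not gene_locations:
        raise ValueError("at least one gene is required")
    locations = list(gene_locations.values())
    complete_on_large = all(1 in chroms for chroms in locations)
    complete_on_small = all(2 in chroms for chroms in locations)
    any_on_large = any(1 in chroms for chroms in locations)
    any_on_small = any(2 in chroms for chroms in locations)

    if complete_on_large and complete_on_small:
        return "green"   # complete pathway on both chromosomes
    if complete_on_large:
        # Complete on the large chromosome; purple if duplicates exist elsewhere.
        return "purple" if any_on_small else "black"
    if complete_on_small:
        return "purple" if any_on_large else "blue"
    return "red"         # required genes are split between the chromosomes


# Gene locations as stated in the text: modA/B/C on chromosome 2;
# luxOSU on chromosome 1 and luxPQ on chromosome 2.
molybdenum = {"modA": {2}, "modB": {2}, "modC": {2}}
quorum = {"luxO": {1}, "luxS": {1}, "luxU": {1}, "luxP": {2}, "luxQ": {2}}
colours = (arrow_colour(molybdenum), arrow_colour(quorum))
```

Under this reading the molybdenum transporter would be drawn in blue, and a pathway split like the AI-2 quorum-sensing genes would fall in the red class.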

suppress integration events when they do occur. For example, if under some environmental condition there is a difference in copy number between the chromosomes, then chromosome 2 may have accumulated genes that are better expressed at higher or lower copy number than genes on chromosome 1. A second possibility is that, in response to environmental cues, one chromosome may partition to daughter cells in the absence of the other chromosome (aberrant segregation). Such single-chromosome-containing cells would be replication-defective but still maintain metabolic activity ('drone' cells) and, therefore, be a potential source of the "viable but nonculturable" (VBNC) cells observed to occur in V. cholerae17. Such 'drone' cells may also play a role in V. cholerae biofilms7,27,28 by, for example, producing extracellular chitinase, protease and other degradative enzymes that enhance survival of cells in a biofilm that retain two chromosomes, without directly competing with these cells for nutrients.

Transport and energy metabolism

Vibrio cholerae has a diverse natural habitat that includes association with zooplankton in a sessile stage, a planktonic state in the water column, and the capacity to act as a pathogen within the human gastrointestinal tract. It is, therefore, no surprise that this organism maintains a large repertoire of transport proteins with broad substrate specificity and the corresponding catabolic pathways to enable it to respond efficiently to these different and constantly changing ecosystems (Fig. 4). Many of the sugar transporter systems and their corresponding catabolic pathway enzymes are localized on a single chromosome (for example, ribose and lactate transport and degradation enzymes are contained on chromosome 2, whereas the trehalose systems reside on chromosome 1). However, many of the other energy metabolism pathways (for example, chitin, glycolysis, and so on) are split between the chromosomes (Fig. 3).

In aquatic environments, chitin often represents a source of both carbon and nitrogen. This energy source is important for V. cholerae as it is associated with zooplankton, which have a chitinous exoskeleton13,15,29. Vibrio cholerae degrades chitin by a pathway that is very similar to that of Vibrio furnissii30. Sequence analysis suggests a phosphoenolpyruvate phosphotransferase system (PTS) for cellobiose transport, but as V. cholerae does not use cellobiose, it is more likely that this PTS is involved in transport of the structurally similar compound, chitobiose, analogous to the situation proposed for Borrelia burgdorferi18.

The three anions that are transported by ABC transport systems in V. cholerae are molybdenum, phosphate and sulphate. Molybdenum transport genes (modA/B/C) are all located on chromosome 2, and most of the sulphate transport genes are on the large chromosome. However, copies of the genes for phosphate transporters are found on both chromosomes. The genes in these two phosphate transport operons are different from each other and do not represent a recent duplication; instead, this suggests that one may be an acquired operon.

Interchromosomal regulation

Several of the regulatory pathways, for responses to both environmental and pathogenic signals, are divided between the two chromosomes. These include pathways for starvation survival, 'quorum sensing' and expression of the enterotoxigenic haemolysin, HlyA. During periods of nutrient starvation, V. cholerae, and other Gram-negative bacteria, enter the stationary phase and, later, the viable but nonculturable (VBNC) state14,17. The alternative sigma factor σ38 (rpoS) is required for survival of V. cholerae in the environment but not for pathogenicity31, and therefore probably plays an important role in the initiation of the VBNC state. There is one copy of rpoS, located on chromosome 1, near the oriC. RpoS regulates expression of several other proteins, including catalase, cyclopropane-fatty-acyl-phospholipid synthase and HA/protease, which are found on both chromosomes31.

Genes involved in 'quorum sensing', or cell-density-dependent regulation, also exist on both chromosomes of V. cholerae. In bioluminescent Vibrio species (notably Vibrio fischeri and V. harveyi), quorum sensing is used to control light production. Although this strain of V. cholerae lacks the genes for bioluminescence, it does have the genes required for the autoinducer-2 (AI-2) quorum-sensing mechanism32 (luxOPQSU), but this pathway is split between the chromosomes, with luxOSU on chromosome 1 and luxPQ on chromosome 2. Similarly, another transcriptional regulatory gene,

Figure 4 Percentage of total Vibrio cholerae open reading frames (ORFs) in biological roles compared with other γ-Proteobacteria. These were V. cholerae, chromosome 1 (blue); V. cholerae, chromosome 2 (red); Escherichia coli (yellow); Haemophilus influenzae (pale blue). Significant partitioning (P < 0.01) of biological roles between V. cholerae chromosomes is indicated with an asterisk, as determined with a χ² analysis. 1, Hypothetical contains both conserved hypothetical proteins and hypothetical proteins, and is at 1/10 scale compared with other roles.
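A χ² comparison of one role category between the two chromosomes, of the kind referred to in the Fig. 4 legend, can be sketched in Python as a 2 × 2 contingency test. The example uses the hypothetical-protein counts from Table 1 (515 of 2,770 ORFs on chromosome 1; 419 of 1,115 on chromosome 2). The function name is an assumption, no continuity correction is applied, and the exact test settings behind Fig. 4 are not given in this section, so this is an illustration rather than a reproduction.

```python
def chi2_2x2(in_role_1, total_1, in_role_2, total_2):
    """Pearson chi-square statistic for a 2 x 2 table:
    rows = chromosomes, columns = ORFs in / not in the role category."""
    a, b = in_role_1, total_1 - in_role_1
    c, d = in_role_2, total_2 - in_role_2
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical chi-square value for P = 0.01 with one degree of freedom.
CRITICAL_P01_DF1 = 6.635

# Hypothetical proteins per chromosome, from Table 1.
statistic = chi2_2x2(515, 2770, 419, 1115)
is_partitioned = statistic > CRITICAL_P01_DF1
print(f"chi-square = {statistic:.1f}; significant at P < 0.01: {is_partitioned}")
```

The statistic far exceeds the critical value, in line with the asterisk on the Hypothetical category described in the legend.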

Figure 5 Comparison of the V. cholerae ORFs with those of other completely sequenced genomes. The sequences of all proteins from each completed genome were retrieved from NCBI, TIGR and the Caenorhabditis elegans (wormpep16) databases. All V. cholerae ORFs (large chromosome, blue; small chromosome, red) were searched against all other genomes with FASTA3. The number of V. cholerae ORFs with greatest similarity (E ≤ 10^-5) are shown in proportion to the total number of ORFs in that genome. There were no ORFs that were most similar to a Mycoplasma pneumoniae ORF.
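The tally behind Fig. 5 (the best-matching genome for each V. cholerae ORF, counted only when E ≤ 10^-5 and scaled by the size of the target genome) can be sketched as follows. This is a minimal Python illustration that assumes the search results have already been parsed into (query, genome, E value) tuples; the ORF identifiers, E values and genome sizes in the toy data are hypothetical, and the sketch does not run or parse FASTA3 itself.

```python
from collections import Counter


def best_hit_proportions(hits, genome_orf_counts, max_evalue=1e-5):
    """Return, per genome, the number of query ORFs whose best hit lies in
    that genome, divided by the genome's total number of ORFs."""
    best = {}
    for query, genome, evalue in hits:
        if evalue > max_evalue:
            continue  # not similar enough to count
        if query not in best or evalue < best[query][1]:
            best[query] = (genome, evalue)
    counts = Counter(genome for genome, _ in best.values())
    return {genome: counts.get(genome, 0) / n_orfs
            for genome, n_orfs in genome_orf_counts.items()}


# Toy input (hypothetical identifiers and values).
toy_hits = [
    ("orfA", "E. coli", 1e-50),
    ("orfA", "H. influenzae", 1e-20),
    ("orfB", "H. influenzae", 1e-8),
    ("orfC", "E. coli", 1e-3),   # above the E-value cut-off, ignored
]
toy_sizes = {"E. coli": 4, "H. influenzae": 2}
proportions = best_hit_proportions(toy_hits, toy_sizes)
```

Scaling by genome size, as the legend describes, keeps a large genome from dominating the comparison simply because it offers more potential matches.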

hlyU (ref. 33), is located on chromosome 1, while the gene it regulates, hlyA, is located on chromosome 2.

DNA repair

Vibrio cholerae has genes encoding several DNA-repair and DNA-damage-response pathways, including nucleotide-excision repair, mismatch-excision repair, base-excision repair, AP endonuclease, alkylation transfer, photoreactivation, DNA ligation, and all the major components of recombination and recombinational repair, including initiation, recombination and resolution34. In addition, homologues of many of the genes involved in the SOS response in E. coli are found. The presence of three photolyase homologues, more than have been found in other bacterial species, probably allows photoreactivation of the two major forms of ultraviolet-induced DNA damage (cyclobutane pyrimidine dimers and 6-4 photoproducts), and may also allow a range of wavelengths of light to provide the energy required for photoreactivation. It is also of interest that many of the repair genes are on chromosome 2 (alkA, ada1, ada2, phr3, mutK, sbcCD, dcm, mutT3), indicating that this chromosome is probably required for full DNA repair capability.

Pathogenicity

Figure 6 Phylogenetic tree of methyl-accepting chemotactic proteins (MCP) homologues in completed genomes. Homologues of MCP were identified by FASTA3 searches of all available complete genomes. Amino-acid sequences of the proteins were aligned using CLUSTALW, and a neighbour-joining phylogenetic tree was generated from the alignment using the PAUP* program (using a PAM-based distance calculation). Hypervariable regions of the alignment and positions with gaps in many of the sequences were excluded from the analysis. Nodes with significant bootstrap values are indicated: two asterisks, >70%; asterisk, 40–70%.
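The legend states that alignment positions with gaps in many of the sequences were excluded before the tree was built. The following is a minimal Python sketch of that kind of column filter. The function name, the dictionary input format and the 50% gap cutoff are illustrative assumptions; the legend does not give the threshold that was actually applied, and hypervariable regions would need a separate, manual or score-based step.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds a cutoff.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    max_gap_fraction: hypothetical cutoff; not a value taken from the paper.
    Returns (trimmed_alignment, kept_column_indices).
    """
    if not alignment:
        return {}, []
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    n_cols = lengths.pop()

    kept = []
    for col in range(n_cols):
        gaps = sum(1 for seq in alignment.values() if seq[col] in gap_chars)
        if gaps / n_seqs <= max_gap_fraction:
            kept.append(col)

    trimmed = {
        name: "".join(seq[i] for i in kept) for name, seq in alignment.items()
    }
    return trimmed, kept


# Toy example with three made-up aligned fragments
toy = {"seqA": "MK-LV", "seqB": "MR-LI", "seqC": "M-ALV"}
trimmed, kept = drop_gappy_columns(toy)
```

In the toy example the third column is a gap in two of three sequences, so it is the only one dropped; a stricter or looser cutoff changes how much of the alignment survives into the distance calculation.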
NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com

Toxins. The genome sequence of V. cholerae El Tor N16961 revealed a single copy of the cholera toxin (CT) genes, ctxAB, located on chromosome 1 within the integrated genome of CTXφ, a temperate filamentous phage (ref. 3). The receptor for entry of CTXφ into the cell is the toxin-coregulated pilus (TCP) (ref. 3), and the TCP gene cluster (see below) also resides on chromosome 1. Like the structural genes for CT and TCP, the regulatory gene, toxR, which controls their expression in vivo (ref. 35), is also located on chromosome 1. On the other side of the CTXφ prophage is a region encoding an RTX toxin (rtxA), its activator (rtxC) and transporters (rtxBD) (ref. 36). A third transporter gene has been identified that is a paralogue of rtxB and is transcribed in the same direction as rtxBD. Downstream of this gene are two genes encoding a sensor histidine kinase and a response regulator. Trinucleotide composition analysis suggests that the RTX region was horizontally acquired along with the sensor histidine kinase/response regulator, which in turn suggests that these regulators affect expression of the closely linked RTX transcriptional units. Also present are genes encoding numerous potential toxins, including several haemolysins, proteases and lipases. These include hap, encoding the haemagglutinin protease, a secreted metalloprotease that seems to attack proteins involved in maintaining the integrity of epithelial cell tight junctions (ref. 37), and hlyA, encoding a secreted haemolysin that displays enterotoxic activity (ref. 38). In contrast to CTX, RTX and all known intestinal colonization factors, the hap and hlyA virulence-factor genes reside on chromosome 2. Vibrio cholerae has been reported to produce shiga-like toxins (ref. 39); however, the sequence did not reveal genes encoding specific homologues of the A or B subunits of shiga toxin. Also not detected were genes encoding homologues of E. coli heat-stable toxin (ST), which have been detected in other pathogenic strains of V. cholerae (ref. 40).
Colonization factors.
The critical intestinal colonization factor of V. cholerae is the TCP, a type IV pilus (refs 5, 41). The genome sequence confirmed that the genes involved in TCP assembly (tcpABCDEFGHIJNQRST) reside on chromosome 1 (ref. 20) as part of a proposed 'pathogenicity island' (also referred to as VPI) composed of recently acquired DNA that carries not only the TCP genes but also other genes associated with the ToxR regulatory cascade, such as acfABCD, toxT, aldA and tagAB (refs 10, 11). Trinucleotide composition analysis suggests that this 45.3-kb segment begins at a 20-bp site upstream of aldA and encompasses a helicase-related protein and a transcriptional activator, both of which share homology with bacteriophage proteins. At the other end of the segment of atypical trinucleotide composition are a phage-family integrase and the other copy of the 20-bp site, which is presumably the target for integration of the island into the chromosome (refs 10, 11). It has been proposed that the TCP/ACF island corresponds to the genome of a filamentous phage that uses TCP pilin as a coat protein (ref. 4). However, other than the three genes encoding phage-related proteins (that is, the helicase, transcriptional activator and integrase), we could find no other genes on the island that encoded products with significant homology to the conserved gene products of other filamentous phages or the structural proteins of non-filamentous phages.

The mannose-sensitive haemagglutinin (MSHA) is unique to the El Tor biotype of V. cholerae. Initially characterized as a haemagglutinin, it was later found to be a type IV pilus (refs 42, 43). The genes for the MSHA biogenesis (MshHIJKLMNEGF) and structural (MshBACD) proteins are all clustered on chromosome 1. There are no apparent integrases or transposases that might define this region as a pathogenicity island or suggest an origin for it other than V. cholerae. In support of this conclusion, trinucleotide composition analysis shows that this region has a composition similar to that of the rest of the chromosome, suggesting that if these genes were acquired, it was very early in the Vibrio phylogenetic history. Recently, several investigators have reported that MSHA is not required for intestinal colonization, nor does it seem to appreciably affect the efficiency of colonization (refs 44–46), but instead plays a role in biofilm formation (refs 27, 28). Accordingly, this pilus may be important for the environmental fitness of Vibrio species rather than for pathogenic potential.

The pilA region of the V. cholerae genome apparently encodes a third type IV pilus, although it has not been visualized (ref. 47). This gene cluster includes a gene encoding a prepilin peptidase (PilD) that is required for the efficient processing of protein complexes with type IV prepilin-like signal sequences, including TCP, MSHA and EPS (refs 47, 48). The EPS system of V. cholerae is a type II secretion system involved in extracellular export of CT and other proteins. The EPS system is encoded on chromosome 1 but, as for the MSHA genes, trinucleotide analysis suggests that the EPS genes of V. cholerae have not been recently acquired. In contrast, trinucleotide composition analysis suggests that the pilA gene cluster was acquired by horizontal transfer. Thus, analysis of the V. cholerae genome sequence provides some evidence that older gene clusters, like MSHA and EPS, have become dependent on more recently acquired genes such as the one encoding PilD.
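The trinucleotide composition analysis invoked above is described in the Methods as a χ² comparison between the 3-mer content of 2,000-bp windows, overlapping by 1,000 bp, and that of the whole chromosome. A minimal Python sketch of that idea is given here. The function names, the treatment of the chromosome as a linear sequence, single-strand counting, and the skipping of 3-mers that contain ambiguous bases are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# All 64 trinucleotides over the unambiguous DNA alphabet
KMERS = ["".join(p) for p in product("ACGT", repeat=3)]


def count_3mers(seq):
    """Count overlapping 3-mers on one strand, ignoring any that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return {k: counts.get(k, 0) for k in KMERS}


def window_chi2(chromosome, window=2000, step=1000):
    """Yield (window_start, chi2) for each full window along the sequence.

    Expected counts come from whole-chromosome 3-mer frequencies scaled to
    the number of 3-mers observed in the window.
    """
    total = count_3mers(chromosome)
    n_total = sum(total.values())
    if n_total == 0:
        return
    freq = {k: total[k] / n_total for k in KMERS}

    for start in range(0, len(chromosome) - window + 1, step):
        obs = count_3mers(chromosome[start:start + window])
        n_obs = sum(obs.values())
        chi2 = 0.0
        for k in KMERS:
            expected = freq[k] * n_obs
            if expected > 0:  # a 3-mer absent genome-wide is absent in every window
                chi2 += (obs[k] - expected) ** 2 / expected
        yield start, chi2


# Toy sequence: a uniform background with an atypical 2,000-bp block inserted at 4,000
background = "ACGT" * 2500
toy = background[:4000] + "G" * 2000 + background[4000:]
scores = list(window_chi2(toy))
most_atypical_start = max(scores, key=lambda s: s[1])[0]
```

As the Methods caution, a high score only marks a window as unusual relative to the rest of the chromosome; ranking windows and inspecting the top ones is a more defensible use than reading the statistic as a formal probability.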
Conclusions

The Vibrio cholerae genome sequence provides a new starting point for the study of this organism's environmental and pathobiological characteristics. It will be interesting to determine the gene expression patterns that are unique to its survival and replication during human infection (ref. 35) as well as in the environment (refs 13, 14, 16). Additionally, the genomic sequence of V. cholerae should facilitate the study of this model multi-chromosomal prokaryotic organism. Comparative genomics between several species in the genus Vibrio will provide a better understanding of the origin of the new small chromosome and the role that it plays in Vibrio biology. The genome sequence may also provide important clues to understanding the metabolic and regulatory networks that link genes on the two chromosomes. Finally, V. cholerae clearly represents a promising genetic system for studying how several horizontally acquired loci located on separate chromosomes can still efficiently interact at the regulatory, cell biology and biochemical levels.

Methods
Whole-genome random sequencing procedure.
Vibrio cholerae N16961 was grown from a single isolated colony. Cloning, sequencing and assembly were as described for genomes sequenced by TIGR (ref. 18). One small-insert plasmid library (2–3 kb) was generated by random mechanical shearing of genomic DNA. One large-insert library was ligated into λ-DASHII/EcoRI vector (Stratagene). In the initial sequence phase, approximately sevenfold sequence coverage was achieved with 49,633 sequences from plasmid clones. Sequences from both ends of 383 λ-clones served as a genome scaffold, verifying the orientation, order and integrity of the contigs. The plasmid and λ sequences were jointly assembled using TIGR Assembler. Sequence gaps were closed by editing the end sequences and/or primer walking on plasmid clones. Physical gaps were closed by direct sequencing of genomic DNA, or combinatorial polymerase chain reaction (PCR) followed by sequencing the PCR product. The final genome sequence is based on 51,164 sequences.
ORF prediction and gene family identification.
An initial set of ORFs, likely to encode proteins, was identified with GLIMMER (ref. 49), and those shorter than 30 codons were eliminated. ORFs that overlapped were visually inspected, and in some cases removed. ORFs were searched against a non-redundant protein database (ref. 18). Frameshifts and point mutations were detected and corrected where appropriate. Remaining frameshifts and point mutations are considered to be authentic and were annotated as 'authentic frameshift' or 'authentic point mutation'. ORFs were also analysed with two sets of hidden Markov models (HMMs) constructed for a number of conserved protein families (1,313 from Pfam v3.1 (ref. 50) and 476 from the TIGRFAM) by use of the HMMER package. TopPred was used to identify membrane-spanning domains in proteins.

Paralogous gene families were constructed by searching the ORFs against themselves using BLASTX, identifying matches with E ≤ 10⁻⁵ over 60% of the query search length, and subsequently clustering these matches into multigene families. Multiple alignments for these protein families were generated with the CLUSTALW program and the alignments scrutinized.

Distribution of all 64 trinucleotides (3-mers) for each chromosome was determined, and the 3-mer distribution in 2,000-bp windows that overlapped by half their length (1,000 bp) across the genome was computed. For each window, we computed the χ² statistic on the difference between its 3-mer content and that of the whole chromosome. A large value of this statistic indicates that the 3-mer composition in this window is different from the rest of the chromosome. Probability values for this analysis are based on the assumption that the DNA composition is relatively uniform throughout the genome. Because this assumption may be incorrect, we prefer to interpret high χ² values merely as indicators of regions on the chromosome that appear unusual and demand further scrutiny.

Homologues of the genes of interest were identified using the BLASTP and FASTA3 search programs. All homologues were then aligned to each other using the CLUSTALW program with default settings. Phylogenetic trees were generated from the alignments using the neighbour-joining algorithm as implemented by the PAUP* program (with a PAM matrix based distance calculation). Regions of the alignment that were hypervariable or were of low confidence were excluded from the phylogenetic analysis. All alignments are available upon request.

Received 3 April; accepted 18 May 2000.

1. Wachsmuth, K., Olsvik, Ø., Evins, G. M. & Popovic, T. in Vibrio Cholerae And Cholera: Molecular To Global Perspective (eds Wachsmuth, I. K., Blake, P. A. & Olsvik, Ø.) 357–370 (ASM Press, Washington DC, 1994).
2. Faruque, S. M., Albert, M. J. & Mekalanos, J. J. Epidemiology, genetics, and ecology of toxigenic Vibrio cholerae. Microbiol. Mol. Biol. Rev. 62, 1301–1314 (1998).
3. Waldor, M. K. & Mekalanos, J. J. Lysogenic conversion by a filamentous phage encoding cholera toxin. Science 272, 1910–1914 (1996).
4. Karaolis, D. K., Somara, S., Maneval, D. R. Jr, Johnson, J. A. & Kaper, J. B. A bacteriophage encoding a pathogenicity island, a type-IV pilus and a phage receptor in cholera bacteria. Nature 399, 375–379 (1999).
5. Brown, R. C. & Taylor, R. K. Organization of tcp, acf, and toxT genes within a ToxT-dependent operon. Mol. Microbiol. 16, 425–439 (1995).
6. Hochhut, B. & Waldor, M. K. Site-specific integration of the conjugal Vibrio cholerae SXT element into prfC. Mol. Microbiol. 32, 99–110 (1999).
7. Yildiz, F. H. & Schoolnik, G. K. Vibrio cholerae O1 El Tor: identification of a gene cluster required for the rugose colony type, exopolysaccharide production, chlorine resistance, and biofilm formation. Proc. Natl Acad. Sci. USA 96, 4028–4033 (1999).
8. Bik, E. M., Bunschoten, A. E., Gouw, R. D. & Mooi, F. R. Genesis of the novel epidemic Vibrio cholerae O139 strain: evidence for horizontal transfer of genes involved in polysaccharide synthesis. EMBO J. 14, 209–216 (1995).
9. Waldor, M. K., Colwell, R. & Mekalanos, J. J. The Vibrio cholerae O139 serogroup antigen includes an O-antigen capsule and lipopolysaccharide virulence determinants. Proc. Natl Acad. Sci. USA 91, 11388–11392 (1994).
10. Kovach, M. E., Shaffer, M. D. & Peterson, K. M. A putative integrase gene defines the distal end of a large cluster of ToxR-regulated colonization genes in Vibrio cholerae. Microbiology 142, 2165–2174 (1996).
11. Karaolis, D. K. et al. A Vibrio cholerae pathogenicity island associated with epidemic and pandemic strains. Proc. Natl Acad. Sci. USA 95, 3134–3139 (1998).
12. Mekalanos, J. J., Rubin, E. J. & Waldor, M. K. Cholera: molecular basis for emergence and pathogenesis. FEMS Immunol. Med. Microbiol. 18, 241–248 (1997).
13. Colwell, R. R. Global climate and infectious disease: the cholera paradigm. Science 274, 2025–2031 (1996).
14. Colwell, R. R. & Spira, W. M. in Cholera (eds Barua, D. & Greenough, W. B. III) 107–127 (Plenum Medical, New York, 1992).
15. Lobitz, B. et al. Climate and infectious disease: use of remote sensing for detection of Vibrio cholerae by indirect measurement. Proc. Natl Acad. Sci. USA 97, 1438–1443 (2000).
16. Colwell, R. R. & Huq, A. Environmental reservoir of Vibrio cholerae. The causative agent of cholera. Ann. NY Acad. Sci. 740, 44–54 (1994).
17. Roszak, D. B. & Colwell, R. R. Survival strategies of bacteria in the natural environment. Microbiol. Rev. 51, 365–379 (1987).
18. Fraser, C. M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580–586 (1997).
19. Yamaichi, Y., Iida, T., Park, K. S., Yamamoto, K. & Honda, T. Physical and genetic map of the genome of Vibrio parahaemolyticus: presence of two chromosomes in Vibrio species. Mol. Microbiol. 31, 1513–1521 (1999).
20. Trucksis, M., Michalski, J., Deng, Y. K. & Kaper, J. B. The Vibrio cholerae genome contains two unique circular chromosomes. Proc. Natl Acad. Sci. USA 95, 14464–14469 (1998).

21. Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13, 660–665 (1996).
22. Rowe-Magnus, D. A., Guerout, A. M. & Mazel, D. Super-integrons. Res. Microbiol. 150, 641–651 (1999).
23. Hall, R. M., Brookes, D. E. & Stokes, H. W. Site-specific insertion of genes into integrons: role of the 59-base element and determination of the recombination cross-over point. Mol. Microbiol. 5, 1941–1959 (1991).
24. Davis, B. M., Kimsey, H. H., Chang, W. & Waldor, M. K. The Vibrio cholerae O139 Calcutta bacteriophage CTXphi is infectious and encodes a novel repressor. J. Bacteriol. 181, 6779–6787 (1999).
25. Mekalanos, J. J. Duplication and amplification of toxin genes in Vibrio cholerae. Cell 35, 253–263 (1983).
26. Mazel, D., Dychinco, B., Webb, V. A. & Davies, J. A distinctive class of integron in the Vibrio cholerae genome. Science 280, 605–608 (1998).
27. Watnick, P. I. & Kolter, R. Steps in the development of a Vibrio cholerae El Tor biofilm. Mol. Microbiol. 34, 586–595 (1999).
28. Watnick, P. I., Fullner, K. J. & Kolter, R. A role for the mannose-sensitive hemagglutinin in biofilm formation by Vibrio cholerae El Tor. J. Bacteriol. 181, 3606–3609 (1999).
29. Huq, A. et al. Ecological relationships between Vibrio cholerae and planktonic crustacean copepods. Appl. Environ. Microbiol. 45, 275–283 (1983).
30. Bassler, B. L., Yu, C., Lee, Y. C. & Roseman, S. Chitin utilization by marine bacteria. Degradation and catabolism of chitin oligosaccharides by Vibrio furnissii. J. Biol. Chem. 266, 24276–24286 (1991).
31. Yildiz, F. H. & Schoolnik, G. K. Role of rpoS in stress survival and virulence of Vibrio cholerae. J. Bacteriol. 180, 773–784 (1998).
32. Bassler, B. L., Greenberg, E. P. & Stevens, A. M. Cross-species induction of luminescence in the quorum-sensing bacterium Vibrio harveyi. J. Bacteriol. 179, 4043–4045 (1997).
33. Williams, S. G., Attridge, S. R. & Manning, P. A. The transcriptional activator HlyU of Vibrio cholerae: nucleotide sequence and role in virulence gene expression. Mol. Microbiol. 9, 751–760 (1993).
34. Eisen, J. A. & Hanawalt, P. C. A phylogenomic study of DNA repair genes, proteins, and processes. Mutat. Res. 435, 171–213 (1999).
35. Lee, S. H., Hava, D. L., Waldor, M. K. & Camilli, A. Regulation and temporal expression patterns of Vibrio cholerae virulence genes during infection. Cell 99, 625–634 (1999).
36. Lin, W. et al. Identification of a Vibrio cholerae RTX toxin gene cluster that is tightly linked to the cholera toxin prophage. Proc. Natl Acad. Sci. USA 96, 1071–1076 (1999).
37. Wu, Z., Milton, D., Nybom, P., Sjö, A. & Magnusson, K. E. Vibrio cholerae hemagglutinin/protease (HA/protease) causes morphological changes in cultured epithelial cells and perturbs their paracellular barrier function. Microb. Pathog. 21, 111–123 (1996).
38. Alm, R. A., Stroeher, U. H. & Manning, P. A. Extracellular proteins of Vibrio cholerae: nucleotide sequence of the structural gene (hlyA) for the haemolysin of the haemolytic El Tor strain 017 and characterization of the hlyA mutation in the non-haemolytic classical strain 569B. Mol. Microbiol. 2, 481–488 (1988).
39. O'Brien, A. D., Chen, M. E., Holmes, R. K., Kaper, J. & Levine, M. M. Environmental and human isolates of Vibrio cholerae and Vibrio parahaemolyticus produce a Shigella dysenteriae 1 (Shiga)-like cytotoxin. Lancet 1, 77–78 (1984).
40. Ogawa, A., Kato, J., Watanabe, H., Nair, B. G. & Takeda, T. Cloning and nucleotide sequence of a heat-stable enterotoxin gene from Vibrio cholerae non-O1 isolated from a patient with traveler's diarrhea. Infect. Immun. 58, 3325–3329 (1990).
41. Manning, P. A. The tcp gene cluster of Vibrio cholerae. Gene 192, 63–70 (1997).
42. Jonson, G., Holmgren, J. & Svennerholm, A. M. Identification of a mannose-binding pilus on Vibrio cholerae El Tor. Microb. Pathog. 11, 433–441 (1991).
43. Jonson, G., Lebens, M. & Holmgren, J. Cloning and sequencing of Vibrio cholerae mannose-sensitive haemagglutinin pilin gene: localization of mshA within a cluster of type 4 pilin genes. Mol. Microbiol. 13, 109–118 (1994).
44. Attridge, S. R., Manning, P. A., Holmgren, J. & Jonson, G. Relative significance of mannose-sensitive hemagglutinin and toxin-coregulated pili in colonization of infant mice by Vibrio cholerae El Tor. Infect. Immun. 64, 3369–3373 (1996).
45. Tacket, C. O. et al. Investigation of the roles of toxin-coregulated pili and mannose-sensitive hemagglutinin pili in the pathogenesis of Vibrio cholerae O139 infection. Infect. Immun. 66, 692–695 (1998).
46. Thelin, K. H. & Taylor, R. K. Toxin-coregulated pilus, but not mannose-sensitive hemagglutinin, is required for colonization by Vibrio cholerae O1 El Tor biotype and O139 strains. Infect. Immun. 64, 2853–2856 (1996).
47. Fullner, K. J. & Mekalanos, J. J. Genetic characterization of a new type IV-A pilus gene cluster found in both classical and El Tor biotypes of Vibrio cholerae. Infect. Immun. 67, 1393–1404 (1999).
48. Marsh, J. W. & Taylor, R. K. Identification of the Vibrio cholerae type 4 prepilin peptidase required for cholera toxin secretion and pilus formation. Mol. Microbiol. 29, 1481–1492 (1998).
49. Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26, 544–548 (1998).
50. Bateman, A. et al. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res. 27, 260–262 (1999).

Supplementary information is available on Nature's World-Wide Web site (http://www.nature.com) or as paper copy from the London editorial office of Nature.

Acknowledgements
This work was supported by the National Institutes of Health, National Institute of Allergy and Infectious Diseases. We thank M. Heaney, V. Sapiro, B. Lee, M. Holmes and B. Vincent for database and software support.

Correspondence and requests for materials should be addressed to C.M.F. (e-mail: gvc@tigr.org). The annotated genome sequence and the gene family alignments are available at http://www.tigr.org/tbd/mdb. The sequences have been deposited in GenBank with accession numbers AE003852 (chromosome 1) and AE003853 (chromosome 2).


© 2000 Macmillan Magazines Ltd


1960

1962 1963

1964

1965

1966

1967

1968

1970

15

2042

2043

2045

2046

2047

2048

2049

2051

2052

2053

2055

2056

2057

2058 2059

2060

2061

2062

2063

2064

2066

12

2145

2146

2152

2153

2156

2157

2159 2160

2161

2162

2164

2166 2167

2168

2171

2172

2174

2175

2176

2245

2246

2247

2248

2249

2250

2251

2252

2253

2254

2255

2256

2257

2258

2259

2260

2261

2262

2265

2266

2344

2345

2346

2347

2348

2349

2350

2352

2353

2355

2356

2358

2359

2360

2362

2363

2364

11

2425

2426

2427

2428

2430

2431

2432

2433

2434 2435

2436

2437

2438

2439

2440

2441

2442

10

12

2510 2511

2513

2514

2517 2518

2519

2520

2522

2523

2524 2525 2527

2528

2529

2531

2532

2534

2535

2536

2537

2538

2539

© 2000 Macmillan Magazines Ltd


Figure key (symbols): 23S rRNA; 16S rRNA; 5S rRNA; other RNA; tRNAs (a numeral gives the count, for example 3 = three tRNAs); repeat region; Rho-independent terminator; membrane protein.


Figure key (role categories): Amino acid biosynthesis; Central intermediary metabolism; Nucleotides; Protein fate/protein synthesis; Biosynthesis of cofactors, prosthetic groups, and carriers; DNA metabolism; Regulatory functions; Conserved hypothetical; Cell envelope; Energy metabolism; Transcription; No database match; Cellular processes; Fatty acid and phospholipid metabolism; Transport and binding proteins; Other categories. Scale bar: 1 kb.

NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com
