Microbial Genomes: - 1) Methods For Studying Microbial Genomes - 2) Analysis and Interpretation of Whole Genome Sequences

Microbial Genomes
1) Methods for Studying Microbial Genomes 2) Analysis and Interpretation of Whole Genome Sequences
Methods for Studying Microbial Genomes

Why study genomes? History of genome sequencing Methods, Principles & Approaches
Why study microbial genomes?

until whole genome analysis became viable, life sciences have been based on a reductionist principle dissecting cell and systems into fundamental components for further study studies on whole genomes and whole genome sequences in particular give us a complete genomic blueprint for an organism we can now begin to examine how all of these parts operate cooperatively to influence the activities and behavior of an entire organism a complete understanding of the biology of an organism microbes provide an excellent starting point for studies of this type as they have a relatively simple genomic structure compared to higher, multicellular organisms studies on microbial genomes may provide crucial starting points for the understanding of the genomics of higher organisms
Why study microbial genomes?

analysis of whole microbial genomes also provides insight into microbial evolution and diversity beyond single protein or gene phylogenies in practical terms analysis of whole microbial genomes is also a powerful tool in identifying new applications in for biotechnology and new approaches to the treatment and control of pathogenic organisms
History of microbial genome sequencing

1977 - first complete genome to be sequenced was bacteriophage X174 - 5386 bp first genome to be sequenced using random DNA fragments Bacteriophage - 48502 bp 1986 - mitochondrial (187 kb) and chloroplast (121 kb) genomes of Marchantia polymorpha sequenced early 90s - cytomegalovirus (229 kb) and Vaccinia (192 kb) genomes sequenced 1995 - first complete genome sequence from a free living organism Haemophilus influenzae (1.83 Mb) late 1990s - many additional microbial genomes sequenced including Archaea (Methanococcus jannaschii - 1996) and Eukaryotes (Saccharomyces cerevisiae - 1996)
Microbial genomes sequenced to date

currently there are 32 complete, published microbial genomes 25 domain Bacteria, 5 Domain Archaea, 1 domain Eukarya (www.tigr.org) around 130 additional microbial genome and chromosome sequencing projects underway
Laboratory tools for studying whole genomes

conventional techniques for analysing DNA are designed for the analysis of small regions of whole genomes such as individual genes or operons many of the techniques used to study whole genomes are conventional molecular biology techniques adapted to operate effectively with DNA in a much larger size range
Pulsed Field Gel Electrophoresis

agarose gel electrophoresis is a fundamental technique in molecular biology but is generally unable to resolve fragments greater than 20 kilobases in size (whole microbial genomes are usually greater than 1000 kilobases in size) PFGE (pulsed field gel electrophoresis) is a adaptation of conventional agarose gel electrophoresis that allows extremely large DNA fragments to be resolved (up to megabase size fragments) essential technique for estimating the sizes of whole genomes/chromosomes prior to sequencing and is necessary for preparing large DNA fragments for large insert DNA cloning and analysis of subsequent clones also a commonly used and extremely powerful tool for genotyping and epidemiology studies for pathogenic microorganisms
Principle of PFGE
two factors influence DNA migration rates through conventional gels - charge differences between DNA fragments - molecular sieve effect of DNA pores DNA fragments normally travel through agarose pores as spherical coils, fragments greater than 20 kb in size form extended coils and therefore are not subjected to the molecular sieve effect the charge effect is countered by the proportionally increased friction applied to the molecules and therefore fragments greater than 20 kb do not resolve PFGE works by periodically altering the electric field orientation the large extended coil DNA fragments are forced to change orientation and size dependent separation is re-established because the time taken for the DNA to reorient is size dependent
Principle of PFGE
Principle of PFGE
the most important factor in PFGE resolution is switching time, longer switching times generally lead to increased size of DNA fragments which can be resolved switching times are optimised for the expected size of the DNA being run on the PFGE gel switch time ramping increases the region of the gel in which DNA separation is linear with respect to size a number of different apparatus have been developed in order to generate this switching in electric fields however most commonly used in modern laboratories are FIGE (Field Inversion Gel Electrophoresis) and CHEF (Contour-Clamped Homogenous Electrophoresis)
CHEF Switch Time Electric Field 1 Electric Field 2
+ +
+ +
+
+ +
Preparation of DNA for PFGE

ideally a genomic DNA preparation that contains a high proportion of completely or almost completely intact genome copies would be suitable for PFGE conventional means of DNA preparation are unsuitable for PFGE as mechanical shearing and low-level nuclease activity will result in fragmented DNA with an average size much smaller than an entire microbial genome (usually less than 200 kb in size) the solution to this is to prepare genomic DNA from whole cells in a semisolid matrix (ie. agarose) that eliminates mechanical shearing a very high concentration of EDTA is also used at all times in order to eliminate all nuclease activity

Procedure
1) intact cells are mixed with molten LMT agarose and set in a mold forming agarose plugs 2) enzymes and detergents diffuse into the plugs and lyse cells 3) proteinase K diffuses into plugs and digests proteins 4) if necessary restriction digests are performed in plugs (extensive washing or PMSF treatment is required to remove proteinase K activity) 5) plugs are loaded directly onto PFGE and run

for restriction digests, conventional enzymes are unsuitable as they cut frequently on an entire genome sequence producing DNA fragments that are far too small rare cutter restriction endonucleases cut genomic DNA with far less frequency than conventional restriction enzymes such as HindIII, BamHI etc. many rare cutter REs have 6-bp (or longer) recognition sites eg. NotI GCGGCCGC in many cases the frequency of cutting is highly species dependent eg. BamHI will cut far less frequently on a low GC% genome when compared to a intermediate or high GC content genome suitable rare cutter enzymes therefore have to be determined experimentally for each new species being studied
Large insert cloning vectors BACs and PACs

DNA cloning is another technique fundamental to molecular biology that requires adaptation in order to be useful in studying DNA at a whole genome scale conventional plasmid derived cloning vectors are only able to reliably maintain inserts less than 20 kb in size there are a number of approaches to generating clones with inserts in an intermediate size range (20 80 kB) such as cosmids, etc. the most commonly used vectors for cloning extremely large DNA inserts are BACs (Bacterial Artificial Chromosomes) and PACs (P1derived Artificial Chromosomes) both BAC and PAC vectors are plasmid derived vectors distinguished from conventional vectors by extremely tightly controlled low copy numbers
Large insert cloning vectors BACs and PACs

these very low copy numbers help to limit the strain on host cellular resources generated by very large DNA inserts thus eliminating the rejection of large insert clones low copy numbers also help to limit recombination events with host genomic DNA BAC and PAC vectors both utilise E. coli as the host organism BAC vectors are based on the E. coli single copy F-factor plasmid the F-factor origin of replication is very tightly controlled PAC vectors are based on an identical principle but instead use a single copy origin of replication derived from P1 phage PAC vectors also contain a pUC19 cassette for improved vector purification
Approaches to whole microbial genome sequencing

aim of microbial genome sequencing projects is to construct, from 500 800 bp sequencing reads containing about 1% mistakes, a genome sequence of several megabases with an error rate lower than 1 per 10000 nucleotides with improving software, decreasing computation costs and advancements in automated DNA sequencing, an entire microbial genome project can be completed in a small laboratory in 1-2 years there are two main approaches to sequencing microbial genomes the ordered clone approach and direct shotgun sequencing both require both large and small insert genomic DNA libraries in order to be effective
Approaches to whole microbial genome sequencing
Ordered Clone Approach

essentially this technique involves constructing a map of overlapping large insert clones covering the whole genome and then completely sequencing the minimum subset of these ordered clones there are a number of methods used to order clones including restriction fingerprinting and hybridisation mapping once an ordered large insert clone set is identified, a whole genome sequence is determined by either shotgun or partial primer walk sequencing of each insert the ordered clone approach to DNA sequencing requires a large amount of characterisation prior to actual DNA sequencing and is therefore a relatively time consuming approach, however, it may be cheaper than shotgun sequencing an entire genome as less redundant sequencing is required with rapid decreases in costs for computing power and sequencing this method is no longer considered viable for small (< 5 Mb) genomes
Large DNA fragment Digest and subclone
Whole Genome Randomly sequence fragments
Fill gaps
Repeat for entire genome map
Random sequencing (shotgun) approach

this is the currently the most commonly used strategy for microbial whole genome sequencing sequences from both ends of a large number of small and large insert clones are generated and overlapping sequences joined together to form a contig of the whole genome sequence (whole inserts not sequenced) although this requires enormous amounts of DNA sequencing (often up to 10x genome coverage) and computational power for sequence assembly, it is a relatively rapid approach to whole genome sequencing the first 90 95% of the genome sequence is relatively easy to generate by shotgun sequencing resulting in several hundred discrete contigs filling the gaps to produce a single contig is the most difficult and time consuming phase of this process
Whole Genome
Shear and subclone
Randomly sequence fragments
Fill gaps
Random sequencing (shotgun) approach

There are a number of steps in the process 1) Random large and small insert library construction 2) High throughput DNA sequencing 3) Sequence assembly 4) Ordering of contigs 5) Primer walking to complete sequence 6) Annotation
Library construction
Both conventional and large insert genomic DNA libraries should be constructed the small insert library will be used for the bulk of the sequencing in order to generate suitable coverage of the complete genome the large insert library (BAC, PAC, cosmid etc.) will be used as a scaffold during the sequence closure phase it is crucial to ensure that both libraries are as random as possible mechanical shearing is often used to generate small DNA fragments it is also important that each clone contains only one DNA fragment and as such specialised methods for library construction must be used
DNA Sequencing
DNA sequences are generated using vector primers for both ends of inserts at least 6X coverage of the genome is required although 9 to 10X coverage is often generated
Sequence assembly and gap closure

4 major steps in sequence assembly and gap closure 1) random sequences initially interpreted using highly accurate base calling software and assembled to generate primary contigs using software such as PHRAPP 2) computational and experimental techniques used to identify linking clones and order primary contigs 3) primer walk sequencing of linking clones and PCR products to fill sequence gaps between contigs 4) confirmation of contig order by PCR
Linking Clones
one of the most effective means of contig ordering and gap filling is linking clones linking clones are those whose terminal sequences (from either end of the insert) belong to different contigs if the orientation of the sequences and the distance to the end of the contig are compatible with with the size of the insert, the two contigs are likely to be linked the larger the insert the more likely a clone will be a linking clone this is why random sequencing is also performed on large insert clones - they are far more likely to form linking clones
Contig 1 Gap
Contig 2
Random Sequencing
Random Sequencing
Contig 1
Large Insert Linking Clone
Contig 2
FWD
REV
Once all possible linking clones are identified gaps are classified into two categories - those with linking clones (template available for sequencing) and physical gaps without linking clones ( no DNA template for the region) for those gaps with suitable linking clones, the gaps confirmed by PCR and closed by primer walk sequencing
Contig 1
Large insert Linking Clone
Contig 2
FWD
REV
Primer Walking
Physical Gaps
Contigs separated with physical gaps (no linking clones) are usually spanned by PCR on genomic DNA using primers from each end of the contigs the PCR products can then be sequenced to close the gaps without linking clones other techniques to order contigs must be used in order to guide the selection of PCR products
Linking clone
For those contigs without linking clones, how do you fill the gaps?
Supercontig 1
Supercontig 2
Supercontig 3
Physical Gaps
contigs can be ordered by peptide linking - contig ends having regions with homology to the same gene (or operon / gene cluster) southern hybridisation of labelled contig terminal oligonucleotides against large restriction fragments
Linked by Southern Hybridisation
Contig 2
PCR Product
Contig 6
FWD
REV
Primer Walking
Gapped Microbial Genomes

considering the cost and difficulty in filling gaps between contigs some interest has been generated by the analysis of gapped microbial genomes each gap is usually very small on average (approximately 75 bp for a 3.2x coverage library) increasing bioinformatic resources available mean that these gaps have little influence on functional reconstruction eg. Thiobacillus ferroxidans - all assigned amino acid biosynthesis genes (140 in total) identified from a gapped genome of 1912 contigs error rates tend to be relatively high compared to genome sequences with greater coverage
Example - Haemophilus influenzae

first complete genome sequence of a free living organism (1995) important pathogen genome is around 1.83 megabases in size random sequencing was done for both small insert and large insert (lambda) libraries sequencing reactions performed by eight individuals using fourteen ABI 377 DNA sequencers per day over a three month period in total around 33000 sequencing reactions were performed on 20000 templates plasmid extraction performed in a 96 well format 11 mb of sequence was intially used to generate 140 contigs gaps were closed by lambda linking clones (23), peptide links (2), Southern analysis (37) and PCR (42)
Annotation of Genome Sequences

a microbial genome sequence alone is only raw data it needs to be interpreted in order to be of any scientific significance the process of predicting the location and function of all possible coding sequences (genes) in a genome sequence is known as annotation although an annotated genome sequence provides a large amount of important information it is still merely a starting point for completely characterising an organism Genome Databases: Listing of genomes: Genomes online database (GOLD) Comprenehsive Genome Databases : GenBank, EMBL, DDBJ, JGI, TIGR, HAMAP, IMG, PEDANT (curation problematic, special tools for exploring / comparing / aligning genomes) Taxon specific Genome Database: EcoCyc (literature derived annotations)
Genome Annotation
The process after sequencing has been completed.
Use of many different tools required: Bioinformatics Databases Literature Sequence Experimental
Pipelines for the Annotation of Genomes Web based
Sequence Gene prediction
Proteins
Similarity searches against reference databases Calculations & predictions (MW , structure, location etc)
Annotated Proteins Pathway prediction
Annotated Proteins & pathways Data visualization
Manual editing
Pipeline for genome annotation
Identifying ORFs
most genomes will contain genes with very little or no homology to known genes of other organisms for this reason all of the possible ORFs need to be identified without relying totally on homology most efficient means for identifying potential genes in genome sequences is a three step process 1) submit entire sequence as a 6-frame translation for BLAST analysis in order to identify some protein coding regions on the basis of high levels of homology 2) use these initial coding regions to determine the sequence characteristics (GC content, codon bias etc.) that distinguish coding and non-coding regions of the genome (training the software
Identifying ORFs
3) reanalyse the genome sequence using this data (plus potential ribosome binding sequences) in order to identify all the potential genes using this process it has been experimentally shown that around 94% of genes can be accurately predicted algorithms are also available to identify ORFs without using the training procedure with only slightly reduced accuracy GLIMMER is a software for gene prediction and used by: BASYS- http://wishart.biology.ualberta.ca/basys/cgi/submit.pl JCVI (formerly TIGR)- http://www.tigr.org/ SABIA- http://www.sabia.lncc.br/
Assigning function to ORFs

in order to assign function, all predicted ORFs are translated to amino acid sequence and analysed by homology searches against sequence databases (usually Genbank) for each ORF there are three possible results i) clear sequence homology indicating function ii) blocks of homology to defined functional motifs - these should be confirmed experimentally iii) no significant homology or homology to proteins of unknown function
ORFs of unidentified function

in most genome sequences many of the ORFs identified cannot be assigned a specific function based on homology although the figure varies, usually between 40 and 50% of ORFs fall into this category clearly this represents a significant gap in our knowledge of microbial metabolism these ORFs can be further divided into two categories i) conserved hypothetical proteins ORFs with no homology to proteins of known function but with significant homology to unidentified ORFs of other species these ORFs are therefore functionally conserved across numerous species and may represent important components of central metabolism that have not yet been identified
ORFs of unidentified function

the more universal the distribution of these ORFs the more likely they have a fundamental role in metabolism ii) ORFs without homologues these are ORFs that have no homology to any known sequences these may represent genes encoding proteins related to more specific organism adaptations eg. Deinococcus radiodurans is a radiation resistant organism that contains many ORFs without homologues many of these are thought to be involved in specialised processes of DNA repair
Organism (total ORFs)
Homologues to known proteins (%)
E. coli (4277) Pyrococcus horikoshii (2064) Haemophilus influenzae (1709) B. subtilis (4099) Methanococcus jannaschii (1735)
33.3 35 58.8 58 38.1
Homologues to conserved hypothetical proteins (%) 10.3 33.3 18.2 5 40.6
No homologues (%)
56.4 31.7 23 37 21.3
In addition to the 20 amino acids, two new but rare amino acids have now been identified:
21st selenocystine (Sec) 22nd pyrolysin (Pyl)
ORF identification and new amino acids
The Sec and Pyl containing proteins are predominantly found in members of class -proteobacteria, phylum Proteobacteria.
Metagenome analysis of the uncultured -proteobacteria of the gutless & mouthless worm, Olavius algarvensi, contains the highest proportion of Sec & Pyl containing proteins to date suggesting that symbiosis promotes Sec & Pyl genetic code
code- 63 out of the 64 possible codons
Olavius algarvensi, also contains the most wide use of the genetic
Sec: 99 genes, cluster into 30 protein families present in domains Bacteria, Archaea & Eucarya. Sec coded by UGA (UGA also acts as a stop codon)
ORF identification and new amino acids

Reference:
Zhang, Y. and Gladyshev, V. N. (2007). High content of proteins containing 21st and 22nd amino acids, selenocysteine and pyrrolysine, in a symbiotic deltaproteobacterium of gutless worm Olavius algarvensis. Nucleic Acids Research 2007, 112
Structural genomics
in order to gain a complete understanding of an organism and fully exploit the potential offered by microbial genome sequencing, it is essential that these unidentified ORFs are assigned function in most cases classical molecular biology tools will be necessary for this task, however, some suggestion of function for these ORFs would greatly improve the efficiency of this process one possibility is structural genomics this is the process of determining three dimensional structures of all the gene products encoded in a microbial genome (1000s of structures!!) function can then be inferred on the basis of 3d structure comparisons to other proteins this relies on the principle that structure determines functions and although two proteins with similar amino acid sequences can be assumed to have similar structures, two proteins with similar structure dont necessarily have the same aa sequence
Microarray hybridisation
a completely annotated microbial genome sequence, whilst a powerful scientific tool, still doesnt provide all of the information needed to understand the complete biology of an organism as it essentially a static picture of the genome for truly complete characterisation, the dynamic nature of gene expression within a microbial cell needs to be determined microarray technology allows whole organism gene expression to be investigated PCR products of every gene from a complete genome sequence are bound in a high density array on a glass slide these arrays are probed with fluorescently labelled cDNA prepared from whole RNA under specific environmental conditions the level of cDNA for each ORF is then quantified using high resolution image scanners
Microarray hybridisation
example a microarray containing 97% of the predicted ORFs from Mycobacterium tuberculosis was used to investigate the response to the antituberculosis drug isoniazid (INH) INH was found to induce several genes related to outer lipid envelope biosynthesis consistent with the drugs physiological mode of action a number of additional genes were also induced which may provide potential drug targets in the future
INH untreated - green

INH treated - red Overlay Yellow = Red + Green (no change in expression)
Green = only expressed without INH treatment

Red = only expressed after INH treatment
Characteristics of sequenced genomes

the 32 complete genome sequences currently available cover a diverse range in terms of phylogeny and environments (eg. human pathogens, plant pathogens, extremophiles etc.) what conclusions can be made by comparing the genomes of these organisms regarding specific adaptations to proliferation in remarkably different environments? What conclusions can be made about evolutionary relationships between these organisms?
Horizontal gene transfer

before microbial genome sequences became available most of the focus of microbial evolution was on vertical transmission of genetic information mutation recombination and rearrangement within the clonal lineage of a single microbial population genome sequences have demonstrated that horizontal transfer of genes (between different types of organisms) are widespread and may occur between phylogentically diverse organisms generally speaking, essential genes (such as 16S rRNA) are unlikely to be transferred because the potential host most likely already contains genes of this type that have co-evolved with the rest of its cellular machinery and and cannot be displaced genes encoding non-essential cellular processes of potential benefit to other organisms are far more likely to be transferred (eg. those involved in catabolic processes)
Horizontal gene transfer

clearly, lateral transfer of genomic information has enormous potential in improving an microorganisms ability to compete effectively - this may explain why horizontally transferred genes appear so frequently and ubiquitously in microbial genomes an example of this is horizontally transferred genes between Archaeal and Bacterial hyperthermophiles Thermotoga maritima has 15 clusters of genes (4-20kb) most similar to equivalent Archaeal hyperthermophile gene regions
Whole genome phylogenetic analysis

most of the evolutionary relationships between microorganisms are inferred by comparison of single genes usually 16s rRNA genes although extremely effective, single gene phylogenetic trees only provide limited information which can make determining broad relationships between major groups difficult phylogenetic relationships can be determined by whole genome comparisons of the observed absence or presence of protein encoding gene families in effect this is similar to using the distribution of morphological characteristics to determine phylogeny without the problem of convergent evolution trees produced using this method are similar to 16s rRNA trees, however, as more genome sequences become available more detailed conclusions can be drawn using this method
Archaeal Genomes
analysis of the 5 complete genome available for members of the domain Archaea has provided new insights into relationships between Archaea, Bacteria and Eukaryotes around 35% of the Archaeal genes form a stable core conserved throughout the domain most of these encode proteins involved in transcription, translation and DNA metabolism and some central metabolic pathways the remainder of the genome is classified as a variable shell a relatively high proportion of the variable shell genes are most homologous to their bacterial counterparts - this suggests horizontal gene transfer events a relatively high proportion of the stable core genes are most similar to Eukaryotic genes
A - Stable core
B - Variable shell
Species and strain specific genetic diversity

although genome sequencing and analysis is very useful when comparing phylogenetically distant taxa, it is also of interest to examine the genomes of very closely related microorganisms this allows a more quantitative approach for examining the relationships between genotype and phenotype complete genome sequences have been determined for two species of the genus Chlamydia (pneumoniae and trachomatis) although the overall genome structure was quite similar, C.pneumoniae contained an additional 214 genes most of which have an unknown function two strains of the bacterium Helicobacter pylori have been completely sequenced (26695 and J99) overall the two strains were very similar genetically with only 6% of genes being specific to each strain
Case study - Deinococcus radiodurans

D. radiodurans R1 is an extremely radiation resistant bacterium genome (total of 3.3 megabases) consists of two chromosomes (2.6 and 0.4 mb) a megaplasmid (177 kb) and a small plasmid (44 kb) considerable genetic redundancy was observed in both the chromosomal and plasmid sequences numerous systems for DNA repair, DNA damage export were identified a significant proportion of the ORFs identified had no database matches - these may be involved in unique cellular adaptations to radiation and stress response
Case study - Neisseria meningitits

N. meningititis causes bacterial meningitis and is therefore an important pathogen genome is 2.2 megabases in size 2121 ORFs were identified with many having extremely variable G+C% (recently acquired genes) many of these recently acquired genes are identified as cell surface proteins there is a remarkable abundance and diversity of repetitive DNA sequences nearly 700 neisserial intergenic mosaic elements (NIMEs) - 50 to 150 bp repeat elements these repeat elements may be involved in enhancing recombinase specific horizontal gene transfer
Case study - Borellia burgdorferi

B. burgdorferi is a spirochaete which causes Lyme disease it has a 0.91 megabase linear genome and at least 17 linear and circular plasmids which total 0.53 megabases 853 predicted ORFs identified - these encode a basic set of proteins for DNA replication, transcription, translation and energy metabolism no genes encoding proteins involved in cellular biosynthetic reactions were identified - appears to have evolved via gene loss from a more metabolically competent precursor there is significant amount of genetic redundancy in the plasmid sequences although a biological role has not been determined it is possible the these plasmids undergo frequent homologous recombination in order to generate antigenic variation in surface proteins
Summary
Microbial genome sequencing and analysis is a rapidly expanding and increasingly important strand of microbiology important information about the specific adaptations and evolution of an organism can be determined from genome sequencing however, genome sequencing merely a strong starting point on road to completely understanding the biology of microorganisms further characterisation of ORFs of unknown function, in combination with gene expression analysis and proteomics is required

Microbial Genomes: - 1) Methods For Studying Microbial Genomes - 2) Analysis and Interpretation of Whole Genome Sequences

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Microbial Genomes: - 1) Methods For Studying Microbial Genomes - 2) Analysis and Interpretation of Whole Genome Sequences

Hochgeladen von

Copyright:

Verfügbare Formate

Microbial Genomes

Methods for Studying Microbial Genomes

Why study microbial genomes?

Why study microbial genomes?

History of microbial genome sequencing

Microbial genomes sequenced to date

Laboratory tools for studying whole genomes

Pulsed Field Gel Electrophoresis

CHEF Switch Time Electric Field 1 Electric Field 2

Preparation of DNA for PFGE

Preparation of DNA for PFGE

Preparation of DNA for PFGE

Large insert cloning vectors BACs and PACs

Large insert cloning vectors BACs and PACs

Approaches to whole microbial genome sequencing

Approaches to whole microbial genome sequencing

Ordered Clone Approach

Large DNA fragment Digest and subclone

Whole Genome Randomly sequence fragments

Repeat for entire genome map

Random sequencing (shotgun) approach

Shear and subclone

Randomly sequence fragments

Random sequencing (shotgun) approach

Sequence assembly and gap closure

Large Insert Linking Clone

Large insert Linking Clone

Linked by Southern Hybridisation

Gapped Microbial Genomes

Example - Haemophilus influenzae

Annotation of Genome Sequences

Pipelines for the Annotation of Genomes Web based

Sequence Gene prediction

Annotated Proteins Pathway prediction

Annotated Proteins & pathways Data visualization

Pipeline for genome annotation

Assigning function to ORFs

ORFs of unidentified function

ORFs of unidentified function

Organism (total ORFs)

Homologues to known proteins (%)

33.3 35 58.8 58 38.1

Homologues to conserved hypothetical proteins (%) 10.3 33.3 18.2 5 40.6

56.4 31.7 23 37 21.3

ORF identification and new amino acids

ORF identification and new amino acids

INH untreated - green

Green = only expressed without INH treatment

Characteristics of sequenced genomes

Horizontal gene transfer

Horizontal gene transfer

Whole genome phylogenetic analysis

Species and strain specific genetic diversity

Case study - Deinococcus radiodurans

Case study - Neisseria meningitits

Case study - Borellia burgdorferi

Das könnte Ihnen auch gefallen