De novo Assembly of Circular Genomes using Geneious R7
Introduction

Circular genomes, such as those of viruses, bacteria, mitochondria and plasmids, are common. However, assembling such genomes can be difficult in the absence of a reference genome, because most de novo assemblers do not account for circularity and produce linear sequences with an arbitrarily defined start and end. This can result in repeated sections of sequence at the arbitrary start and end points, and in an artificial drop in coverage in these regions, which can affect downstream analyses. The Geneious de novo assembler overcomes these issues by allowing contigs to circularise during the assembly process. In this study we assemble two mitochondrial genomes from short-read NGS sequence data using the Geneious de novo assembler and compare the results with assemblies produced by Velvet, MIRA and SPAdes.

Methods

Datasets for the Asiatic lion (Panthera leo persica) [1] and the chimpanzee (Pan troglodytes) [2] were downloaded from the NCBI Sequence Read Archive (accession numbers SRR821548 and ERR032959, respectively).

The Panthera leo dataset consists of unpaired Ion Torrent reads from a purified mitochondrial DNA preparation. Prior to assembly, adaptors and poor-quality bases were trimmed off and reads shorter than 50 bp were removed from the dataset, leaving 237,432 reads of 50-367 bp (mean 164 bp).

The Pan troglodytes dataset is from whole-genome shotgun (WGS) sequencing (approximately 1x coverage) and consists of paired 76 bp Illumina GAII reads with a 250 bp insert length. Reads were quality trimmed prior to assembly. This dataset contains a total of 57,237,068 reads, but only 5% of them were assembled, because the mitochondrial fraction was expected to be at much higher coverage than the nuclear fraction.

Assemblies were performed using Geneious (version 7.1.5), Velvet [3], MIRA [4] and SPAdes [5]. Velvet and MIRA were run as plugins to Geneious. Optimal parameters for Velvet were chosen by VelvetOptimiser to maximise the length of the longest contig.

The following settings were used for each dataset:

Panthera leo:
  Geneious: Med/High sensitivity, circularize contigs option on
  MIRA: Genome/contiguous sequence, accurate quality, Ion Torrent setting
  Velvet: Optimal kmer 57
  SPAdes: kmers 21, 33, 55, 77, 99, read correction on, Ion Torrent setting

Pan troglodytes:
  Geneious: Med/Low sensitivity, circularize contigs option on
  MIRA: Genome/contiguous sequence, accurate quality, Illumina setting
  Velvet: Optimal kmer 47
  SPAdes: kmers 21, 33, 55, read correction on

Contigs produced from each assembly were mapped back to the published mitochondrial genome sequence for each species using the Geneious read mapper with medium sensitivity settings and no fine tuning.

Results

1. Assembly of unpaired Ion Torrent reads (Panthera leo)

The Geneious R7 assembler produced a single, circular contig containing the entire mitochondrial genome from a dataset of unpaired Ion Torrent reads (Figure 2). Although this dataset was from purified mtDNA, a large number of short linear contigs were also produced (not shown), indicating a significant level of nuclear contamination. By contrast, none of the other assemblers could assemble the mitochondrial genome into a single contig (Table 1). The Geneious assembly shows good agreement with the published genome (Figure 3), apart from a few positions where the length of homopolymer runs cannot be called because of the Ion Torrent error model, and the control region, where low coverage makes it difficult to resolve repetitive regions.

2. Assembly of paired Illumina reads from WGS sequencing (Pan troglodytes)

Geneious, Velvet and SPAdes each produced a single contiguous fragment representing the mitochondrial genome (Table 2). However, the Velvet and SPAdes contigs are not circular and are 45 bp and 61 bp longer, respectively, than the Geneious contig because of a repeated section of sequence at the start and end. When mapped to the circular reference genome, this repeat produces a region of double coverage (Figure 4).
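The repeated start/end sequence described for the linear contigs can be illustrated with a small sketch. This is not the Geneious algorithm, which is not described here; it is a minimal illustration that assumes the repeat at the two ends is an exact match. The function names `terminal_overlap` and `trim_to_circular` and the toy sequence are hypothetical and chosen only for this example.

```python
from typing import Optional, Tuple


def terminal_overlap(seq: str, min_overlap: int = 20,
                     max_overlap: Optional[int] = None) -> int:
    """Length of the longest prefix of seq that exactly equals its suffix.

    Assumption: the repeated section is error-free. Real contig ends may
    differ by a few bases and would need an alignment-based comparison.
    """
    seq = seq.upper()
    limit = len(seq) // 2  # an overlap longer than half the contig is not meaningful here
    if max_overlap is not None:
        limit = min(limit, max_overlap)
    # Try the longest candidate first so the full repeat is reported.
    for k in range(limit, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_to_circular(seq: str, min_overlap: int = 20) -> Tuple[str, int]:
    """Drop the repeated end of a linear contig; return (sequence, overlap length)."""
    k = terminal_overlap(seq, min_overlap)
    return (seq[:-k], k) if k else (seq, 0)


if __name__ == "__main__":
    # Toy "genome" and a linear contig that runs 8 bases past the arbitrary end.
    genome = "ATGCGTACGTTAGCCTAGGCTAACGTTCAG"
    linear_contig = genome + genome[:8]
    trimmed, overlap = trim_to_circular(linear_contig, min_overlap=5)
    print(overlap, trimmed == genome)  # prints: 8 True
```

Trimming the repeat in this way removes the region that would otherwise map twice to a circular reference. The minimum overlap is a judgment call: too small a value risks chance matches, while too large a value can miss short repeats.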
Conclusions

Of the assemblers compared in this study, the Geneious R7 de novo assembler was the only one able to produce circular contigs as part of the assembly process, which facilitates the analysis of circular genomes such as those of mitochondria, chloroplasts, bacterial chromosomes, plasmids and viruses. Geneious also contains a circular mapper, which allows easy comparison of de novo assembly results with published genomes.

Table 1: Comparison of Panthera leo assemblies

Assembler   No. mtDNA contigs   Genome coverage (no. of contigs)   % identical sites   Pairwise % identity
Geneious    3                   100% (1)                           98.3                98.3
Velvet      56                  84.5% (48)                         99.4                99.2
MIRA        11                  99.6% (4)                          96.3                96.5
SPAdes      5                   99.7% (3)                          95.7                95.5

Table 2: Comparison of Pan troglodytes assemblies

Assembler   No. mtDNA contigs   Genome coverage (no. of contigs)   % identical sites   Pairwise % identity
Geneious    39                  100% (1)                           97.7                97.7
Velvet      1                   100% (1)                           97.7                97.7
MIRA        35                  99.9% (2)                          97.7                97.7
SPAdes      1                   100% (1)                           97.7                97.6

Figure 1: Geneious de novo assembly set-up options, showing the circularise contigs option.
Figure 2: Circular contig produced by de novo assembly in Geneious R7.
Figure 3: Mapping of the circular contig produced by Geneious R7 to the published Panthera leo genome.
Figure 4: Mapping of Pan troglodytes contigs to the published genome. (A) shows the circular contig produced by Geneious; (B) shows the overlapping regions of the Velvet contig caused by the linear assembly.

References

1. Bagatharia SB, Joshi MN, Pandya RV et al. (2013). Complete mitogenome of Asiatic lion resolves phylogenetic status within Panthera. BMC Genomics 14:572.
2. Prüfer K, Munch K, Hellmann I et al. (2012). The bonobo genome compared with the chimpanzee and human genomes. Nature 486(7404):527-531.
3. Zerbino DR and Birney E (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18:821-829.
4. Chevreux et al. (1999). Genome sequence assembly using trace signals and additional sequence information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
5. Nurk S, Bankevich A, Antipov D et al. (2013). Assembling genomes and mini-metagenomes from highly chimeric reads. Lecture Notes in Computer Science 7821, pp. 158-170.

Hilary Miller and Matt Kearse, Biomatters Ltd, Auckland, New Zealand