
De novo Assembly of Circular Genomes using Geneious R7

Introduction
Circular genomes, such as those of many viruses, bacteria and mitochondria, as well as plasmids, are common. However, such genomes can be difficult to assemble in the absence of a reference genome, as most de novo assemblers do not account for circularity and instead produce linear sequences with an arbitrarily defined start and end.
This can result in repeated sections of sequence at the arbitrary start and end points, and in an artificial drop in coverage in these regions, both of which can affect downstream analyses.
The Geneious de novo assembler overcomes these issues by allowing contigs to circularise during the assembly process.
In this study we assemble two mitochondrial genomes from short-read NGS data using the Geneious de novo assembler and compare the results with assemblies produced by Velvet, MIRA and SPAdes.
Methods
Datasets for the Asiatic lion [1] (Panthera leo persica) and the chimpanzee [2] (Pan troglodytes) were downloaded from the NCBI Short Read Archive (accession numbers SRR821548 and ERR032959, respectively).
The Panthera leo dataset consists of unpaired Ion Torrent reads from a purified mitochondrial DNA preparation. Prior to assembly, adaptors and poor-quality bases were trimmed off and reads shorter than 50 bp were removed from the dataset, leaving 237,432 reads of 50-367 bp (mean 164 bp).
The Pan troglodytes dataset is from whole-genome shotgun sequencing (approximately 1x coverage) and consists of paired 76 bp Illumina GAII reads with a 250 bp insert length. Reads were quality trimmed prior to assembly. This dataset contains a total of 57,237,068 reads, but only 5% were assembled, as the mitochondrial fraction was expected to be at much higher coverage than the nuclear fraction, so a 5% subset still covers the mitochondrial genome deeply.
Assemblies were performed using Geneious (version 7.1.5), Velvet [3], MIRA [4] and SPAdes [5]. Velvet and MIRA were run as plugins to Geneious. Optimal parameters for Velvet were chosen by Velvet Optimizer to maximise the length of the longest contig. The following settings were used for each dataset:
Panthera leo:
Geneious: Medium/High sensitivity, circularize contigs option on
MIRA: Genome / contiguous sequence, accurate quality, Ion Torrent setting
Velvet: optimal k-mer 57
SPAdes: k-mers 21, 33, 55, 77, 99; read correction on; Ion Torrent setting
Pan troglodytes:
Geneious: Medium/Low sensitivity, circularize contigs option on
MIRA: Genome / contiguous sequence, accurate quality, Illumina setting
Velvet: optimal k-mer 47
SPAdes: k-mers 21, 33, 55; read correction on
Contigs produced from each assembly were mapped back to the published mitochondrial genome sequence for each species using the Geneious read mapper with medium sensitivity settings and no fine tuning.
Results
1. Assembly of unpaired Ion Torrent reads (Panthera leo)
The Geneious R7 assembler produced a single circular contig containing the entire mitochondrial genome from a dataset of unpaired Ion Torrent reads (Figure 2). Although this dataset was from purified mtDNA, a large number of short linear contigs were also produced (not shown), indicating a significant level of nuclear contamination. By contrast, none of the other assemblers could assemble the mitochondrial genome into a single contig (Table 1).
The Geneious assembly shows good agreement with the published genome (Figure 3), apart from a few positions where the length of homopolymer runs cannot be called because of the Ion Torrent error model, and the control region, where low coverage makes it difficult to resolve repetitive sequence.
2. Assembly of paired Illumina reads from whole-genome shotgun sequencing (Pan troglodytes)
Geneious, Velvet and SPAdes each produced a single contig spanning the mitochondrial genome (Table 2). However, the Velvet and SPAdes contigs are not circular and are 45 bp and 61 bp longer, respectively, than the Geneious contig, because the same section of sequence is repeated at their start and end. When these contigs are mapped to the circular reference genome, the repeated section produces a region of double coverage (Figure 4).
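The extra 45 bp and 61 bp in the Velvet and SPAdes contigs are a terminal overlap: the contig ends with a copy of its own start. A minimal Python sketch of detecting and trimming such an overlap is shown below, assuming the repeat is an exact match (real reads carry sequencing errors, which would call for an alignment-based comparison instead). It uses the KMP failure function to find the longest proper suffix of the contig that is also a prefix. The names `terminal_overlap`, `circularise` and the `min_overlap` threshold are hypothetical and are not part of Geneious or any assembler's API.

```python
def terminal_overlap(contig: str, min_overlap: int = 20) -> int:
    """Length of the longest proper suffix of `contig` that is also its prefix.

    Returns 0 when that overlap is shorter than `min_overlap` (a hypothetical
    threshold guarding against short chance matches)."""
    if not contig:
        return 0
    # KMP failure function: fail[i] is the length of the longest proper
    # prefix of contig[:i + 1] that is also a suffix of contig[:i + 1].
    fail = [0] * len(contig)
    k = 0
    for i in range(1, len(contig)):
        while k > 0 and contig[i] != contig[k]:
            k = fail[k - 1]
        if contig[i] == contig[k]:
            k += 1
        fail[i] = k
    overlap = fail[-1]
    return overlap if overlap >= min_overlap else 0


def circularise(contig: str, min_overlap: int = 20) -> str:
    """Drop the duplicated copy of the terminal overlap from the contig's end."""
    overlap = terminal_overlap(contig, min_overlap)
    return contig[:len(contig) - overlap]


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTACGGAT"   # toy 20 bp circular genome
    linear = genome + genome[:6]      # linear contig repeating its first 6 bp
    print(terminal_overlap(linear, min_overlap=5))       # 6
    print(circularise(linear, min_overlap=5) == genome)  # True
```

After trimming, the first and last bases of the sequence are adjacent on the circle, which is what a circular contig represents directly without the duplicated ends.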
Conclusions
Of the assemblers compared here, the Geneious R7 de novo assembler is the only one able to produce circular contigs as part of the assembly process, facilitating the analysis of circular genomes such as mitochondria, chloroplasts, bacterial chromosomes, plasmids and viruses. Geneious also contains a circular mapper, which allows easy comparison of de novo assembly results with published genomes.
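Mapping to a circular reference has to allow matches that wrap past the arbitrary end of the reference sequence. The Python sketch below is a toy illustration of that idea, not the Geneious mapper: it finds exact matches only (a real read mapper tolerates mismatches and indels) by searching tandem copies of the reference, then tallies per-base depth modulo the genome length. Mapping a linear contig that still carries its repeated ends reproduces the kind of double-coverage region described for Figure 4. The function names are hypothetical.

```python
from typing import Optional


def map_to_circular(reference: str, query: str) -> Optional[int]:
    """0-based start of an exact match of `query` on the circular `reference`,
    allowing the match to wrap around the origin; None if there is no match."""
    n, m = len(reference), len(query)
    if n == 0 or m == 0:
        return None
    # Enough tandem copies that every length-m window read around the
    # circle (even one longer than the genome) is an ordinary substring.
    extended = reference * ((m - 1) // n + 2)
    pos = extended.find(query)
    # `extended` has period n, so the first hit always starts in the first copy.
    return None if pos == -1 else pos


def circular_depth(n: int, hits: list[tuple[int, int]]) -> list[int]:
    """Per-base depth on a circular genome of length n from (start, length) hits."""
    depth = [0] * n
    for start, length in hits:
        for offset in range(length):
            depth[(start + offset) % n] += 1
    return depth


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGGAT"               # toy 20 bp circular reference
    linear_contig = reference[5:] + reference[:11]   # 26 bp, 6 bp repeated at the ends
    start = map_to_circular(reference, linear_contig)
    depth = circular_depth(len(reference), [(start, len(linear_contig))])
    print(start)                                          # 5
    print([i for i, d in enumerate(depth) if d == 2])     # [5, 6, 7, 8, 9, 10]
    covered = sum(d > 0 for d in depth) / len(reference) * 100
    print(f"{covered:.1f}% of the reference covered")     # 100.0% of the reference covered
```

Searching tandem copies rather than the reference alone is what lets a contig spanning the arbitrary origin map as one piece instead of two.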
1. Bagatharia SB, Joshi MN, Pandya RV et al. (2013). Complete mitogenome of Asiatic lion resolves phylogenetic status within Panthera. BMC Genomics 14: 572.
2. Prüfer K, Munch K, Hellmann I et al. (2012). The bonobo genome compared with the chimpanzee and human genomes. Nature 486(7404): 527-531.
3. Zerbino DR and Birney E (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18: 821-829.
4. Chevreux B et al. (1999). Genome sequence assembly using trace signals and additional sequence information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
5. Nurk S, Bankevich A, Antipov D et al. (2013). Assembling genomes and mini-metagenomes from highly chimeric reads. Lecture Notes in Computer Science 7821: 158-170.
Table 1: Comparison of Panthera leo Assemblies
Assembler   No. mtDNA contigs   Genome coverage (no. of contigs)   % Identical sites   Pairwise % identity
Geneious    3                   100% (1 contig)                    98.3                98.3
Velvet      56                  84.5% (48 contigs)                 99.4                99.2
MIRA        11                  99.6% (4 contigs)                  96.3                96.5
SPAdes      5                   99.7% (3 contigs)                  95.7                95.5

Table 2: Comparison of Pan troglodytes Assemblies
Assembler   No. mtDNA contigs   Genome coverage (no. of contigs)   % Identical sites   Pairwise % identity
Geneious    39                  100% (1 contig)                    97.7                97.7
Velvet      1                   100% (1 contig)                    97.7                97.7
MIRA        35                  99.9% (2 contigs)                  97.7                97.7
SPAdes      1                   100% (1 contig)                    97.7                97.6

Figure 1: Geneious de novo assembly setup options, showing the circularise contigs option.
Figure 2: Circular contig produced by de novo assembly in Geneious R7
Figure 3: Mapping of the circular contig from Geneious R7 to the published Panthera leo genome
Figure 4: Mapping of Pan troglodytes contigs to the published genome. (A) shows the circular contig produced by Geneious; (B) shows the overlapping region of the Velvet contig caused by linear assembly.
Hilary Miller and Matt Kearse
Biomatters Ltd, Auckland, New Zealand
