Massively Parallel Sequencing For Biodiversity

Massively parallel sequencing for biodiversity science
Annie Archambault, Centre de la Science de la Biodiversité du Québec
qcbs.ca
April 2013
Synonyms:
Next generation sequencing (NGS), Massively parallel sequencing, High throughput sequencing, 2nd or 3rd generation sequencing
Parallelize the sequencing process:
- Producing thousands of short sequencing reads at once
- Replaces cloning and clone screening
- Replaces individual sequencing reactions
Outline
- Uses for biodiversity studies
- Very brief review of the 4 main platforms
- Examples of experimental procedure strategies (complexity reduction and multiplexing)
- Laboratory steps and costs for 4 case studies
Disclaimer:
I still have limited experience with these instruments; I gained my understanding from intensive reading.
Useful reading
Review of the chemistry and the workflow:
Myllykangas S, Buenrostro J, Ji HP: Overview of Sequencing Technology Platforms. In Bioinformatics for High Throughput Sequencing. Springer New York; 2012: 11-25.
http://www.springerlink.com/content/n6u33m1335750g57/
Review of technologies and applications in biodiversity:
Purdy KJ, Hurd PJ, Moya-Laraño J, Trimmer M, Oakley BB, Woodward G: Systems Biology for Ecology: From Molecules to Ecosystems. In Advances in Ecological Research. 2010: 87-149.
http://linkinghub.elsevier.com/retrieve/pii/B9780123850058000034
Instruments comparison

GS-FLX+ (454)
- Amplification, detection: emulsion PCR, pyrosequencing
- Detection: fluorescence
- Step: during synthesis
- At GQ Innovation Center: yes
- Unit: 1 plate (divisible into regions)

HiSeq (Illumina)
- Amplification, detection: bridge PCR; wash after every base
- Detection: fluorescence
- Step: during synthesis
- At GQ Innovation Center: yes
- Unit: flow cell of 8 lanes

Ion PGM Sequencer (Life Technologies)
- Amplification, detection: emulsion PCR, pyrosequencing-like
- Detection: H+ ions
- Step: during synthesis
- At GQ Innovation Center: yes
- Unit: chip (314, 316)

PacBio RS
- Amplification, detection: no prior amplification; single-molecule real-time sequencing (SMRT)
- Detection: fluorescence
- Step: during synthesis
- At GQ Innovation Center: no
- Unit: cell
Visuals
454 GS FLX
http://454.com/products/technology.asp
http://bcove.me/7eidiq1e?width=490&height=274
HiSeq
http://www.youtube.com/watch?v=77r5p8IBwJk
Ion PGM
http://www.youtube.com/watch?v=yVf2295JqUg&feature=plcp&context=C4897380VDvjVQa1PpcFPcv91xP1YGJ31VyENe915toprCBsg2Jc%3D
PacBio RS
http://www.youtube.com/watch?v=NHCJ8PtYCFc&feature=related
Instrument comparison

GS-FLX+ (454)
- Nb reads per unit: 1 million per plate
- Read length: 350 to 500 bp
- Run time: 20 h
- Cost: library prep 160 $; per plate 8 200 $
- $ per Mb*: 7 $
- Preferred uses: amplicon sequencing; initial characterization; non-model species
- Type of errors: indels

HiSeq (Illumina)
- Nb reads per unit: ~200 million per lane
- Read length: 50 bp, 100 bp or 150 bp
- Run time: 8 days
- Cost: library prep 160 $; per lane 715 $ to 2 100 $ (depending on read length)
- $ per Mb*: 0.1 $
- Preferred uses: re-sequencing; frequency-based applications; NOT amplicons
- Type of errors: substitutions

Ion PGM (314, 316 or 318 chip)
- Nb reads per unit: 314: 100 000; 316: 1 million; 318: 10 million
- Read length: 35 to 400 bp
- Run time: 1.5 to 7 h
- $ per Mb*: 50 $
- Preferred uses: individual laboratories, small scale
- Type of errors: indels

PacBio RS
- Nb reads per unit: 50 000 reads per cell
- Read length: 6000 bp
- Run time: 2 h
- Cost: ~750 $ USD per sample
- $ per Mb*: 11 to 200 $
- Preferred uses: non-model species, long fragments, methylated fragments
- Type of errors: CG deletions, high error rates

*Cost estimate: Glenn TC. 2011. Field guide to next-generation DNA sequencers. Molecular Ecology Resources 11:759-769. http://onlinelibrary.wiley.com/doi/10.1111/j.1755-0998.2011.03024.x/abstract
Quantity instead of length or quality

Each read is short (75 to 200 bp) and bears errors: each base needs to be confirmed by many reads covering the same template region.
Template and library preparation:
- Long templates (gDNA): gDNA fragmentation + adaptors
- Short amplicon templates: amplification + adaptors
Figure: Reads > Assembly/mapping by similarity > Deduced template sequence. Panel labels: regions with 8X coverage; reads excluded from further analyses.
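The principle of confirming each base with many reads can be sketched in Python. This is only an illustration under simple assumptions: the reads are already placed on the template (given as hypothetical `(start, sequence)` pairs), the consensus is a plain majority vote, and positions with fewer than `min_coverage` reads are masked with `N`. The function name and the toy reads are invented for the example; real assemblers and mappers are far more elaborate.

```python
from collections import Counter

def consensus(aligned_reads, template_length, min_coverage=8):
    """Majority-vote consensus; positions below min_coverage become 'N'."""
    # One counter of observed bases per template position.
    columns = [Counter() for _ in range(template_length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < template_length:
                columns[pos][base] += 1
    result = []
    for counts in columns:
        depth = sum(counts.values())
        if depth < min_coverage:
            result.append("N")  # not enough reads to trust this position
        else:
            result.append(counts.most_common(1)[0][0])
    return "".join(result)

# Toy data: 8 reads cover positions 0-5 (one read carries an error),
# and only 2 reads cover positions 6-7.
toy_reads = [(0, "ACGTAC")] * 7 + [(0, "ACGAAC")] + [(6, "GG")] * 2
toy_consensus = consensus(toy_reads, 8)
print(toy_consensus)
```

In the toy data the single erroneous read is outvoted at 8X coverage, while the two poorly covered positions are masked rather than trusted.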
Useful in biodiversity?
How to make use of 200 million reads for your biological question?
Be strategic!
- Reduce the complexity of the genetic material analyzed
- Combine different samples into a single run (multiplexing)
Strategies: Multiplexing
Incorporate specific KNOWN oligos (code or index) at the beginning of each fragment.
- Added during library preparation
- Read at sequencing
- Reads sorted by sequence (deconvolution) according to their code
- Roche 454: 30 (up to 130) multiplex identifiers (MID), 10 bp
- Illumina: 12 index sequences, 6 bp
Figure: Sample 1, Sample 2 and Sample 3 are pooled in one tube, sequenced in a single run, then sorted according to the coded sequence back into Sample 1, Sample 2 and Sample 3.
Depth of coverage: GS-FLX plate = 250 000 reads / 25 barcodes: 10 000 reads per sample. Enough for you?
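As a rough illustration of the sorting step, the following Python sketch assigns pooled reads to samples from the barcode at the start of each read, then repeats the depth-of-coverage arithmetic from the slide. The two 10-base codes, the toy reads and the function name are placeholders chosen for the example, not the official MID list or any vendor software.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Sort reads into samples by their leading barcode and trim the barcode."""
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        code, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(code, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)

# Placeholder 10-base codes (assumption for this sketch).
barcodes = {"ACGAGTGCGT": "Sample 1", "ACGCTCGACA": "Sample 2"}
pooled = ["ACGAGTGCGTTTGACC", "ACGCTCGACAGGATCC",
          "ACGAGTGCGTAAACCC", "TTTTTTTTTTACGT"]
sorted_reads = demultiplex(pooled, barcodes)

# Depth of coverage from the slide: 250 000 reads shared by 25 barcodes.
reads_per_sample = 250_000 // 25
print(sorted_reads, reads_per_sample)
```

Reads whose first bases match no known code end up in an "unassigned" bin, which is one reason why part of a run is lost after sorting.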
Strategies: Multiplexing
Incorporate specific KNOWN oligos (code or index) at the beginning of the UNKNOWN fragment, during library preparation.
Example of a Roche 10 bp MID barcode for amplicon sequencing:
Template, top strand:    5'-CTCGTAGACTGCGTACCAATTC.............TTACTCAGGACTCAT-3'
Template, bottom strand: 3'-CATCTGACGCATGGTTAAG.............AATGAGTCCTGAGTAGCAG-5'
Primer LibL_A with MID3 with TargetSpecific; TargetSpecific part: GACTGCGTACCAATTC-3'
Primer LibL_B with TargetSpecific (no MID); TargetSpecific part: 3'-CAATGAGTCCTGAGTAG
Strategies: complexity reduction

A few organisms (1 to a few hundred): survey a few thousand loci per sample
- Enrich in gene-rich regions for gDNA sequencing
- Random genomic survey
- Transcriptome sequencing
Very many organisms (e.g. environmental studies): survey one or two loci per individual
- Amplicon sequencing with universal primers (PCR)
By hybridization
Be creative!
A few organisms:
- Enrich in simple-sequence repeats: hybridization to target repeats (e.g. microsatellite loci)
- Enrich in gene-rich regions for genomic DNA sequencing: hybridization to a reference set of genes (e.g. target exons)
Steps:
1. DNA fragmentation
2. Hybridization to custom-made baits on beads: enrich in specific fragments (e.g. exons); bound fragments are retained, unbound fragments are discarded
3. Sequence the enriched pool
! Evaluate costs carefully

From 2008 to 2013?
- Instruments give higher throughput
- Each sequencing run is cheaper
- It may be cheaper not to target specific regions
By methylation-sensitive RE
Be creative!
A few organisms: enrich in gene-rich regions for genomic DNA sequencing by eliminating methylation-rich regions (plant repetitive elements)
Steps:
1. Nuclear DNA fragmentation
2. Insert in E. coli, which digests methylated DNA
3. Sequence the enriched pool
By amplification
One or a few organisms: randomly sample the whole genome
- Amplification: AFLP-like, but with sequence instead of length polymorphism
- ddRAD: double digest restriction-site-associated DNA sequencing, to find SNPs
Steps:
1. DNA fragmentation (Enz.A, Enz.A, Enz.B)
2. Adaptor ligation
3. Amplification with adaptor primers
By amplification
One or a few organisms: randomly sample the whole genome
- ddRAD: double digest restriction-site-associated DNA sequencing
- Powerful when coupled with multiplexing
Figure: fragments cut by Enz.A and Enz.B receive an adaptor carrying an index; Sample 1 and Sample 2 are then multiplexed.
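The double digest can be mimicked in silico to get a feeling for how many fragments survive. The sketch below is a simplified illustration, not the published protocol: the enzyme pair (EcoRI, G^AATTC, and MspI, C^CGG) is assumed for the example, only the top strand is considered, and the toy sequence and function name are invented.

```python
import re

def double_digest(genome, site_a="GAATTC", cut_a=1, site_b="CCGG", cut_b=1,
                  min_len=0, max_len=10**9):
    """Return fragments flanked by one cut of enzyme A and one cut of enzyme B."""
    # Record every cut position together with the enzyme that made it.
    cuts = [(m.start() + cut_a, "A") for m in re.finditer(site_a, genome)]
    cuts += [(m.start() + cut_b, "B") for m in re.finditer(site_b, genome)]
    cuts.sort()
    fragments = []
    for (left, enz_left), (right, enz_right) in zip(cuts, cuts[1:]):
        # Keep only A-B or B-A fragments inside the size-selection window.
        if enz_left != enz_right and min_len <= right - left <= max_len:
            fragments.append(genome[left:right])
    return fragments

toy_genome = "TTTGAATTCAAAACCGGTTTTGAATTCAAGAATTCTT"
all_fragments = double_digest(toy_genome)
size_selected = double_digest(toy_genome, min_len=9, max_len=12)
print(all_fragments, size_selected)
```

Fragments with the same enzyme at both ends are dropped, and the size window removes most of the remainder, which is how the method reduces genome complexity before sequencing.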
Genome complexity reduction: RNA

A few samples:
Transcription (DNA > RNA), translation (RNA > protein)
Transcriptome sequencing
- Total RNA: RNAseq
- Reduce to mRNA only (polyA)
- Reduce to microRNA only

! Driven by external conditions and by tissue type
Needs a high number of reads: Illumina preferred
Genome complexity reduction: RNA

A few organisms:
Reminder: mRNA sequences include non-coding regions (UTR)
Figure: gene = 5' UTR, exon, intron, exon, 3' UTR; mRNA = 5' UTR, CDS, 3' UTR, poly-A tail (AAAAAAA)
Genome complexity reduction: Amplicon

Very many organisms:
- Amplicon sequencing with universal primers for ONE locus
- Environmental samples targeting ITS, 16S, CO1 (the barcode loci)
- Limitation: primers may not amplify equally well in ALL target organisms
Figure: the primers anneal to some templates and do NOT anneal to others.
Case studies in biodiversity

Bartram et al. 2011. Generation of multi-million 16S rRNA gene libraries from complex microbial communities by assembling paired-end Illumina reads. Appl. Environ. Microbiol.
http://aem.asm.org/content/early/2011/04/01/AEM.02772-10
Castoe et al. 2011. Rapid Microsatellite Identification from Illumina Paired-End Genomic Sequencing in Two Birds and a Snake. PLoS ONE.
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030953
Peterson et al. 2012. Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE.
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0037135
Griffin et al. 2011. A next-generation sequencing method for overcoming the multiple gene copy problem in polyploid phylogenetics, applied to Poa grasses. BMC Biology 9: 19.
http://www.biomedcentral.com/1741-7007/9/19
Bacterial communities
Bartram et al. 2011. Generation of multi-million 16S rRNA gene libraries from complex microbial communities by assembling paired-end Illumina reads. Appl. Environ. Microbiol.
http://aem.asm.org/content/early/2011/04/01/AEM.02772-10
Objective: develop a protocol for community genetic diversity (tested on two samples, costs given for 20 samples)
Material:
- Soils from arctic tundra. Total DNA extracted with FastDNA (MP Biomedicals)
- Also includes a control bacterial mix in liquid media
Cost: 20 X 5 $ = 100 $
Molecular steps
Primer for hypervariable region 3 (V3) of the microbial 16S rRNA gene, 81 bp, PAGE purified:
caagcagaagacggcatacgagatCGTGATgtgactggagttcagacgtgtgctcttccgatctATTACCGCGGCTGCTGG
Primer regions, in order: flow-cell-binding, index, Illumina primer, target gene
Cost of primers: 25 X 67 $ = 1 675 $
- Amplify with high-fidelity polymerase (Phusion)
- Extract the desired length, 200-250 bp (columns)
- Multiplexing: yes, including technical replicates
- Quality control for libraries (e.g. Agilent Bioanalyzer)
- Sequencing: paired-end 2 x 125 bp, Illumina GAIIx (would now be HiSeq)
Other costs: 1 X 90 $ = 90 $; 25 X 1.5 $ = ~40 $; 25 X 50 $ = 1 250 $; 1 X 2 090 $ = 2 090 $
Total = 5 250 $
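The 81 bp primer alternates between lower and upper case, which makes its four regions easy to pull apart. The following Python sketch splits it on the case changes and swaps in another index. The region names follow the labels on the slide as read here, and the helper `primer_with_index` and the alternative index `ACATCG` are assumptions for illustration only.

```python
from itertools import groupby

PRIMER = ("caagcagaagacggcatacgagat" "CGTGAT"
          "gtgactggagttcagacgtgtgctcttccgatct" "ATTACCGCGGCTGCTGG")

# Assumed order of the regions along the primer (5' to 3').
REGION_NAMES = ["flow_cell_binding", "index", "illumina_primer", "target_gene"]

# Split the primer wherever the letter case changes.
segments = ["".join(group) for _, group in groupby(PRIMER, key=str.isupper)]
regions = dict(zip(REGION_NAMES, segments))

def primer_with_index(index):
    """Rebuild the primer with a different 6-base index (hypothetical helper)."""
    return (regions["flow_cell_binding"] + index.upper()
            + regions["illumina_primer"] + regions["target_gene"])

print({name: len(seq) for name, seq in regions.items()})
print(primer_with_index("ACATCG"))
```

Only the index changes from one primer to the next, which is why the order can be built from one shared design.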
Sequence analyses
Bioinformatics:
- Base calling and error estimation: Illumina Analysis Pipeline
- Quality filtering, sorting of reads according to index sequence, contig assembly (custom made, PANDAseq)
- Discard assembled pairs with 1 or more mismatches between the two overlapping fragments of the paired-end reads, or with 1 or more ambiguous bases
- CD-HIT to cluster arctic tundra datasets at 97% sequence identity
- Assignment to taxonomic affiliations: naive Bayesian classification (Ribosomal Database Project, RDP classifier), cutoff 0.5
- Good's coverage for each library to estimate sequence coverage (C = 1 - n1/N)
Workflow figure labels: Index seq.; Raw reads; Custom program; Paired-end reads assembled; CD-HIT; Cluster (modified single linkage); RDP / QIIME; Classification / Diversity estimate
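Good's coverage is simple enough to compute by hand. In the usual definition, n1 is the number of clusters (OTUs) seen exactly once and N is the total number of sequences, so C = 1 - n1/N. The Python sketch below applies it to a toy list of cluster assignments; the function name and the labels are invented for the example.

```python
from collections import Counter

def goods_coverage(cluster_labels):
    """Good's coverage C = 1 - n1/N for a list of per-sequence cluster labels."""
    total = len(cluster_labels)
    if total == 0:
        raise ValueError("at least one sequence is required")
    sizes = Counter(cluster_labels)
    singletons = sum(1 for size in sizes.values() if size == 1)
    return 1 - singletons / total

# Toy library: 8 sequences in 4 clusters, two of which are singletons.
toy_labels = ["a"] * 4 + ["b"] * 2 + ["c", "d"]
toy_coverage = goods_coverage(toy_labels)
print(toy_coverage)
```

A value close to 1 means that few clusters were seen only once, so additional sequencing is unlikely to reveal many new ones.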
Results
- Total of 12 million raw reads; 50% of the reads discarded
- Raw reads: 7.6 million and 4.4 million for each technical replicate
- Post-assembly: 4.1 and 2.4 million for each technical replicate
- Average post-assembly contig: 150 ± 11 bases (without primers); overlap 66 ± 11 bases
- Pre-clustering at 97% sequence identity
- Estimated error rate (from control library): 1 error per 5 contigs (1%/base), higher than Sanger sequencing
- Found a contaminant in the growth media of a control
- Duplicate arctic tundra libraries displayed a high degree of similarity: comparison of phyla between the two libraries (AT1 to AT2; r=0.999); the majority of sequence clusters (99.57%) detected in both replicates
Isolation of novel microsatellites loci

Castoe et al. 2011. Rapid Microsatellite Identification from Illumina Paired-End Genomic Sequencing in Two Birds and a Snake. PLoS ONE. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030953
Also read: Jennings et al. 2011. Multiplexed Microsatellite Recovery Using Massively Parallel Sequencing. Molecular Ecology Resources. http://doi.wiley.com/10.1111/j.1755-0998.2011.03033.x
Objective: discover novel microsatellite loci from diverse organisms
Material: total genomic DNA (5 µg), one individual per species, from many different organisms (results reported for only 3 species; costs calculated here for 8)
Genome complexity reduction: none, direct sequencing
Library preparation: for Illumina sequencing (likely on ~8 to 10 species)
Multiplexing: yes, at the sequencing facility, during library preparation
Sequencing platform: Illumina GAIIx, 120 bp paired-end (would now be HiSeq 2000)
Primers must then be ordered for each locus: ~700 $ for 50 loci, up to 5 500 $ for 8 species
Costs: 8 X 5 $ = 40 $; 8 X 160 $ = 1 280 $; 1 X 2 090 $ = 2 090 $
Total: 3 400 $
Total (with primers): 9 000 $

Bioinformatics: simple, no assembly, no comparison to a reference genome
In a Perl script:
- Identify reads that contain perfect SSRs: 2-mer to 6-mer, repeated at least 6 times
- Sort by SSR type (de-multiplex)
- Design primers (with Primer3)
- Discard the primer pairs that also occur in other reads
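The first step of that procedure, finding perfect repeats, fits in one regular expression. The sketch below is written in Python rather than Perl and is not the authors' script: it looks for a motif of 2 to 6 bases followed by at least 5 further exact copies, and skips homopolymer runs. The function name and the toy reads are assumptions for the example.

```python
import re

# A motif of 2-6 bases (shortest first) followed by 5 or more exact copies,
# i.e. at least 6 tandem repeats in total.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{5,}")

def find_ssrs(read):
    """Return (motif, number_of_repeats, start_position) for each perfect SSR."""
    hits = []
    for match in SSR_PATTERN.finditer(read):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # a run of a single base is not a 2-mer to 6-mer SSR
        repeats = len(match.group(0)) // len(motif)
        hits.append((motif, repeats, match.start()))
    return hits

dinucleotide_read = "TTG" + "AC" * 6 + "GGT"
tetranucleotide_read = "CC" + "GATA" * 7 + "TT"
print(find_ssrs(dinucleotide_read))
print(find_ssrs(tetranucleotide_read))
```

The flanking sequence on each side of a hit is what would then be passed to a primer design tool such as Primer3.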

Results:
- Number of raw reads: not reported
- Used 5 million paired-end reads per sample (about 1X coverage)
- Mean sequence length: not reported
- Between 150 000 and 540 000 potential loci (containing microsatellites)
- Primers designed for 72 000 to 174 000 loci, depending on species
- With extra stringency (only 3-mer to 6-mer, >7 repeats): 200 to 2 000 loci
- Primers not tested for amplifiability
Conclusions: large variation in the number and proportion of motifs (3-mer, 4-mer) among the different organisms.
ddRAD-seq
Peterson et al. 2012. Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0037135
Objective: aim at 10 000 SNPs, random genome-wide, 10X coverage
Material: total DNA extracted from 54 P. leucopus, one population (Qiagen kits). 54 X 5 $ = 270 $
Complexity reduction: yes, complex: digestion, annealing, size selection, many purification steps
Multiplexing: yes, 54 samples. With simulations that include genome size and nucleotide frequencies, they estimate a need for 400 000 reads per individual, for the 300 ± 30 bp size range
Platform: two lanes of GAII (now HiSeq 2000), paired-end. 2 X 2 010 $ = 4 020 $
ddRAD-seq oligos
PCRprimer1 (46 bp): AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACG
adaptorP1 (one of 48):
  Oligo1.1: ACACTCTTTCCCTACACGACGCTCTTCCGATCTAATTA-3'
  Oligo1.2: TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGATTAATTTAA-5'-P
gDNA: AATTCNNNNNN ... GNNNNNNGGC
adaptorP2:
  Oligo2.1: P-5'-CGAGATCGGAAGAGCGAGAACAA
  Oligo2.2: TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
PCRprimer2 (1 of 12): CGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC
- Many oligos; combine an index on the 5' end and on the 3' end
- Digestions and PCR amplifications
- Many purifications and precise size selection (Pippin Prep)
Costs: 110 X ~30 $ = 3 300 $; enzymes + purification = 1 250 $
Big total = ~9 000 $
ddRAD-seq sequence analyses

Initial sequence processing:
- De-multiplex, accepting 1 bp mismatch in the 4 bp barcode
- Assign each read to a single individual
- Collapse identical reads into one sequence, retaining the frequency
No reference genome; not the Stacks package:
- Compute pairwise distances between all reads (BLAT)
- MCL to group similar reads (ortholog inference)
- Count unique sequences in a cluster (= locus), and count how many are beyond the ploidy level (= % of error-containing reads)
- Align orthologs (MUSCLE)
- Write alignments as reference-ordered SAM/BAM files
- GATK UnifiedGenotyper to genotype
Error rate: ranged from 0.18 to 0.22% per nucleotide (1/10 reads)
Technical replicates? No
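The initial processing steps can be illustrated with a short Python sketch: a barcode is accepted with at most one mismatch, the read is assigned only if exactly one individual matches, and identical reads are collapsed while keeping their frequency. The three 4-base barcodes, the toy reads and the function names are hypothetical; the published pipeline is more involved.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_individual(read, barcodes, max_mismatch=1):
    """Return (individual, insert), or None if the barcode is unknown or ambiguous."""
    length = len(next(iter(barcodes)))
    tag = read[:length]
    matches = [ind for code, ind in barcodes.items()
               if hamming(tag, code) <= max_mismatch]
    if len(matches) != 1:
        return None
    return matches[0], read[length:]

def collapse_by_individual(reads, barcodes):
    """Collapse identical reads per individual, retaining their frequency."""
    counts = defaultdict(Counter)
    for read in reads:
        assigned = assign_individual(read, barcodes)
        if assigned is not None:
            individual, insert = assigned
            counts[individual][insert] += 1
    return {ind: dict(c) for ind, c in counts.items()}

# Hypothetical barcodes that differ from each other at 3 or more positions.
toy_barcodes = {"AAAA": "ind1", "CCCC": "ind2", "GGTT": "ind3"}
toy_reads = ["AAAAGATTACA", "AAATGATTACA", "CCCCGATTTCA", "ACGTGATTACA"]
collapsed = collapse_by_individual(toy_reads, toy_barcodes)
print(collapsed)
```

Tolerating a mismatch only works if the barcodes are far enough apart; with barcodes that differ at a single position, the same read could match two individuals and would have to be discarded.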
ddRAD-seq results
The 54 wild Peromyscus from the same population:
- Total reads: not reported (~2 X 21 million)
- Assigned to an individual: not reported
- Discarded 5.4% of reads
SNPs discovered:
- Variable regions (loci): 6 200 found
- Polymorphic sites for >70% of individuals: 16 000 sites found
In an analysis of samples from different populations:
- SNPs in multi-SNP loci: >80%
- These multi-SNP loci are usually excluded in other analyses
Phylogenies with polyploids

Griffin et al. 2011. A next-generation sequencing method for overcoming the multiple gene copy problem in polyploid phylogenetics, applied to Poa grasses. BMC Biology 9: 19. http://www.biomedcentral.com/1741-7007/9/19
Objective: phylogenies in polyploid grasses; recent, rapid radiation (in a time- and cost-effective experimental design)

Material: total DNA extracted from 60 individuals of 11 different polyploid Poa species. 60 X 5 $ = 300 $
Complexity reduction:
- Amplify 3 cp genes (rpl32-trnL, rpoB-trnC and trnH-psbA) and two nuclear genes (DMC1 and CDO504) from each of the 60 samples. Target amplicons are < 500 bp
- Primers: 10 X 5 $ = 50 $; enzymes: 1 X 150 $ = 150 $
- Pool the 5 different PCR products for one individual

Multiplexing: yes, addition of ds-adaptors with MID barcodes by ligation. Design 64 different barcodes (with 3 technical replicates). 64 X 2 X 9 $ = 1160 $; 64 X 2 X 6 $ = 800 $
TitaniumAdapterA 25bp + MIDbarcode + T:
  CGTATCGCCTCCCTCGCGCCATCAG + ACGAGTGCGT + T
  GCATAGCGGAGGGAGCGCGGTAGTA   TGCTCACGCA
A - TitaniumAdapterB:
  A + CTGAGCGGGCTGGCAAGGCGCATAG
      GACTCGCCCGACCGTTCCGCGTATC
Ligation of barcode-adapters to the pools of amplicons, purification, pooling, quality control
Costs: 1 X 100 $ = 100 $; 64 X 1.5 $ = 96 $; 1 X 50 $ = 50 $; 1 X 2 140 $ = 2 140 $
Total = ~4 900 $
Sequencing platform: one plate of a Roche 454 with Titanium 2010 chemistry

Bioinformatics: Galaxy platform (free)
- Sort the gene regions by regular expression (REGEX) of gene-specific primers
- Discard low-quality reads, short sequences, and reads matching no MID barcode
- Calculate the error rate by counting SNPs in chloroplast regions
- Detect and discard PCR recombinants: alleles that occurred at <5% for a species OR for which both ends of the allele do not match the same common allele
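Sorting reads by gene with regular expressions can be sketched as follows in Python. The two primers are short made-up placeholders (the real gene-specific primers are not given on the slides), the small IUPAC table covers only the degenerate codes used in the example, and the function names are assumptions.

```python
import re

# Minimal IUPAC table: only the codes needed for this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def primer_regex(primer):
    """Compile a primer (possibly degenerate) into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer))

def sort_by_primer(reads, primers):
    """Group reads by the first gene whose primer matches the start of the read."""
    patterns = {gene: primer_regex(seq) for gene, seq in primers.items()}
    sorted_reads = {gene: [] for gene in primers}
    sorted_reads["no_primer"] = []
    for read in reads:
        for gene, pattern in patterns.items():
            if pattern.match(read):
                sorted_reads[gene].append(read)
                break
        else:
            sorted_reads["no_primer"].append(read)
    return sorted_reads

# Placeholder primers and reads (hypothetical).
toy_primers = {"geneA": "ACGTRC", "geneB": "TTGYAA"}
toy_reads = ["ACGTACTTTT", "ACGTGCAAAA", "TTGCAAGGGG", "GGGGGGGGGG"]
by_gene = sort_by_primer(toy_reads, toy_primers)
print(by_gene)
```

Reads that start with no known primer fall into a separate bin, the equivalent of the reads that did not match a gene-specific primer.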

Results: 121 000 raw reads. Length: 40 to 775 bp (mean 278 bp).
- 111 200 (92%) match a gene-specific primer
- 70 601 reads (58%) remained after barcode sorting and quality control
- Useful sequence for 281 out of 320 (88%) targets = 12% missing
- Sequence error rate: 0.13%
- PCR recombination: 2.9% of CDO504 reads and 14% of DMC1 reads
- Technical replicates: P. costiniana: identical alleles (at the < 2 bp level), but one extra allele (= PCR error)
- Two distinct copies (and more) of each nuclear gene deduced: DMC1 has 19 (4.0%) base differences, and CDO504 has 35 (8.5%), plus seven-bp and 4-bp indels
- One extra gene copy discovered for CDO504, showing a 57-bp deletion in an intron

Number of sequence reads obtained for each marker/individual combination.
A - After quality control and barcode deconvolution. B - Useful sequence reads remaining after alignment and editing.
Percentage of useful reads gained for each nuclear gene copy and allele, including recombinant reads.

Results: phylogenetic analyses
- Timing of polyploidization: took place before the Australian and the American species diverged
- Extensive haplotype sharing between taxa currently treated as different species
- Nuclear gene networks showed incongruence both with each other and with the chloroplast gene networks
- Tasmania-mainland differentiation detected
- On the local scale, strong spatial genetic structure detected using two of the chloroplast markers; this suggests a smaller neighborhood for seed dispersal than for pollen dispersal
To remember
- Diversity of protocols and experimental designs (Be creative!)
- Budget: 3 000 $ to 9 000 $; the main costs can be primers and library preparation
- Standards are rapidly increasing: technical replicates required
Challenge: data analysis
- No standard analytical protocol (custom, in-house developed)
- No standard calculation of error rate
- Initial steps are computer intensive (30 million short reads)
Results:
- Half of the reads are discarded
- Many target loci will be missing
- Unequal proportion of technical replicates in the final dataset
- Prone to PCR recombination and chimeric assembly
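Comparison

Arctic soil
- Platform: GA II, one lane
- Nb samples: 24
- Total cost: 5 250 $
- % reads retained: 53 %
- Nb clean unique reads: 4.1 million vs 2.4 million for tech rep
- Tech. reps: yes
- Error rate: 1%
- Missed target: NA

Microsat
- Platform: GAIIx, one lane
- Nb samples: ? (~8)
- Total cost: 3 400 $ (without primers)
- Tech. reps: no
- Missed target: NA

SNP (ddRAD)
- Platform: GA II, two lanes
- Nb samples: 54
- Total cost: 9 000 $
- % reads retained: 95%
- Nb clean unique reads: 7 000 loci with SNPs
- Tech. reps: no
- Error rate: 0.18 to 0.22% per base

Polyploids
- Platform: GS FLX, plate
- Nb samples: 61
- Total cost: 4 900 $
- % reads retained: 58 %
- Nb clean unique reads: 70 601
- Tech. reps: yes
- Error rate: 0.13%
- Missed target: 12 %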
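One way to read the comparison is to bring the totals back to a cost per sample. The short Python sketch below does this for the three case studies with a known number of samples, using the totals and sample counts listed in the comparison; the microsatellite study is left out because its sample count is uncertain. The variable names are chosen for the example.

```python
# (total cost in $, number of samples) as listed in the comparison.
studies = {
    "Arctic soil": (5250, 24),
    "SNP (ddRAD)": (9000, 54),
    "Polyploids": (4900, 61),
}

cost_per_sample = {name: round(total / samples, 2)
                   for name, (total, samples) in studies.items()}

for name, cost in cost_per_sample.items():
    print(f"{name}: {cost} $ per sample")
```

These figures ignore labor and data analysis, so they are only a first approximation of the real cost of each design.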
Thank you!
