
Discussion

1-Barcoding references for standard markers:


DNA barcoding is a method of identifying a species by sequencing a short, standardized fragment of its genome. Basically, DNA barcoding makes comparisons between DNA sequences rather than relying on a species' behaviour or environment. In this procedure a unique sequence identifies a unique species, much as a supermarket barcode identifies one item against all others by looking up its black-striped code in a database. Sometimes an unknown species, or an unknown part of an organism, is identified using a barcode, and barcodes also help indicate the boundaries of a species. Different groups of organisms are barcoded using different gene regions; these regions are chosen because they show less intraspecific variation than interspecific variation. DNA barcoding is used in many applications, such as identifying plant leaves and insects or analysing diets. Metabarcoding is used to identify the different organisms in a sample that contains DNA from many organisms.
Bioinformatics analysis is needed to obtain well-structured, transparent and accurate data. Some sequencing strategies provide better results when analysed, such as Illumina MiSeq paired-end sequencing, in which each fragment is sequenced from both directions.
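As a minimal illustration of why sequencing from both directions matters, the reverse read of a paired-end pair comes from the opposite strand, so it must be reverse-complemented before it can be compared with the forward read. This is only a sketch with invented six-base reads; the helper name `reverse_complement` is illustrative, not from any particular library.

```python
# Complement lookup for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

forward_read = "AGGTCC"
reverse_read = "GGACCT"  # sequenced from the opposite strand
print(reverse_complement(reverse_read))  # AGGTCC — matches the forward read
```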
2-Methods of Genome Skimming
Genome skimming is one of the simplest and most effective methods; it involves un-ordered sampling of a small percentage of the total DNA. This approach has been used successfully at different taxonomic levels, for intraspecific ‘ultra-barcoding’ and for intergeneric and family-wide phylogenomic analyses. The term ‘genome skimming’ was first used by Straub et al. as a way of ‘navigating the tip of the genomic iceberg’; that is, shallow sequencing of DNA that results in differentially deep sequencing of the high-copy fraction of the genome (mitogenome and repetitive elements). Genome skimming is a predictable extension of the customary phylogenetic markers used in plants, particularly the various plastid markers. The quantity of plastid DNA present in the gDNA depends on species-specific factors, the developmental stage and the tissues that are sampled. Genome size also matters in plants: the proportion of plastid DNA in a sample decreases as the nuclear genome size increases. Genome skimming also involves several practical steps:

• Specimen sampling
• DNA extraction
• Library preparation
• Library pooling
These steps make genome skimming a practical, routine method for recovering sequences from plastid genomes, even from the small quantities of starting tissue available from preserved herbarium specimens. The ongoing development of new sequencing technologies is creating a fundamental shift in the ease of recovery of nucleotide sequences, enabling ‘new uses’ for the hundreds of millions of existing herbarium specimens [1, 10, 14, 16, 29].
3-How Variable are the Species:
All species gave similar results for the other summary statistics (extended genome size, total read size, overall k-mer frequency, and maximum number of distinct frequencies); only GC content differed. Each species has a different GC content percentage.
Here is a table of GC content for the different species:

Species Name                        GC Content
Pinanga                             48.07%
Pinanga-bullateleaves               51.56%
Pinanga-pinangmuring                49.62%
Pinanga-riverside_entireorbifid     50.07%
Pinanga-spicate_sigmoidleaflets     50.21%
Pinanga-subterranea                 51.14%
Pinanga-tenella                     47.87%
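GC content itself is straightforward to compute. A short sketch (using a made-up example sequence, not data from the table; the function name is illustrative):

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)

print(round(gc_content("ATGCGC"), 2))  # 66.67 — 4 of 6 bases are G or C
```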

4-Best Method of Making K-mer:

Assembled DNA sequences need their k-mers highlighted, which is why k-mers are written in capital letters on the assembled sequence. K-mers are mostly used to improve the expression of heterologous genes and to compare different species in metagenomic samples. They are also used to create vaccines that contain a pathogenic agent altered to have greatly reduced virulence. A sequence contains all of its subsequences of length k: for example, the sequence AGAT has four monomers (A, G, A, T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). Generally, these sets of k-mers are used to generate De Bruijn graphs. During the creation of such a graph, each k-mer is stored on an edge, and k-mers must overlap by k-1 characters to share a vertex. Reads produced by next-generation sequencing have varying lengths, which violates a key assumption of De Bruijn graphs: that every read overlaps its adjoining read in the genome by k-1. This problem is solved by cutting all reads into k-mers of a uniform, smaller size, which removes the differences in initial read length. When enough overlap is present, the desired sequence can be generated and the De Bruijn graph constructed. However, choosing a small k carries a risk: many k-mers collapse onto the same vertices, which makes genome reconstruction very difficult, and information can be lost if the k-mers are too small.
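The k-mer extraction and De Bruijn construction described above can be sketched in a few lines of Python; the function names are my own, not from any particular library, and the AGAT example matches the counts given in the text.

```python
from collections import defaultdict

def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def de_bruijn(seq: str, k: int) -> dict[str, list[str]]:
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers(seq, k):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

print(kmers("AGAT", 2))      # ['AG', 'GA', 'AT']
print(de_bruijn("AGAT", 3))  # {'AG': ['GA'], 'GA': ['AT']}
```

Consecutive edges share a (k-1)-character vertex, which is exactly the overlap requirement described above.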
5-Genome Skimming Data and Target Capture Data

Genome Skimming Data
• Builds on traditional (meta)barcoding, which is employed to study taxonomic biodiversity.
• Based on DNA sequencing of taxonomically informative, group-specific marker genes.
• These marker genes have flanking regions that allow PCR amplification using universal primers.
• The phylogenetic signal and identification resolution of barcode markers can be limited, as they are relatively short regions.
• Genome skimming offers a comparatively straightforward mechanism that improves and extends DNA barcodes.
• In plants, genome skimming recovers all the different ‘standard’ barcoding regions.

Target Capture Data
• Target capture sequencing is one of the three types of next-generation sequencing.
• Targeted sequencing focuses on specific regions of the genome, based on relatively few specific genes: a specific group of genes, coding regions (exons) or non-coding regions (introns).
• It provides an effective and sensitive means for sequencing specific genomic regions in a high-throughput manner.
• Based on the nature of its core reaction principle, target capture can be categorized into three categories: hybrid capture, selective circularisation, and PCR amplification.
• It can amplify even the smallest quantities of DNA, making it a very useful tool.
Difference Between Good DNA and Degraded DNA

Good DNA
• Physical properties: in living organisms such as humans, DNA exists as a pair of molecules rather than a single molecule.
• Base pairing
• DNA grooves
• DNA supercoiling
• DNA conformations
• DNA sense and antisense strands
• DNA stability

Degraded DNA
• DNA is one of the most essential molecules in organisms, containing all the information necessary for organisms to live; it replicates and provides a mechanism for heredity and evolution.
• Various events cause the degradation of DNA into nucleotides.
• The random catabolism of DNA accompanies irreversible tissue damage, which leads to the pathological death of one or more cells.
• Any biological application is impacted if the DNA is degraded.

6-Building of K-mer for Target Capture and Genome Skimming

For Genome Skimming

K-mer counting is the initial stage in many bioinformatics algorithms. It is a very simple and effective means of studying how subsequences repeat within a genomic sequence. Standard barcoding, by contrast, has clear limitations; this is exemplified by the failure of COI barcodes to distinguish 896 out of 4,174 wasp species from each other (Quicke et al., 2012). These drawbacks have led to an alternate
method that uses low-pass sequencing to generate genome skims and then identifies
chloroplast/mitochondrial marker genes or assembles the organelle genome. Genome
skimming offers a comparatively straightforward mechanism that improves and extends DNA
barcodes (Dodsworth 2015). In plants, genome skimming recovers all the different ‘standard’
barcoding regions in the plant and simultaneously provides sequence data from many other loci
(Besnard et al 2014), while also providing a direct connection with all other phylogenetically
informative genomic regions. K-mer profiling facilitates the simultaneous discovery of single
nucleotide variations, insertions and deletions associated with the phenotypes under study.
Using k-mer-based methods in genomic biomarker discovery does not require sequence
aligning, mapping or assembling and it can even be applied to raw sequencing data (Raime &
Remm, 2018).
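Alignment-free k-mer counting on raw reads, as described above, can be sketched like this (the reads and function name are invented for illustration):

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across a set of raw reads.

    No alignment, mapping or assembly is needed — each read is
    simply sliced into its overlapping length-k substrings.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

reads = ["ATGGAT", "GGATCC"]
counts = count_kmers(reads, 4)
print(counts["GGAT"])  # 2 — this 4-mer occurs in both reads
```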

For Target Capture

Targeted capture provides an efficient and sensitive means for sequencing specific genomic
regions in a high-throughput manner. To date, this method has mostly been used to capture
exons from the genome (the exome) using short insert libraries and short-read sequencing
technology, enabling the identification of genetic variants or new members of large gene
families. Sequencing larger molecules results in the capture of whole genes, including intronic
and intergenic sequences that are typically more polymorphic and allow the resolution of the
gene structure of homologous genes, which are often clustered together on the chromosome.
Here, we describe an improved method for the capture and single-molecule sequencing of DNA
molecules as large as 7 kb by means of size selection and optimized PCR conditions.

7-Hypothesis to Explain the Tests:


The best way to determine whether a statistical hypothesis is true would be to examine the
entire population. Since that is often impractical, researchers typically examine a random
sample from the population. If the sample data are not consistent with the statistical
hypothesis, the hypothesis is rejected.
There are two types of statistical hypotheses.

• Null hypothesis. The null hypothesis, denoted by Ho, is usually the hypothesis
that sample observations result purely from chance.
• Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the
hypothesis that sample observations are influenced by some non-random cause.

Statisticians follow a formal process to determine whether to reject a null hypothesis, based
on sample data. This process, called hypothesis testing, consists of four steps.

• State the hypotheses. This involves stating the null and alternative hypotheses.
The hypotheses are stated in such a way that they are mutually exclusive. That is,
if one is true, the other must be false.
• Formulate an analysis plan. The analysis plan describes how to use sample data to
evaluate the null hypothesis. The evaluation often focuses around a single test
statistic.
• Analyse sample data. Find the value of the test statistic (mean score, proportion, t
statistic, z-score, etc.) described in the analysis plan.
• Interpret results. Apply the decision rule described in the analysis plan. If the
value of the test statistic is unlikely, based on the null hypothesis, reject the null
hypothesis.
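The four steps above can be walked through with a small one-sample t-test. The sample values, the H0 mean, and the critical value (≈2.365 for a two-sided test at α = 0.05 with n − 1 = 7 degrees of freedom) are illustrative assumptions, not data from this report.

```python
import math
from statistics import mean, stdev

# Step 1: state the hypotheses. H0 says the population mean is 50.0;
# Ha says it is not. (Hypothetical data.)
sample = [52.1, 49.8, 53.4, 51.0, 50.7, 52.9, 48.5, 51.8]
mu0 = 50.0

# Steps 2-3: the analysis plan uses a one-sample t statistic; compute it.
n = len(sample)
t_stat = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

# Step 4: apply the decision rule. 2.365 is the two-sided critical
# value at alpha = 0.05 with 7 degrees of freedom.
critical = 2.365
decision = "reject H0" if abs(t_stat) > critical else "fail to reject H0"
print(round(t_stat, 2), decision)  # → 2.22 fail to reject H0
```

Here the statistic falls short of the critical value, so by the decision rule the null hypothesis is not rejected.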
What Is T Test?

• It is a parametric test which tells you how significant the differences between
groups are; In other words, it lets you know if those differences (measured in
means/averages) could have happened by chance.
• T-tests are called so, because the test results are all based on t-values.
• A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom
to determine the probability of a difference between two sets of data.

T-Test Assumptions:

• The first assumption is concerned with the scale of measurement. Here assumption for
a t-test is that the scale of measurement applied to the data collected follows a
continuous or ordinal scale.

• The second assumption is regarding simple random sample. The Assumption is that
the data is collected from a representative, randomly selected portion of the total
population.

• The third assumption is the data, when plotted, results in a normal distribution, bell-
shaped distribution curve.

• The fourth assumption is that a reasonably large sample size is used for the test. A larger
sample size means the distribution of results should approach a normal bell-shaped
curve.

• The final assumption is the homogeneity of variance. Homogeneous, or equal, variance
exists when the standard deviations of the samples are approximately equal.

T-test Algorithms
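As one concrete algorithm, here is a sketch of Welch's two-sample t-test, which drops the equal-variance assumption listed above. The two groups are built from the GC-content values in the earlier table purely for illustration; the function name is my own.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t-statistic and its approximate degrees of
    freedom (does not assume equal variances)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Illustrative groups drawn from the GC-content table
low_gc = [48.07, 49.62, 47.87]
high_gc = [51.56, 50.07, 50.21, 51.14]
t, df = welch_t(low_gc, high_gc)
print(round(t, 2), round(df, 1))
```

The resulting t value would then be compared with the t-distribution at the computed degrees of freedom, exactly as in the four-step procedure above.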
