A Brief Introduction To Transcriptomics From Sampling To Data Analysis

A brief introduction to
transcriptomics: from
sampling to data analysis
Leeds-omics introduction series
Outline
1.  Introduction to transcriptomes
2.  Sample collection
3.  RNA extraction methods and RNA quality assessment and quantification
4.  RNA sequencing techniques
5.  Bioinformatic analyses - typical pipeline: quality assessment, trimming
6.  Special types of analyses: mapping onto genome, quantification of expression, variant calling (SNPs)
Transcriptomes give us information about gene expression
Identify differentially expressed genes, identify functional changes…
Why use transcriptomes in biological research?
Pros
•  Easy, accessible way to see and quantify gene expression
•  Immediate access to the protein coding portion of the genome
•  Identify alternative splicing
•  Identify Single Nucleotide Polymorphisms (SNPs) in coding regions
Cons
•  Snapshot in time (different times, different expression patterns)
•  Absence of a gene from the transcriptome does not mean it is not present in the genome.
•  Difficult to ensure that you have sampled a single cell type.
•  Statistical analysis is highly dependent on experimental design.
The stage of gene expression we capture
•  RNAseq captures the mature messenger RNA (mRNA)
•  Targets the characteristic poly-A tail of the mRNA
•  The assumption is that the amount of mRNA for any gene is reflective of its impact on cell function
Sampling design
VERY IMPORTANT: what is your research question?
-- will you have enough to address your question?

Things to bear in mind:
•  What tissues to target – relevant to your research question
•  Homogeneous sampling of tissues - to the extent you can manage
•  Replicates – account for variation and are important to validate results
•  Developmental stage of studied individuals
•  Consult sequencing specialists (Ian Carr and Steve Moss) for advice on sampling
Some techniques commonly used to stabilise RNA
•  Snap freezing (liquid nitrogen) – immediate storage at -80°C.
•  RNAlater (Ambion) – small pieces of tissue (< 0.5 cm in length) placed in 5 volumes of RNAlater. Long-term storage: -20°C or -80°C.
•  NAP buffer ("homemade") – similar to RNAlater.
•  Other commercial products customised to sample types (e.g. blood)
[Figures: preserving tissue with RNAlater; snap freezing in liquid nitrogen. Source: Thermo Fisher Scientific]
Considerations when preserving samples
•  mRNA is fragile and unstable - susceptible to degradation – act fast.
•  Ensure aseptic conditions – use tubes and tools that are RNase-free.
•  Amount of tissue that you need – some tissues have high yields (e.g. liver), and others tend to give low yields (e.g. adipose tissue, brain).
•  Storage – ideally at -80°C
Comparison between preserving methods and samples
Snap frozen: best results, followed by RNAlater and NAP buffer
(Camacho-Sanchez et al. 2013. Molecular Ecology Resources 13, 663–673)
Obtaining the mRNA

IMPORTANT CONSIDERATIONS: Extraction of RNA is complicated by the
presence of ribonucleases in tissues
•  RNases are difficult to inactivate
ORGANIC EXTRACTION PROTOCOL
Tissue → Lyse and homogenise → Add gDNA eliminator and chloroform → Separate phases → Add ethanol to aqueous phase → Bind total RNA → Wash → Elute → Total RNA
Other RNA extraction methods
Filter-based, spin basket formats
-  Benefits: convenient and easy; amenable to single-sample and 96-well processing; can be automated
-  Drawbacks: can become clogged with particulates; gDNA and other large nucleic acids are often retained; automation requires complex vacuum systems/centrifugation
Magnetic particle methods
-  Benefits: can be automated; rapid sample collection/concentration; no risk of filter clogging
-  Drawbacks: magnetic particles can be carried through; less efficient in viscous solutions; laborious when performed manually
Direct lysis methods
-  Benefits: works well with small samples; can be automated; scalable; potential for most accurate RNA representation
-  Drawbacks: dilution-based; spectrophotometric measurement of yield is not possible; possible residual RNase activity; performance can be suboptimal
RNA quality assessment and quantification
It is important to establish both the purity and concentration of RNA that has been extracted
UV Spectroscopy
•  Measures absorbance of diluted RNA sample at 260 and 280 nm
•  Nucleic acid concentration is calculated using the Beer-Lambert law:
A = ε · C · l
where
A = absorbance at a particular wavelength
ε = extinction coefficient (ε_RNA = 0.025 (μg/mL)⁻¹ cm⁻¹)
C = concentration of nucleic acid
l = path length of the spectrophotometer cuvette (typically 1 cm)
e.g. A260 = 1.0 is equivalent to ~40 μg/mL RNA

A260/A280 ratio indicates RNA purity
•  1.8-2.1 indicates highly purified RNA
IMPORTANT CONSIDERATIONS:
•  pH
•  Cuvette
•  RNA dilution range
•  Does not discriminate between DNA and RNA (use RNase-free DNase to remove contaminating DNA)
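The Beer-Lambert calculation and the purity check above can be sketched in a few lines of Python. This is only an illustration: the function names and the dilution-factor handling are assumptions, and the default extinction coefficient is the one implied by A260 = 1.0 being roughly 40 μg/mL.

```python
def rna_concentration_ug_per_ml(a260, dilution_factor=1.0,
                                path_length_cm=1.0, epsilon=0.025):
    """Beer-Lambert law rearranged: C = A / (epsilon * l).

    epsilon is in (ug/mL)^-1 cm^-1; the result is multiplied by the
    dilution factor to give the concentration of the undiluted sample.
    """
    return a260 / (epsilon * path_length_cm) * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio used as a purity indicator."""
    return a260 / a280


def looks_pure(a260, a280, low=1.8, high=2.1):
    """True when the A260/A280 ratio falls in the 1.8-2.1 range."""
    return low <= purity_ratio(a260, a280) <= high


# A sample diluted 1:10 that reads A260 = 0.5 and A280 = 0.25
print(rna_concentration_ug_per_ml(0.5, dilution_factor=10))
print(purity_ratio(0.5, 0.25), looks_pure(0.5, 0.25))
```

Because absorbance at 260 nm does not distinguish DNA from RNA, the figure is only meaningful for the RNA if contaminating DNA has been removed first.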
RNA quality assessment and quantification

Agilent® 2100 Bioanalyzer

•  Combination of microfluidics, capillary electrophoresis and fluorescent dye
•  Evaluates both RNA concentration and integrity
•  Nano (ng/μL) and pico (50-5000 pg/μL) systems available
•  Size and mass are determined as RNA molecules fluoresce in chip channels
•  System produces a gel-like image and an electropherogram
•  Compares unknown concentrations to Agilent® RNA 6000 Ladder
•  RNA Integrity Number (RIN) determined by analysis algorithm (max value 10)
[Figures: Bioanalyzer lab chip; example electropherograms with RIN ~10 and RIN ~6]
RNA Sequencing
•  Whole transcriptome shotgun sequencing (WTSS)
•  Reveals the presence and quantity of RNA in a biological sample at a given moment in time

RNA SELECTION/DEPLETION:
•  PolyA selection
•  rRNA depletion
•  RNA capture

[Figure: RNA isolation → isolated RNA → selection via poly(T) magnetic beads (poly(A) RNA molecules bind to poly(T) beads) → cDNA synthesis]
RNA sequencing
IMPORTANT CONSIDERATIONS:
•  COST
•  SINGLE VS PAIRED-END READS
   -  SE: FOR EXPRESSION ANALYSIS OF WELL ANNOTATED GENOMES
   -  PE: BETTER FOR CHARACTERISATION OF POORLY ANNOTATED TRANSCRIPTOMES
•  READ LENGTH
•  DEPTH OF COVERAGE
   -  Determined by number of samples (libraries) in one lane
•  REPLICATES, RANDOMISATION AND MULTIPLEXING

E.g. (Liu et al., 2014):
12 samples in one lane of Illumina HiSeq = 10 million reads per sample
4 samples in one lane of Illumina HiSeq = 30 million reads per sample
3X more reads per sample = 1.5X cost increase = ~25% more differentially expressed genes detected

Sample type | Reads needed for differential expression (millions) | Reads needed for rare transcript or de novo assembly (millions) | Read length
Small genomes (bacteria/fungi) | 5 | 30-65 | 50 SE or PE for positional info
Intermediate genomes (Drosophila, C. elegans) | 10 | 70-130 | 50-100 SE or PE for positional info
Large genomes (human/mouse) | 15-25 | 100-200 | >100 SE or PE for positional info

Liu, Y., Zhou, J., and White, K.P. (2014) RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30(3):301-4
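The trade-off between multiplexing and depth can be worked through with a small helper. The sketch below assumes a lane yields about 120 million reads, a figure inferred from the example above (12 samples at 10 million reads each). Real lane output varies by instrument and run, and the function names are illustrative.

```python
def reads_per_sample_millions(lane_output_millions, samples_per_lane):
    """Average depth per library when a lane is shared equally."""
    if samples_per_lane < 1:
        raise ValueError("samples_per_lane must be at least 1")
    return lane_output_millions / samples_per_lane


def max_samples_per_lane(lane_output_millions, target_reads_millions):
    """Largest number of libraries that still reach the target depth."""
    return int(lane_output_millions // target_reads_millions)


LANE_OUTPUT = 120  # millions of reads per lane (assumed, see lead-in)

print(reads_per_sample_millions(LANE_OUTPUT, 12))  # depth with 12 libraries
print(reads_per_sample_millions(LANE_OUTPUT, 4))   # depth with 4 libraries
print(max_samples_per_lane(LANE_OUTPUT, 25))       # libraries at 25 M reads each
```

Equal sharing of a lane is itself an idealisation: libraries pooled for multiplexing rarely receive exactly the same number of reads.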
RAW READS DATA ANALYSIS
Bioinformatics - Analysis of
transcriptomic data
Pasteurella in Saiga Antelope host
Mass mortality hit Saiga Antelope in Spring 2015.

→ Pasteurella infection?

4 samples of different tissues
-  3 antelopes died from infection
-  1 antelope died from other cause

2 objectives:
1)  Get expression level of virulent Pasteurella genes (counting reads)
2)  Identify other possible mutations (variant calling)
Transcriptomic pipeline
NGS data – what it looks like
(.fastq, .sff, .fa, .csfasta/.qual)
Example size for sample of Saiga transcriptome: 12 Gb
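To make the FASTQ layout concrete, here is a minimal reader, assuming the common four-line record form (header starting with "@", sequence, separator starting with "+", quality string) with no line-wrapped sequences. The two records are invented for illustration. For real data a maintained parser would be a safer choice.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads standing in for a real .fastq file
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for record in records:
    print(record.name, record.sequence, record.quality)
```

Reading lazily with a generator matters at the file sizes quoted above, since a 12 Gb file should not be loaded into memory in one go.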
Sequencing quality check
Fastq quality score: Q = -10 log10 P

Quality score | Probability of incorrect identification | Accuracy of base identification
40 | 1 in 10000 | 99.99 %
30 | 1 in 1000 | 99.9 %
20 | 1 in 100 | 99 %
10 | 1 in 10 | 90 %
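The relationship in the table can be checked with a short sketch. The conversion functions follow Q = -10 log10 P directly. The decoding function assumes the Phred+33 ASCII encoding used by Sanger-style and current Illumina FASTQ files, and older files with a different offset would need `offset` changed.

```python
import math


def error_probability(q):
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def phred_score(p):
    """Q = -10 log10 P."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - offset for char in quality_string]


for q in (40, 30, 20, 10):
    p = error_probability(q)
    print(f"Q{q}: P(error) = {p:g}, accuracy = {100 * (1 - p):g} %")

print(decode_quality("I5!"))
```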
FastQC: visualisation
Trimmomatic: trim reads
Cutadapt: remove adaptors
[Figure: FastQC interface]
Mapping reads to Pasteurella genome
➢  Extract Pasteurella reads from samples

➢  Cases where there is no reference genome

[Figure: Saiga antelope sample (containing Pasteurella) mapped against reference Pasteurella (FASTA file – NCBI)]
Mapping to reference genome
Common software examples:

For DNA:
-  BWA (Burrows-Wheeler Aligner)
-  Bowtie

For RNA:
-  TopHat
-  STAR
Output file:

SAM (Sequence Alignment Map): plain text.
BAM (Binary Alignment Map): compressed binary version of SAM.

Tools for handling SAM/BAM files:
-  Picard Tools
-  Samtools
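As a rough sketch of how the tools above fit together, the following builds the command lines for indexing a reference, aligning with `bwa mem`, and sorting, indexing and summarising with Samtools. All file names are hypothetical. BWA is used because the bacterial reference has no spliced transcripts, and a splice-aware aligner such as STAR or TopHat would be the choice for eukaryotic RNA. Nothing is executed unless `run_steps` is called.

```python
import subprocess


def build_mapping_steps(reference, reads, prefix):
    """Return (argv, stdout_path) pairs; stdout_path is None when the
    tool writes its own output files."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        (["bwa", "index", reference], None),
        (["bwa", "mem", reference, *reads], sam),      # SAM goes to stdout
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        (["samtools", "flagstat", bam], f"{prefix}.flagstat.txt"),
    ]


def run_steps(steps):
    """Run each step in order, redirecting stdout where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Hypothetical file names, for illustration only
steps = build_mapping_steps(
    "pasteurella.fasta", ["sample_R1.fastq", "sample_R2.fastq"], "sample"
)
for argv, stdout_path in steps:
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

Calling `run_steps(steps)` would execute the pipeline, provided `bwa` and `samtools` are installed and the input files exist.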
SAM format and alignment statistics
Statistics: Samtools 'flagstat'
SAM file → Samtools 'mpileup' → Mpileup file
[Figures: example SAM file; example mpileup file]
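Each SAM alignment line is tab-separated, and its second field is a bitwise FLAG that records, among other things, whether the read mapped. The sketch below parses the first six mandatory fields and tallies mapped primary alignments. It is a simplified illustration of the kind of summary `samtools flagstat` reports, not a reimplementation, and the three example reads are invented.

```python
# FLAG bits defined by the SAM specification
UNMAPPED = 0x4
REVERSE_STRAND = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def parse_sam_line(line):
    """Extract the first six mandatory fields of an alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


def summarise(sam_lines):
    """Count primary alignments, and how many are mapped or reversed."""
    stats = {"total": 0, "mapped": 0, "reverse": 0}
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        flag = parse_sam_line(line)["flag"]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue                    # keep one record per read
        stats["total"] += 1
        if not flag & UNMAPPED:
            stats["mapped"] += 1
            if flag & REVERSE_STRAND:
                stats["reverse"] += 1
    return stats


example_sam = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t0\tref\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tref\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
stats = summarise(example_sam)
print(stats)
```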
Count reads mapping a region
Common software: htseq-count

Compare gene expression.
→ Differential expression

Sample 1 | Sample 2
Total: 1350 reads mapping gene A | Total: 10 reads mapping gene A
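Conceptually, counting reads for a gene means counting alignments that overlap the gene's coordinates. The sketch below shows that idea with made-up intervals, plus a log2 ratio of the two gene A counts quoted above. It is not how htseq-count resolves ambiguous or multi-feature reads, and a real comparison would normalise for library size and use replicates, which the pseudocount here does not replace.

```python
import math


def count_reads_in_region(read_intervals, chrom, start, end):
    """Count reads overlapping [start, end] (1-based, inclusive)."""
    count = 0
    for read_chrom, read_start, read_end in read_intervals:
        if read_chrom == chrom and read_start <= end and read_end >= start:
            count += 1
    return count


def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2 ratio of two counts; the pseudocount avoids division by zero."""
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Hypothetical alignments as (chromosome, start, end)
reads = [
    ("chr1", 100, 149),
    ("chr1", 140, 189),
    ("chr1", 500, 549),
    ("chr2", 100, 149),
]
gene_a_count = count_reads_in_region(reads, "chr1", 120, 200)
print(gene_a_count)

# Raw counts for gene A in the two samples
print(round(log2_fold_change(1350, 10), 2))
```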
Compare reference to 'sample' Pasteurella
Common software: VarScan (Java)

Variant calling:
-  SNPs (single nucleotide polymorphisms)
-  Indels

Visualisation: IGV (Integrative Genomics Viewer)
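The idea behind calling a SNP from piled-up reads can be illustrated with a toy caller: at one reference position, look at the bases the reads show and report the most common non-reference base if it is frequent enough. The thresholds below are hypothetical illustration values, and this is not VarScan's algorithm, which also weighs base quality, strand and statistical significance.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def call_snp(ref_base, observed_bases, min_coverage=8, min_var_freq=0.2):
    """Return (alt_base, frequency) or None if no variant is called.

    min_coverage and min_var_freq are illustrative thresholds.
    """
    bases = [b.upper() for b in observed_bases if b.upper() in VALID_BASES]
    depth = len(bases)
    if depth < min_coverage:
        return None                      # too few reads to judge
    counts = Counter(bases)
    alt_counts = {b: n for b, n in counts.items() if b != ref_base.upper()}
    if not alt_counts:
        return None                      # every read matches the reference
    alt_base, alt_reads = max(alt_counts.items(), key=lambda item: item[1])
    frequency = alt_reads / depth
    if frequency < min_var_freq:
        return None                      # likely sequencing error
    return alt_base, frequency


print(call_snp("A", "AAAAGGGGGG"))   # 6 of 10 reads show G
print(call_snp("A", "AAAAAAAAAG"))   # 1 of 10 reads shows G
```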
Summary
FastQC
Trimmomatic/Cutadapt
BWA, Bowtie, STAR, TopHat
Samtools
➢  Flagstat
➢  Mpileup
HTSeq-count
VarScan
Need help?
Advice on appropriate pipeline:

➢  Ian Carr: I.M.Carr@leeds.ac.uk
➢  Stephen Moss: S.P.Moss@leeds.ac.uk

Unix commands, scripts, software parameters:

➢  Natacha Chenevoy: N.Chenevoy@leeds.ac.uk

Upcoming 2-day workshop in the new year:

"Introduction to standard transcriptome analysis"
Steve Moss
Acknowledgements
Members of the O'Connell Lab
Members of the Creevey Lab at Aberystwyth University
M. O'Connell, Simon Goodman
Sequencing advice: Ian Carr, Steve Moss
