
A brief introduction to

transcriptomics: from
sampling to data analysis

Leeds-omics introduction series

Outline
1.  Introduction to transcriptomes
2.  Sample collection
3.  RNA extraction methods and RNA quality assessment and quantification
4.  RNA sequencing techniques
5.  Bioinformatic analyses - typical pipeline: quality assessment, trimming
6.  Special types of analyses: mapping onto genome, quantification of expression, variant calling (SNPs)

Transcriptomes give us information on gene expression
Identify differentially expressed genes, identify functional changes…

Why use transcriptomes in biological research?

Pros
•  Easy, accessible way to see and quantify gene expression
•  Immediate access to the protein coding portion of the genome
•  Identify alternative splicing
•  Identify Single Nucleotide Polymorphisms (SNPs) in coding regions

Cons
•  Snapshot in time (different times, different expression patterns)
•  Absence of a gene from the transcriptome does not mean it is not present in the genome.
•  Difficult to ensure that you have sampled a single cell type.
•  Statistical analysis is highly dependent on experimental design.
The stage of gene expression we capture
•  RNAseq captures the mature messenger RNA (mRNA)
•  Targets the characteristic poly-A tail of the mRNA
•  The assumption is that the amount of mRNA for any gene is reflective of its impact on the cell function

Sampling design
VERY IMPORTANT: what is your research question?
-- will you have enough to address your question?

Things to bear in mind:
•  What tissues to target – relevant to your research question
•  Homogeneous sampling of tissues - to the extent you can manage
•  Replicates – account for variation and are important to validate results
•  Developmental stage of studied individuals
•  Consult sequencing specialists (Ian Carr and Steve Moss) for advice on sampling
Some techniques commonly used to stabilise RNA
•  Snap freezing (liquid nitrogen) – immediate storage at -80°C.
•  RNAlater (Ambion) – small pieces of tissue (< 0.5 cm in length) placed in 5 volumes of RNAlater. Long-term storage: -20°C or -80°C.
•  NAP buffer ("homemade") – similar to RNAlater.
•  Other commercial products customised to sample types (e.g. blood)
[Figure: preserving tissue with RNAlater; snap freezing (liquid nitrogen). Source: Thermo Fisher Scientific]

Considerations when preserving samples

•  mRNA is fragile and unstable - susceptible to degradation – act fast.
•  Ensure aseptic conditions – use tubes and tools that are RNase-free.
•  Amount of tissue that you need – some tissues have high yields (e.g. liver), and others tend to give low yields (e.g. adipose tissue, brain).
•  Storage – ideally at -80°C
Comparison between preserving methods and samples
•  Snap frozen: best results
•  Followed by RNAlater and NAP buffer

Camacho-Sanchez et al. 2013. Molecular Ecology Resources 13, 663–673

Obtaining the mRNA


IMPORTANT CONSIDERATIONS: Extraction of RNA is complicated by the
presence of ribonucleases in tissues
•  RNases are difficult to inactivate

ORGANIC EXTRACTION PROTOCOL

Tissue → Lyse and homogenise → Add gDNA eliminator and chloroform → Separate phases → Add ethanol to aqueous phase → Bind total RNA → Wash → Elute → Total RNA
Other RNA extraction methods

Filter-based, Spin Basket Formats
Benefits:
•  Convenient and easy
•  Amenable to single-sample and 96-well processing
•  Can be automated
Drawbacks:
•  Can become clogged with particulates
•  gDNA and other large nucleic acids are often retained
•  Automation requires complex vacuum systems/centrifugation

Magnetic Particle Methods
Benefits:
•  Can be automated
•  Rapid sample collection/concentration
•  No risk of filter clogging
Drawbacks:
•  Magnetic particles can be carried through
•  Less efficient in viscous solutions
•  Laborious when performed manually

Direct Lysis Methods
Benefits:
•  Works well with small samples
•  Can be automated
•  Scalable
•  Potential for most accurate RNA representation
Drawbacks:
•  Dilution-based
•  Spectrophotometric measurement of yield is not possible
•  Residual RNase activity is possible
•  Performance can be suboptimal

RNA quality assessment and quantification
It is important to establish both the purity and concentration of RNA that has been extracted

UV Spectroscopy
•  Measures absorbance of diluted RNA sample at 260 and 280 nm
•  Nucleic acid concentration is calculated using the Beer-Lambert law

$A = \varepsilon \, c \, l$

•  $A$: absorbance at a particular wavelength
•  $\varepsilon$: extinction coefficient, $\varepsilon_{RNA} = 0.025\ (\mu g/mL)^{-1}\,cm^{-1}$
•  $c$: concentration of nucleic acid
•  $l$: path length of the spectrophotometer cuvette (typically 1 cm)
e.g. A260 = 1.0 is equivalent to ~40 μg/mL RNA

A260/A280 ratio indicates RNA purity
•  1.8-2.1 indicates highly purified RNA

IMPORTANT CONSIDERATIONS:
•  pH
•  Cuvette
•  RNA dilution range
•  Does not discriminate between DNA and RNA (use RNase-free DNase to remove contaminating DNA)
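
The Beer-Lambert calculation and the A260/A280 purity check can be sketched in a few lines of Python. This is only an illustration: the function names and the `dilution_factor` parameter are made up for this example, and the extinction coefficient is taken as 0.025 per μg/mL per cm, the value consistent with A260 = 1.0 ≈ 40 μg/mL.

```python
# Assumed extinction coefficient for RNA at 260 nm, in (ug/mL)^-1 cm^-1.
# It is chosen so that A260 = 1.0 corresponds to roughly 40 ug/mL.
EPSILON_RNA = 0.025


def rna_concentration(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Return the RNA concentration (ug/mL) of the undiluted sample.

    Rearranges Beer-Lambert (A = epsilon * c * l) to c = A / (epsilon * l),
    then scales by the dilution applied before the reading.
    """
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 / (EPSILON_RNA * path_length_cm) * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def looks_pure(a260, a280, low=1.8, high=2.1):
    """True if the A260/A280 ratio falls inside the 1.8-2.1 range."""
    return low <= purity_ratio(a260, a280) <= high


if __name__ == "__main__":
    # Hypothetical reading: sample diluted 1:10, A260 = 0.5, A280 = 0.25
    print(rna_concentration(0.5, dilution_factor=10))
    print(purity_ratio(0.5, 0.25), looks_pure(0.5, 0.25))
```

Note that this inherits the limitation listed above: absorbance at 260 nm cannot tell DNA from RNA, so the number is only meaningful for DNase-treated samples.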


Agilent® 2100 Bioanalyzer

•  Combination of microfluidics, capillary electrophoresis and fluorescent dye
•  Evaluates both RNA concentration and integrity
•  Nano (ng/μL) and pico (50-5000 pg/μL) systems available
•  Size and mass are determined as RNA molecules fluoresce in the chip channels
•  System produces a gel-like image and an electropherogram
•  Compares unknown concentrations to the Agilent® RNA 6000 Ladder
•  RNA Integrity Number (RIN) determined by analysis algorithm (max value 10)

[Figure: Bioanalyzer lab chip; example traces with RIN ~10 and RIN ~6]
RNA Sequencing

•  Whole transcriptome shotgun sequencing (WTSS)
•  Reveals the presence and quantity of RNA in a biological sample at a given moment in time

RNA SELECTION/DEPLETION:
•  PolyA selection
•  rRNA depletion
•  RNA capture

[Figure: RNA isolation → isolated RNA → selection via poly(T) magnetic beads (poly(A) RNA molecules bind to poly(T) beads) → cDNA synthesis]
RNA sequencing
IMPORTANT CONSIDERATIONS:
•  SINGLE VS PAIRED-END READS
•  SE: FOR EXPRESSION ANALYSIS OF WELL ANNOTATED GENOMES
•  PE: BETTER FOR CHARACTERISATION OF POORLY ANNOTATED TRANSCRIPTOMES
•  READ LENGTH
•  DEPTH OF COVERAGE
•  Determined by number of samples (libraries) in one lane
•  REPLICATES, RANDOMISATION AND MULTIPLEXING

Sample type | Reads needed for differential expression (millions) | Reads needed for rare transcript or de novo assembly (millions) | Read length
Small genomes (bacteria/fungi) | 5 | 30-65 | 50 SE or PE for positional info
Intermediate genomes (Drosophila, C. elegans) | 10 | 70-130 | 50-100 SE or PE for positional info
Large genomes (human/mouse) | 15-25 | 100-200 | >100 SE or PE for positional info

COST
E.g. (Liu et al., 2014):
•  12 samples in one lane of Illumina HiSeq = 10 million reads per sample
•  4 samples in one lane of Illumina HiSeq = 30 million reads per sample
•  3X more reads per sample = 1.5X cost increase = ~25% more differentially expressed genes detected

Liu, Y., Zhou, J., and White, K.P. (2014) RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30(3):301-4
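
The trade-off between multiplexing and depth is simple arithmetic, sketched below in Python. The lane yield of 120 million reads is an assumption inferred from the example figures (12 samples at 10 million reads each, or 4 samples at 30 million each); real yields depend on the instrument and run, and the function names are hypothetical.

```python
# Assumed lane yield in millions of reads, inferred from the example
# figures above (12 x 10 million = 4 x 30 million = 120 million).
ASSUMED_LANE_READS_MILLIONS = 120


def reads_per_sample(n_samples, lane_reads_millions=ASSUMED_LANE_READS_MILLIONS):
    """Average reads per sample (millions) when n_samples share one lane."""
    if n_samples < 1:
        raise ValueError("need at least one sample per lane")
    return lane_reads_millions / n_samples


def max_samples_per_lane(target_reads_millions,
                         lane_reads_millions=ASSUMED_LANE_READS_MILLIONS):
    """Largest number of samples per lane that still meets the target depth."""
    if target_reads_millions <= 0:
        raise ValueError("target depth must be positive")
    return int(lane_reads_millions // target_reads_millions)


if __name__ == "__main__":
    for n in (4, 12):
        print(n, "samples per lane:", reads_per_sample(n), "million reads each")
    # Hypothetical target of 25 million reads per sample
    print("samples per lane at 25 million reads:", max_samples_per_lane(25))
```

This assumes reads are split evenly between libraries, which is rarely exact in practice, so it is worth leaving some margin when planning a lane.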

RAW READS DATA ANALYSIS

Bioinformatics - Analysis of transcriptomic data
Pasteurella in Saiga Antelope host

Mass mortality hit Saiga Antelope in Spring 2015.
→ Pasteurella infection?

4 samples of different tissues
-  3 antelopes died from infection
-  1 antelope died from another cause

2 objectives:
1)  Get expression levels of Pasteurella virulence genes (counting reads)
2)  Identify other possible mutations (variant calling)
Transcriptomic pipeline

NGS data – what it looks like

(.fastq, .sff, .fa, .csfasta/.qual)

Example size for sample of Saiga transcriptome: 12 Gb
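
For readers who have not opened a FASTQ file before, the sketch below shows its structure in Python: each read takes four lines (an `@` header, the sequence, a `+` separator, and one quality character per base). The reader function and the two example reads are invented for illustration; for files of the size mentioned above, an established parser is a better choice.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:           # end of file
            return
        if not header.strip():   # tolerate blank lines between records
            continue
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': %r" % header)
        sequence = handle.readline().strip()
        handle.readline()        # '+' separator line, ignored here
        quality = handle.readline().strip()
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), sequence, quality


# Two made-up reads, only to show the layout of the format.
EXAMPLE_FASTQ = "@read1\nACGTN\n+\nIIII#\n@read2\nGGCA\n+\n5555\n"

if __name__ == "__main__":
    for name, sequence, quality in read_fastq(io.StringIO(EXAMPLE_FASTQ)):
        print(name, sequence, quality)
```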

Sequencing quality check
Fastq quality score: $Q = -10 \log_{10} P$, where $P$ is the probability of incorrect base identification

Quality score | Probability of incorrect identification | Accuracy of base identification
40 | 1 in 10000 | 99.99 %
30 | 1 in 1000 | 99.9 %
20 | 1 in 100 | 99 %
10 | 1 in 10 | 90 %

•  FastQC: visualisation
•  Trimmomatic: trim reads
•  Cutadapt: remove adaptors

[Figure: FastQC interface]
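
The quality score formula can be checked numerically with a short Python sketch. It assumes the Phred+33 encoding (quality character = score + 33 in ASCII) used by current Illumina FASTQ files; older files may use an offset of 64. The function names are illustrative only.

```python
import math


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(P): return the error probability for score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Return the quality score Q = -10 * log10(P) for error probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores.

    The offset of 33 is an assumption (Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in quality_string]


if __name__ == "__main__":
    for q in (40, 30, 20, 10):
        p = phred_to_error_prob(q)
        print("Q%d: error probability %g, accuracy %g %%" % (q, p, 100 * (1 - p)))
    print(decode_quality("I5#"))
```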

Mapping reads to Pasteurella genome

•  Extract Pasteurella reads from samples
•  Cases where there is no reference genome

[Figure: sample (Saiga antelope and Pasteurella reads) and reference Pasteurella (FASTA file – NCBI)]

Mapping to reference genome

Common software examples:

For DNA:
-  BWA (Burrows-Wheeler Aligner)
-  Bowtie

For RNA:
-  Tophat
-  STAR

Output file:
-  SAM (Sequence Alignment Map): plain text
-  BAM (Binary Alignment Map): compressed binary version of SAM

Tools for handling SAM/BAM files: Picard Tools, Samtools

SAM format and alignment statistics

[Figure: SAM format]

Statistics: Samtools ‘flagstat’
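
The kind of summary that flagstat reports comes from the FLAG field, the second tab-separated column of every SAM alignment line, where each bit records one property of the read. The Python sketch below decodes a few of those bits for four made-up alignment lines. It is a simplified illustration of the idea, not a reimplementation of Samtools, and its counts will not match flagstat in every edge case.

```python
# Bit values defined for the FLAG column in the SAM format.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400


def summarise_sam(lines):
    """Count basic alignment categories from an iterable of SAM lines."""
    counts = {"total": 0, "mapped": 0, "properly_paired": 0,
              "secondary": 0, "duplicates": 0}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        counts["total"] += 1
        if not flag & UNMAPPED:
            counts["mapped"] += 1
        if flag & PAIRED and flag & PROPER_PAIR:
            counts["properly_paired"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & DUPLICATE:
            counts["duplicates"] += 1
    return counts


# Made-up alignments against a hypothetical reference called "ref".
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tref\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII",
    "r1\t147\tref\t200\t60\t5M\t=\t100\t-105\tTTGCA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
    "r3\t1024\tref\t300\t60\t5M\t*\t0\t0\tCCCCC\tIIIII",
]

if __name__ == "__main__":
    print(summarise_sam(EXAMPLE_SAM))
```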

Mpileup file

SAM file → Samtools ‘mpileup’ → Mpileup file

Count reads mapping a region

Common software: htseq-count

Compare gene expression between samples.
→ Differential expression

Sample 1: total of 1350 reads mapping to gene A
Sample 2: total of 10 reads mapping to gene A
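
The idea of counting reads that map to a region can be sketched in Python as a simple interval-overlap test. The gene coordinates and alignments below are invented, and the rule used here (count a read only if it overlaps exactly one gene) is a simplifying assumption for this sketch rather than a description of how htseq-count resolves ambiguous reads.

```python
def count_reads_per_gene(alignments, genes):
    """Count reads overlapping each gene.

    alignments: iterable of (reference, start, end), 1-based inclusive.
    genes: dict mapping gene name to (reference, start, end).
    Reads overlapping no gene, or more than one gene, are not counted.
    """
    counts = {name: 0 for name in genes}
    for ref, start, end in alignments:
        hits = [
            name
            for name, (gene_ref, gene_start, gene_end) in genes.items()
            if gene_ref == ref and start <= gene_end and end >= gene_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Hypothetical annotation and alignments, for illustration only.
GENES = {"geneA": ("ref", 100, 200), "geneB": ("ref", 180, 300)}
ALIGNMENTS = [
    ("ref", 90, 120),     # overlaps geneA only
    ("ref", 150, 170),    # overlaps geneA only
    ("ref", 190, 210),    # overlaps both genes: ambiguous, skipped
    ("ref", 250, 280),    # overlaps geneB only
    ("ref", 400, 450),    # overlaps no gene
    ("other", 100, 150),  # different reference sequence
]

if __name__ == "__main__":
    print(count_reads_per_gene(ALIGNMENTS, GENES))
```

Raw counts such as these are not directly comparable between samples sequenced to different depths, which is one reason differential expression analysis depends so heavily on experimental design.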

Compare reference to ‘sample’ Pasteurella

Common software: Varscan (java)

Variant calling:
-  SNPs (single nucleotide polymorphisms)
-  Indels

[Figure: IGV screenshot]
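
To give a feel for what a variant caller decides at each position, here is a toy Python sketch: given the reference base and the bases observed in the reads covering that position, it reports a candidate SNP when coverage and the alternative allele fraction pass two thresholds. The thresholds and the function name are hypothetical, and this is not Varscan's algorithm; real callers also weigh base quality, strand bias and statistical significance.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def call_snp(ref_base, observed_bases, min_depth=8, min_alt_fraction=0.2):
    """Return (alt_base, alt_fraction) for a candidate SNP, or None.

    observed_bases: string or list of bases seen at one position.
    min_depth and min_alt_fraction are illustrative defaults only.
    """
    ref_base = ref_base.upper()
    bases = [b.upper() for b in observed_bases if b.upper() in VALID_BASES]
    depth = len(bases)
    if depth < min_depth:
        return None  # too little coverage to make a call
    alt_counts = Counter(b for b in bases if b != ref_base)
    if not alt_counts:
        return None  # every read agrees with the reference
    alt_base, alt_count = alt_counts.most_common(1)[0]
    alt_fraction = alt_count / depth
    if alt_fraction < min_alt_fraction:
        return None  # more likely sequencing error than a variant
    return alt_base, alt_fraction


if __name__ == "__main__":
    print(call_snp("A", "AAAAAGGGGG"))  # half of the reads support G
    print(call_snp("A", "AAAAAAAAAG"))  # a single mismatch in ten reads
    print(call_snp("A", "AAAG"))        # coverage below min_depth
```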

Summary
•  Quality check: FastQC
•  Trimming and adaptor removal: Trimmomatic/Cutadapt
•  Mapping: BWA, Bowtie, STAR, Tophat
•  Samtools: Flagstat, Mpileup
•  Read counting: HTSeq-count
•  Variant calling: Varscan
Need help?

Advice on appropriate pipeline:
•  Ian Carr: I.M.Carr@leeds.ac.uk
•  Stephen Moss: S.P.Moss@leeds.ac.uk

Unix commands, scripts, software parameters:
•  Natacha Chenevoy: N.Chenevoy@leeds.ac.uk

Upcoming 2-day workshop in the new year:
“Introduction to standard transcriptome analysis”
Steve Moss
Acknowledgements
•  Members of the O’Connell Lab
•  Members of the Creevey Lab at Aberystwyth University
•  Sequencing advice: Ian Carr, Steve Moss
•  M. O’Connell, Simon Goodman
