

Filtering somatic variants

Rescuing TiN variants and removing artifacts
Somatic Variant Discovery Workflow

Indels coming soon! (M2)

+ some post-processing to rescue TiN variants and eliminate artifacts
Reminder: MuTect itself applies several internal filters

[Figure: MuTect overview. Tumor and Normal sequencing data pass through read filters and the variant detection statistic (STD callset), then through six site-based variant filters (HC callset; HC, high confidence), then a panel of normal samples, then classification of germ-line status using the matched normal.]

Variant detection statistic:

$$\log_{10}\frac{L[M_f^m]\,P(m,f)}{L[M_0]\,\bigl(1-P(m,f)\bigr)} \ge \log_{10}\delta_T$$

Variant filters (site-based):
Proximal gap
Strand bias
Poor mapping
Triallelic site
Clustered position
Observed in control

No VQSR yet!
PART 1:
RESCUING TIN VARIANTS
Where do Tumor in Normal variants come from?

Liquid tumors
Blood-borne cancer
(e.g. leukemia)

Tissue-adjacent normal
Especially tumors that are spread thin (unlike clearly separated tumors)
Prevalence of TiN variants by tumor type

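[Figure: per-tumor-type charts of samples with greater than 2% vs. less than 2% tumor in normal, and a plot of Tumor in Normal (y-axis, 0 to 1) for BRCA, LAML, CLL, HNSC.]

Tumor types: Breast cancer (*, ref. 1), AML (^, ref. 2), CLL, Head & Neck (*, ref. 3)
Sample counts shown in the figure (assignment to categories not recoverable from the extraction): 47, 6, 155, 59, 77, 254, 298, 49
* Tissue-adjacent normal
^ Skin used as normal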
1. Comprehensive molecular portraits of human breast tumors. Nature. 490 (7418):61-70
2. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. NEJM. 368:2059-2074
3. Comprehensive Genomic Characterization of Head and Neck Squamous Cell Carcinomas. Nature. 517 (7536):576-582
MuTect evaluates AF of tumor vs. normal

[Figure: Samples > Sequencing Reads > Somatic Mutation Calling (MuTect) > Resulting AF fit (Normal AF vs. Tumor AF). Rows: Pure Normal, Matched Tumor, Contaminated Normal. Legend: Kept / Rejected, Detected / Undetected.]
AF evaluation leads to rejection if Normal is contaminated

[Figure: Samples > Sequencing Reads > Somatic Mutation Calling (MuTect) > Resulting AF fit (Normal AF vs. Tumor AF). Rows: Pure Normal, Matched Tumor, Contaminated Normal. Legend: Kept / Rejected, Detected / Undetected.]

Many mutations are detected in the tumor but rejected based on the matched normal
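
The idea of an AF fit can be illustrated with a minimal sketch. It assumes, purely for illustration, that tumor contamination makes the normal allele fraction of a somatic mutation roughly proportional to its tumor allele fraction, with a slope equal to TiN. The function name and the through-origin least-squares fit are hypothetical choices, not the actual deTiN or MuTect implementation.

```python
from typing import Sequence


def estimate_tin(tumor_af: Sequence[float], normal_af: Sequence[float]) -> float:
    """Least-squares slope through the origin of normal AF vs. tumor AF.

    Assumption (not from the slides): normal_af ~= TiN * tumor_af for
    somatic mutations when the normal sample contains tumor cells.
    """
    if len(tumor_af) != len(normal_af):
        raise ValueError("tumor_af and normal_af must have the same length")
    denominator = sum(t * t for t in tumor_af)
    if denominator == 0:
        raise ValueError("at least one tumor AF must be non-zero")
    numerator = sum(t * n for t, n in zip(tumor_af, normal_af))
    return numerator / denominator


# Hypothetical allele fractions for three mutations in one tumor/normal pair.
tin = estimate_tin([0.2, 0.4, 0.5], [0.05, 0.1, 0.125])
print(f"estimated TiN = {tin:.2f}")
```

A pure normal would give a slope near 0 under this assumption, while germline heterozygous sites would sit near a normal AF of 0.5 regardless of the tumor AF.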
We can use AF fit to rescue TiN variants

[Figure: AF fit. Labels: hets in Normal with severe allele imbalance; recovered mutations; default kept mutations. Fits: allele imbalance het TiN fit; mutation TiN fit.]

Recover mutations if at least 100x more likely to be somatic than germline given estimated TiN
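
The 100x rule above can be sketched as a likelihood ratio on the normal sample's read counts. This is a minimal illustration under stated assumptions, not the deTiN implementation: the binomial read model, the expected normal AF of TiN times tumor AF for a somatic mutation, the 0.5 AF for a germline het, and all names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k alt reads out of n at allele fraction p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def should_rescue(normal_alt: int, normal_depth: int, tumor_af: float,
                  tin: float, min_ratio: float = 100.0) -> bool:
    """True if the normal reads are >= min_ratio times more likely under a
    somatic model (AF = tin * tumor_af) than a germline het model (AF = 0.5).
    Both models are illustrative assumptions.
    """
    somatic = binom_pmf(normal_alt, normal_depth, tin * tumor_af)
    germline = binom_pmf(normal_alt, normal_depth, 0.5)
    if germline == 0.0:
        return somatic > 0.0
    return somatic / germline >= min_ratio


# Hypothetical site: 3 of 60 normal reads carry the alt, tumor AF 0.4, TiN 25%.
print(should_rescue(normal_alt=3, normal_depth=60, tumor_af=0.4, tin=0.25))
# Hypothetical germline-like site: 30 of 60 normal reads carry the alt.
print(should_rescue(normal_alt=30, normal_depth=60, tumor_af=0.4, tin=0.25))
```

The first site looks like low-level contamination and is rescued; the second looks like a germline het and stays rejected.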

Example: A breast cancer and tissue-adjacent normal. Estimated TiN ~ 25%

deTiN recovered 88 additional mutations (w/o deTiN only 15 mutations)
Without TiN rescue, sensitivity decreases rapidly with AF

[Figure: Sensitivity (y-axis, 0 to 0.9) vs. Tumor in Normal (x-axis, 0.005 to 0.2). Series: deTiN all; No deTiN all; deTiN af>Q3; No deTiN af>Q3 (af>Q3: mutation allele fraction in top quartile).]
With TiN rescue, 90% sensitivity is recovered

[Figure: Sensitivity (y-axis, 0 to 0.9) vs. Tumor in Normal (x-axis, 0.005 to 0.2). Series: deTiN all; No deTiN all; deTiN af>Q3; No deTiN af>Q3.]
How does this impact recovery of driver mutations?

[Figure: Mutations in CLL cancer genes across 82 CLLs with TiN>2%, by sample. Calls shown: MuTect.]

TiN rescue recovers ~40% more putative driver mutations

[Figure: Mutations in CLL cancer genes across 82 CLLs with TiN>2%, by sample. Calls shown: MuTect; MuTect + deTiN.]
Validation experiments show recovery of expected distributions

[Figure: Drivers per sample for pre-deTiN, post-deTiN, and MRD samples; p = 0.0007.]

After deTiN mutation recovery, contaminated and uncontaminated samples had the same distribution of driver mutations per sample
PART 2:
REMOVING ARTIFACTS
Types of FP artifacts

OxoG oxidation (due to shearing)

FFPE oxidation (sample preservation technique)

Other context biases

All detectable based on strand orientation biases


Manifestations in mutation signature (Lego) plots

[Figure: Standard aging mutation signature (Lego plot), TCGA ovarian cancer.]

Mutation types: C>T, C>A, C>G, A>G, A>T, A>C
Mutation rate (per million sites)
3-base context histogram in 96 bins (reverse complement combined)
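
The 96-bin layout can be sketched as follows: six mutation types with a C or A reference base, each in 16 flanking-base contexts, with mutations on the other strand folded in by reverse complement. The function names and the input format (a 3-base context plus the alternate base) are assumptions made for this illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_bin(context: str, alt: str) -> tuple:
    """Map a mutation (3-base reference context, alt base) to its bin.

    Mutations whose reference base is G or T are reverse-complemented so
    that every bin has a C or A reference base, as in the slide's types.
    """
    if context[1] in ("C", "A"):
        return context, alt
    return reverse_complement(context), COMPLEMENT[alt]


def all_bins() -> list:
    """Enumerate the 2 refs x 3 alts x 16 contexts = 96 bins."""
    return [(five + ref + three, alt)
            for ref in "CA"
            for alt in "ACGT" if alt != ref
            for five in "ACGT"
            for three in "ACGT"]


def lego_counts(mutations) -> Counter:
    """Count mutations per bin; bins with no mutations are simply absent."""
    return Counter(canonical_bin(context, alt) for context, alt in mutations)


# A G>T in context TGA and a C>A in context TCA fall into the same bin.
counts = lego_counts([("TGA", "T"), ("TCA", "A"), ("ACG", "T")])
print(len(all_bins()), counts[("TCA", "A")])
```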
Manifestation of artifacts in mutation signatures (Lego plots)

[Figure: two Lego plots. Panel labels: mutations from 2 exomes; accumulated mutations from 263 tumor exomes (dominated by "interesting" signature). Annotations: typical C>T aging signature; "interesting" A>T signature.]
Example of site where a T>A artifact appears to be a real variant

Example T>A mutation: splice-site, CHD8

Tumor: 7 reads support T>A splice-site mutation in CHD8
7 reads from tumor, 0 from normal: consistent with somatic mutation at 15% allele fraction
Normal: no support for T>A mutation from normal sample

Appears to be clean candidate mutation of possible functional significance



Coloring reads by strand orientation in IGV reveals the artifact

Mutation is entirely supported by reads from one strand (F1R2 marked in red, F2R1 marked in blue)

Tumor: All 7 reads are on F1R2 strand (red). p~0.008

[Figure: IGV view of Tumor and Normal tracks.]
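
The quoted p~0.008 is consistent with a simple binomial calculation, sketched here. The assumption (not stated on the slide) is a null model in which each alt-supporting read is equally likely to come from an F1R2 or an F2R1 pair, tested one-sided; the function name is hypothetical.

```python
from math import comb


def one_sided_orientation_p(alt_on_f1r2: int, alt_total: int,
                            p_f1r2: float = 0.5) -> float:
    """P(at least alt_on_f1r2 of alt_total alt reads are F1R2) under the null.

    The null probability p_f1r2 = 0.5 is an illustrative assumption.
    """
    return sum(comb(alt_total, k) * p_f1r2**k * (1 - p_f1r2) ** (alt_total - k)
               for k in range(alt_on_f1r2, alt_total + 1))


# All 7 alt-supporting reads on F1R2, as in the CHD8 example.
print(one_sided_orientation_p(7, 7))  # 0.5 ** 7 = 0.0078125
```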
Use strand orientation to filter out the likely artifacts

[Figure: alternate allele count vs. F_orientation_bias, two panels: T>A & A>T mutations in the problematic sample; non-T>A / A>T mutations (null model). A filter cut line is drawn for FDR < 1%.]
OxoG oxidation causes G>T artifacts and C>A artifacts

1) High-powered shearing induces G > 8-oxo-G (g) defect on single DNA strands (g:A instead of G:C)

2) Ligation of Illumina adapters

3) PCR denaturation

4) PCR incorporates Ts to complement As

5) Illumina short read pair sequencing: Read 1 always has T artifact; read 2 always has complement A

[Figure: top and bottom strands (5' to 3') carrying g lesions through each step, with the resulting read 1 / read 2 pairs.]

Observed mapped as G>T artifacts on F1R2 pairs and C>A artifacts on F2R1 pairs in standard Illumina short read pair sequencing (F: forward, R: reverse mapped reads)
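
The orientation pattern described above (G>T on F1R2, C>A on F2R1) can be turned into a simple per-site summary. This is a minimal sketch with hypothetical names, not the actual OxoG filter: it only reports the fraction of alt-supporting read pairs that sit in the artifact-consistent orientation.

```python
from typing import Optional

# Orientation on which each OxoG artifact type is observed, per the slide.
OXOG_ARTIFACT_ORIENTATION = {("G", "T"): "F1R2", ("C", "A"): "F2R1"}


def artifact_orientation_fraction(ref: str, alt: str,
                                  f1r2_alt: int, f2r1_alt: int) -> Optional[float]:
    """Fraction of alt read pairs in the OxoG-consistent orientation.

    Returns None for substitutions other than G>T / C>A, or when there
    are no alt-supporting read pairs.
    """
    expected = OXOG_ARTIFACT_ORIENTATION.get((ref, alt))
    total = f1r2_alt + f2r1_alt
    if expected is None or total == 0:
        return None
    consistent = f1r2_alt if expected == "F1R2" else f2r1_alt
    return consistent / total


# A G>T candidate whose 9 alt-supporting pairs are all F1R2.
print(artifact_orientation_fraction("G", "T", f1r2_alt=9, f2r1_alt=0))
```

A value near 1 marks a candidate as suspect; a real variant would be expected to have support from both orientations.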

Costello, M., et al. (2013) Nucleic Acids Research, 41(6)


Evidence of strand bias caused by OxoG oxidation

G>T missense mutation in melanoma

All 9 T-supporting reads are on F1R2 read pairs (red), making the variant very suspect

Oxidation can be reduced by antioxidants in the prep protocol

Artifacts can be filtered out by removing strand-biased candidate mutations

[Figure: IGV view of TUMOR and NORMAL tracks, colored by F1R2 fragments and F2R1 fragments.]

Costello, M., et al. (2013) Nucleic Acids Research, 41(6)


Filtering OxoG artifacts

[Figure: G>T and C>A candidate mutations vs. other candidate mutations (null model), with cut line.]

THCA data (n=402 tumors) Lego plots
Before OxoG filter: 1:100 signal:artifact
After OxoG filter: 100:1 signal:artifact

Lee Lichtenstein, Chip Stewart, Trevor Pugh, Dan-avi Landau, Tim Fennel, George Grant
Filter also works on artifacts caused by FFPE oxidation

689 colon cancer exomes

[Figure: alternate allele count vs. F_strand_bias, before and after filtering, two panels: C>T & G>A mutations (C>T / G>A artifact); mutations other than C>T & G>A.]
Somatic Variant Discovery Workflow

Indels coming soon! (M2)

+ some post-processing to rescue TiN variants and eliminate artifacts

Further reading
Documentation coming soon to the GATK website

In the meantime, see
http://www.broadinstitute.org/cancer/cga/Home
