Alokkumar Jha
PhD Student
Insight Centre for Data Analytics
NUI Galway
Looks ordinary (almost)
Don't play by the rules
Have competitive advantage
Elude detection
Flood of Data
NextGen Biology Mantra: More data is good.
Structural Variation
Exome Sequences
DNA Methylation
Copy Number Alterations
Expression
Patient Sample 3
Patient Sample N
MET,ITGA2,CAV1,ASPH,LGALS3,F2RL1,SERPINE2,EGFR,CAV2
SDC4,LMNA,TPM1,DAB2,GNG12,FN1,PTPRM,MYLK,KRT18
LAMB1,ADAM9,TIMP1,ITGA3,CD44,MIR21,ITGA5,IGFBP3,NRP1
S100A6,ACTN1,ANXA2,TGFB2,THBS1,FOSL1,YAP1,TJP1,EREG,PTPRF
TIMP2,EPHA2,KRT8,SNAI2,CTTN,SERPINE1,LAMC2,IGFBP6
F2RL2,MMP2,TGFBR2,LAMA4,TIMP3,DKK1,JAG1,AXL,AREG,PTN
KRT7,LAMB3,CDH1,COL4A2,SDC1,PKP2,CLDN1,TGFA,CXCL2
ITGB4,APP,KRT19,TGFB1I1,PTGS2,LAMA3,COL4A1,EDN1,PLAU
LOXL2,PPL,CALD1,KLF5,ITGB6,MMP1,PLAT,LOX,CCND1,CTGF
TGIF1,TFPI2,TUBB6,COL1A1,CLDN7,TACSTD2,CDH2,GJA1
NID1,DSP,SPARC,CDH3,GNG11,EFNA5,IL1A,RHOB,EPCAM,F11R
MLN7243
Signature Genes
PPI-based disease enrichment
PPIs
PPI databases: HPRD, BIND, IntAct, Vidal, MINT, PID, BioGRID
Graph Statistics
Number of genes from your seed list: 100
Number of intermediate components: 90
Number of interactions in subnetwork created from seed list: 351
Total components in the background network: 2086
Total interactions in the background network: 11429
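The graph statistics above can be illustrated with a minimal sketch of how a seed-list subnetwork might be cut out of a background PPI network. The slide does not define "intermediate component"; the code assumes it means a non-seed node that interacts directly with at least two seed genes. The function name `seed_subnetwork` and the toy edges are hypothetical; only the gene symbols come from the signature list.

```python
def seed_subnetwork(background_edges, seeds):
    """Return (seeds_found, intermediates, sub_edges) for a seed gene list."""
    # Build an undirected adjacency map of the background network.
    adj = {}
    for a, b in background_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Seeds that are absent from the background network cannot be used.
    seeds_found = {g for g in seeds if g in adj}

    # Assumption: an intermediate is a non-seed node adjacent to >= 2 seeds.
    intermediates = {
        node for node, nbrs in adj.items()
        if node not in seeds_found and len(nbrs & seeds_found) >= 2
    }

    keep = seeds_found | intermediates
    # Keep each undirected edge once, and only if both endpoints are retained.
    sub_edges = {
        tuple(sorted((a, b)))
        for a, b in background_edges
        if a in keep and b in keep and a != b
    }
    return seeds_found, intermediates, sub_edges


# Toy background network: X1..X3 and all edges are made up for illustration.
toy_edges = [("EGFR", "X1"), ("MET", "X1"), ("EGFR", "CAV1"),
             ("CAV1", "X2"), ("X2", "X3"), ("CD44", "X1")]
found, inter, edges = seed_subnetwork(
    toy_edges, ["EGFR", "MET", "CAV1", "CD44", "YAP1"])
print(len(found), len(inter), len(edges))
```

Under this assumption the three printed counts correspond to the first three lines of the graph statistics (seed genes found, intermediate components, interactions in the subnetwork).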
TOP Gene
COBAS2.0
BioMyndb
DAVID
DDC
LinkedMDBWOR (22 databases)
Background-gene-based disease enrichment
Background genes from LinkedCanDB
LinkedCanDB: OMIM, TTD, CTD, ClinVar, COSMIC, KEGG, WikiPathways, Reactome etc. (32 databases)
Algorithm-defined background genes
TOPgene, Cobas2.0, Biomyndb, DAVID, Disent, GSEA
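The slides do not say which statistic the disease enrichment uses. A common choice for background-based enrichment is the one-sided hypergeometric test, sketched below as an assumption rather than as the method behind these tools. The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_disease, n_query, n_overlap):
    """One-sided p-value P(X >= n_overlap).

    n_background: genes in the background set
    n_disease:    background genes annotated to the disease
    n_query:      genes in the query (signature) list
    n_overlap:    query genes annotated to the disease
    """
    upper = min(n_disease, n_query)
    total = comb(n_background, n_query)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_disease, k) * comb(n_background - n_disease, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 disease genes, 4 query genes, 2 shared.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(round(p_value, 4))
```

The choice of background matters: the same overlap gives a different p-value against an algorithm-defined background than against a whole-database background, which is why the slide distinguishes the two.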
Summary: DDC
Requirements
Integrated dataset for downstream analysis
Inferred activities reflect neighbourhood of influence around a gene
Can boost signal for survival analysis and assessment of mutation impact
LinkedSeq (ENCODE, TCGA, SR, GWAS, GRO-seq, 1000 Genomes etc.)
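PSMD9 (proteasome subunit)
NGS approach (ChIP + RNA-seq)
Microarray approach
LinkedArray (U133Plus2, U133): GEO, EBI Express
Tissue data
Sample counts per tissue, cancer and normal, on the U133A and U133Plus2 arrays. The last value in each row is the row total. Rows with fewer than four counts have blank or missing cells, so the listed counts do not always sum to the total.
Adipose: 59, 12; total 72
Adrenal gland: 39, 14, 87, 15; total 155
Brain: 785, 568, 592, 1627; total 3572
Breast: 1954, 251, 2635, 91; total 4931
Cervix: 74, 12, 64, 34; total 184
Colon: 1294, 206, 256, 27; total 1783
Endometrium: 72, 61; total 142
Esophagus: 48, 24, 28; total 109
GIST: 64; total 64
Heart: 41; total 41
Kidney: 573, 105, 366, 66; total 1110
Liver: 182, 25, 156, 52; total 415
Lung: 441, 225, 582, 364; total 1612
Muscle: 177, 331; total 508
Myometrium: 24; total 24
Ovary: 859, 21, 341; total 1230
Pancreas: 132, 55, 13; total 208
Prostate: 308, 45, 244, 83; total 680
Sarcoma: 493; total 493
Skin: 290, 28, 499, 59; total 876
Small intestine: 13, 22; total 41
Stomach: 268, 57, 46, 18; total 389
Testis: 184, 13; total 207
Thyroid: 62, 25, 44, 25; total 156
Tongue: 11; total 15
Uterus: 155, 12, 24; total 191
Vagina: no values
Vulva: 21, 14; total 35
Rows without a recoverable tissue label (the labels Abdomen, Bladder and Blood are detached from their values):
13, 13
14, 19
4693, 639, 3130, 1099, 8974
202, 14, 21, 239
Total: 13057, 2655, 9284, 4087; total 29083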
Probes: U133Plus2 54,613; U133A 22,215
Protein synonym problem (PSMD9 = RPN4 = P27)
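One way to handle the synonym problem is to normalise every gene or protein name to a single canonical symbol before merging data sets. The sketch below is illustrative only: the alias table contains just the PSMD9 group given on the slide, and the function names are hypothetical.

```python
# Synonym groups keyed by canonical symbol; only this group is from the slide.
SYNONYM_GROUPS = {"PSMD9": {"RPN4", "P27"}}


def build_alias_index(groups):
    """Map every alias (upper-cased) to its canonical symbol."""
    index = {}
    for canonical, aliases in groups.items():
        for name in {canonical, *aliases}:
            index[name.upper()] = canonical
    return index


def canonical_symbol(name, index):
    """Return the canonical symbol, or the input unchanged if it is unknown."""
    return index.get(name.strip().upper(), name)


alias_index = build_alias_index(SYNONYM_GROUPS)
print(canonical_symbol("p27", alias_index))
print(canonical_symbol("EGFR", alias_index))
```

A flat lookup like this is not enough on its own, because aliases can collide: p27 is also a common name for CDKN1B, so a production mapping would need an authority such as HGNC plus some context to disambiguate.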
LinkedMDBWOR
(22 databases)
Normal tissue network (U133plus2-N + U133A-N)
Weighted network with PCC (Pearson correlation coefficient)
Cancer network (U133plus2-C + U133A-C)
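A PCC-weighted network can be sketched as follows: compute the Pearson correlation between every pair of expression profiles and keep the pairs whose absolute correlation passes a cut-off, using the correlation as the edge weight. The threshold of 0.8, the function names, and the toy profiles (including GENE_A to GENE_C) are assumptions for illustration; the slides do not state how the edges were selected.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: PCC is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def pcc_network(expr, threshold=0.8):
    """Return {(gene1, gene2): pcc} for pairs with |pcc| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy expression profiles across four samples (values are made up).
toy_expr = {
    "PSMD9": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 2.0],
}
network = pcc_network(toy_expr)
for pair, weight in sorted(network.items()):
    print(pair, round(weight, 2))
```

Building one such network from the normal samples and one from the cancer samples, as on the slide, allows the two edge sets to be compared gene by gene.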
Linked Pathways
LinkedPathway: KEGG, Reactome
Betweenness
Eccentricity
Degree
Eigenvector
Radiality
Shortest path Length
Longest path length
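Several of the measures listed above can be computed with a breadth-first search on an unweighted network. The sketch below covers degree, eccentricity, and mean shortest path length only, and it assumes a connected, undirected graph without self-loops. Betweenness and eigenvector centrality are omitted for brevity; graph libraries such as NetworkX provide them. The function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def node_metrics(edges):
    """Degree, eccentricity and mean shortest path length for each node."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    metrics = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        others = [d for n, d in dist.items() if n != node]
        metrics[node] = {
            "degree": len(adj[node]),
            "eccentricity": max(others),  # distance to the farthest node
            "mean_shortest_path": sum(others) / len(others),
        }
    return metrics


# Toy graph: a path A-B-C-D with an extra leaf E attached to B.
toy_graph = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
metrics = node_metrics(toy_graph)
# The largest eccentricity is the longest shortest path in the graph.
longest_shortest_path = max(m["eccentricity"] for m in metrics.values())
print(metrics["B"], longest_shortest_path)
```

Here "longest path length" is read as the longest shortest path (the graph diameter), which is an interpretation rather than something the slide states.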
LinkedTherapeutics: Results
CDD
25* forms of cancer
glioblastoma multiforme (brain)
squamous carcinoma (lung)
serous cystadenocarcinoma (ovarian)
Biospecimen Core Resource with more than 150 Tissue Source Sites
6 Cancer Genomic Characterization Centers
3 Genome Sequencing Centers
7 Genome Data Analysis Centers
Data Coordinating Center
Clinical diagnosis
Treatment history
Histologic diagnosis
Pathologic report/images
Tissue anatomic site
Surgical history
Gene expression/RNA sequence
Chromosomal copy number
Loss of heterozygosity
Methylation patterns
miRNA expression
DNA sequence
RPPA (protein)
Subset for Mass Spec
Chin et al., 2014, Cell
Motivation
TCGA has many high-quality primary tumor samples, but metastasis kills.
Which primaries will metastasize?
Overview of pathway-guided approach
Integrate many data sources to gain an accurate view of how genes are functioning in pathways
Predict the functional consequences of mutations by quantifying the effect on the surrounding pathway
Use pathway signatures to implicate mutations in novel genes to (re-)focus targeting
Identify critical Achilles' heels in the pathways that distinguish a particular sub-type
Schema
Fields: Assembly, FASTA seq, Ensembl ID, Cell-lines, KEGG pathway, SNP Id, Molecular Mass, Chr location:start-end, COSMIC mutation, Proteomes, Cytogenetic band, Protein abundance cross organisms, Modified residue, Glycosylation
Link types: same as, equivalent to, equivalent class, see also, interaction
Sources: dbSNP, SwissProt, Ensembl, NCBI, COSMIC, SGP X, HGNC, KEGG, UniProt, GeneCards, PaxDb, HPA, SPD, PharmGKB, EMBL-EBI
Data
Assembly: GRCh38.p2
Ensembl ID: ENSG00000117592
SNP Id: rs761610936 (dbSNP)
GO:0016021 Integral membrane component
Chr location: X:139,955,72139,965,520
COSMIC mutation: c.17G>T, COSM1249516
Cell-lines: MCF7, HeLa
FASTA seq (fragments):
MNLVICVLLLSIWKNNCMTTNQTNGSSTTGDKPVESMQTKLNY
LRRNLLILVGIIIMVFVFICFCYLHYNCLSDDASKAGMVKKKGIAA
KSSKTSFSEAKTASQCSPETQPMLSTADKSSDSSSPERASAQS
KEGG: hsa:347487
PharmGKB: PA164718516
9606.ENSP00000359571
Cytogenetic band: Xq27.1
Molecular Mass: 39944 Da
UniProt: Q5JRM2
Proteomes: UP000005640
Gene: CXORF66, chromosome X open reading frame 66
Peroxiredoxin 6
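The link types in the schema ("same as", "see also") resemble the standard linked-data predicates owl:sameAs and rdfs:seeAlso. As a rough sketch, cross-references for one gene could be written as N-Triples lines like this. The `http://example.org/` namespace and the URI layout are hypothetical, and treating the KEGG and UniProt entries as owl:sameAs is an assumption; only the identifiers come from the slide.

```python
# Standard predicate IRIs from the OWL and RDF Schema vocabularies.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical namespace; a real deployment would use each source's own URIs.
BASE = "http://example.org/"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose object is an IRI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


gene = BASE + "gene/CXORF66"
links = [
    (OWL_SAME_AS, BASE + "kegg/hsa:347487"),
    (OWL_SAME_AS, BASE + "uniprot/Q5JRM2"),
    (RDFS_SEE_ALSO, BASE + "pharmgkb/PA164718516"),
]
ntriples = [triple(gene, predicate, obj) for predicate, obj in links]
print("\n".join(ntriples))
```

Keeping the links as explicit triples means a query can follow them from one source to the next without a hard-coded join for every pair of databases.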
Acknowledgements
Dr. Ratnesh Sahay
Group Leader, eHealth and Life Sciences, Insight Centre for Data Analytics @ NUI Galway
Dr. Prasanna Venkatraman
Principal Investigator, Advanced Centre for Treatment Education and Research in Cancer, Mumbai, India
Dr. Rangapriya Sundarajan
Sr. Research Associate, Advanced Centre for Treatment Education and Research in Cancer, Mumbai, India