12/1/2015
Chirag J Patel
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
Complex traits are a function of genes and
environment...
P=G+E
P (traits): Type 2 Diabetes, Cancer, Alzheimer’s
G (genes): Variants
E (environment): Infectious agents, Nutrients, Pollutants
GWAS Catalog (https://www.ebi.ac.uk/gwas/):
>2,000 traits/diseases
>15,000 SNPs
Dissecting G in P:
What is a Genome-wide Association Study?
[Figure: diseased and non-diseased individuals, each carrying SNP(A) or SNP(a); the comparison is repeated for every SNP, through SNP(Z)/SNP(z), genome-wide in unrelated populations]
The road to GWAS...
A new paradigm of GWAS for discovery of G in P:
Human Genome Project to GWAS
Sequencing of the genome (2001)
Characterize common variation (2001-current day): HapMap project, http://hapmap.ncbi.nlm.nih.gov/
Measurement tools (~2003, ongoing): high-throughput variant assay, < $99 for ~1M variants
Comprehensive, high-throughput analyses: GWAS
ARTICLES
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
The Wellcome Trust Case Control Consortium*
Nature 2007
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10^-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a [...]
Number of raw publications with subject of “GWAS”
[Figure: number of publications 'GWAS' per year, 1994-2013, y axis 0-3000; pubmed MeSH terms: human + GWAS. Annotated milestones: Risch + Merikangas (linkage vs. association), human genome sequenced, GWAS of age-related macular degeneration, WTCCC, mega-meta-GWAS]
Tag SNPs and Linkage Disequilibrium
SNPs are common proxies for other SNPs
Can’t measure everything: 500K - 1M per chip
2 alleles are correlated because they are inherited together
LD blocks
Figure 3. Indirect Association. Genotyped SNPs often lie in a region of high linka[...] will be statistically associated with disease as a surrogate for the disease SNP throu[...]
doi:10.1371/journal.pcbi.1002822.g003
[Figure: –log10[P] for genotyped SNPs across LD blocks at T2DM loci, with tag SNPs marked; genes shown include SLC30A8, IDE, EXT2 and ALX4. Sladek et al, 2007]
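The idea that a tag SNP can stand in for a neighbour that was never genotyped can be illustrated by computing r², a common measure of linkage disequilibrium, from phased haplotypes. This is a minimal sketch: the haplotype lists and the function name `ld_r2` are made up for illustration and are not taken from the slides.

```python
def ld_r2(haplotypes, allele_a="A", allele_b="B"):
    """r^2 between two biallelic loci from phased two-locus haplotypes.

    Each haplotype is a 2-character string: allele at locus 1, then allele at locus 2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n           # frequency of A at locus 1
    p_b = sum(h[1] == allele_b for h in haplotypes) / n           # frequency of B at locus 2
    p_ab = sum(h == allele_a + allele_b for h in haplotypes) / n  # frequency of the A-B haplotype
    d = p_ab - p_a * p_b                                          # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical haplotypes
perfect = ["AB"] * 6 + ["ab"] * 4           # the two alleles always travel together
independent = ["AB", "Ab", "aB", "ab"] * 3  # no correlation between the loci

print(round(ld_r2(perfect), 3))      # 1.0
print(round(ld_r2(independent), 3))  # 0.0
```

When r² is close to 1, genotyping either SNP carries nearly the same information, which is why a chip with 500K - 1M markers can act as a proxy for many more variants.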
Digitizing SNPs:
e.g., Illumina Infinium Array
image: www.lifa-core.de/
image: illumina.com
Assessing Thousands of Factors Simultaneously:
Data-driven search for differences in SNP frequencies
GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
disease cases GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
GCAGGTACATG...GGTA...
GCAGGTACACG...GGTA...
GCAGGTACATG...GGTA...
GCAGGTACATG...GGTA...
healthy controls GCAGGTACATG...GGTA...
GCAGGTACATG...GGTA...
                         A   a
cases (diseased)
controls (non-diseased)
Associating One SNP with Disease
What is an “Odds Ratio”?
SNP (A/a) -> Disease?
                         A   a
cases (diseased)         c   d
controls (non-diseased)  x   y
Odds Ratio a vs A: odds of disease with allele a vs. odds of disease with allele A
Chi-squared test
Odds ratio of 1: equal odds (no difference)
                         A   a
cases (diseased)         c   d
controls (non-diseased)  x   y
Odds Ratio a vs A = (Odds with allele a) / (Odds with allele A)
                  = {[d/(d+y)] / [y/(d+y)]} / {[c/(c+x)] / [x/(c+x)]}
                  = (d/y) / (c/x)
Chi-squared test
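The odds ratio and the chi-squared test for a single SNP can be sketched in a few lines, using the slide's cell names (c, d for cases; x, y for controls). The counts in the example are invented, and the p-value uses the closed form for a chi-squared distribution with 1 degree of freedom instead of a statistics library.

```python
import math

def odds_ratio(c, d, x, y):
    """Odds ratio for allele a vs A: cases carry c (A) and d (a), controls x (A) and y (a)."""
    return (d / y) / (c / x)

def chi2_1df(c, d, x, y):
    """Pearson chi-squared statistic and p-value for the 2 x 2 allele table."""
    n = c + d + x + y
    stat = n * (c * y - d * x) ** 2 / ((c + d) * (x + y) * (c + x) * (d + y))
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p

# Hypothetical allele counts: cases 60 A / 40 a, controls 75 A / 25 a
print(odds_ratio(60, 40, 75, 25))
stat, p = chi2_1df(60, 40, 75, 25)
print(stat, p)
```

An odds ratio above 1 here means allele a is over-represented among cases; the chi-squared p-value says how surprising that imbalance would be if the odds were equal.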
?
SNP (A/a) Disease
vs.
SNP (A/a) Non-diseased
Relative Risk
•Need to wait!
•If incidence is low, N needs to be large!
Models to associate genotypes with disease
Examples for a case-control study
Disease (ND=4): Aa, AA, Aa, AA
Non-diseased (NC=4): aa, Aa, aa, Aa
              A   a
diseased      6   2   OR A (vs a)
non-diseased  2   6   OR a (vs A)
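The allele table above follows directly from the eight genotypes in the example. A short sketch of that counting step is below; the helper name `allele_counts` is illustrative, and the odds ratio is written in cross-product form to avoid rounding in intermediate divisions.

```python
from collections import Counter

def allele_counts(genotypes):
    """Count A and a alleles across genotype strings such as 'Aa'."""
    return Counter("".join(genotypes))

diseased = ["Aa", "AA", "Aa", "AA"]
non_diseased = ["aa", "Aa", "aa", "Aa"]

cases = allele_counts(diseased)          # A: 6, a: 2
controls = allele_counts(non_diseased)   # A: 2, a: 6

# Cross-product form of the odds ratio for A (vs a)
or_A_vs_a = (cases["A"] * controls["a"]) / (cases["a"] * controls["A"])
or_a_vs_A = 1 / or_A_vs_a

print(cases, controls)
print(or_A_vs_a)   # 9.0
```

With only four individuals per group the estimate is far too noisy to interpret; the example only shows the bookkeeping.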
Models to associate genotypes with disease
Genotypic Test (“2 or 1 df test”)
Diseased (ND=4): Aa, AA, Aa, AA
Non-diseased (NC=4): aa, Aa, aa, Aa
              AA  Aa  aa
diseased      2   2   0   OR AA (vs. Aa)
non-diseased  0   2   2   OR aa (vs. Aa)
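The genotypic ("2 df") test treats the three genotypes as separate categories. A minimal sketch on the genotype table from the example follows, assuming a plain Pearson chi-squared statistic; the function name is illustrative, and the closed-form p-value only holds for 2 degrees of freedom.

```python
import math

def genotypic_chi2(case_counts, control_counts):
    """Pearson chi-squared on a 2 x 3 genotype table (AA, Aa, aa); 2 degrees of freedom."""
    rows = [case_counts, control_counts]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n   # expected count if genotype and disease are independent
            stat += (observed - expected) ** 2 / expected
    p = math.exp(-stat / 2)   # upper tail of chi-squared with 2 df
    return stat, p

stat, p = genotypic_chi2([2, 2, 0], [0, 2, 2])
print(stat, round(p, 4))   # 4.0 0.1353
```

Cell counts this small make the chi-squared approximation unreliable, and a genotype that is absent from both groups would need to be dropped before calling this function.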
Associating One SNP with Quantitative Trait
(e.g., height, weight, cholesterol)
[Figure: two plots of trait (height) by genotype, factor(SNP) = 1, 2, 3; left panel GG / GC / CC, right panel CC / CT / TT]
Associating One SNP with Quantitative Trait
Linear Regression and Additive Risk Model
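SNP rs123456, T = risk allele
y = α + βx + ε
x_CC = 0 if individual is CC
x_CT = 1 if individual is CT
x_TT = 2 if individual is TT
[Figure: trait (height) against factor(SNP), coded CC (0), CT (1), TT (2), with the fitted line height = α + βx; α marks the intercept]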
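The additive model can be fitted with ordinary least squares once each genotype is coded as a count of risk alleles. The sketch below uses invented heights and a hypothetical helper `fit_additive`; it estimates only α and β, not their standard errors or p-values.

```python
def fit_additive(genotypes, trait, risk_allele="T"):
    """Least-squares fit of trait = alpha + beta * x, where x counts risk alleles (0, 1 or 2)."""
    x = [g.count(risk_allele) for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(trait) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, trait))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                 # change in trait per extra copy of the risk allele
    alpha = mean_y - beta * mean_x   # expected trait for genotype CC (x = 0)
    return alpha, beta

# Made-up data: two individuals per genotype
genotypes = ["CC", "CC", "CT", "CT", "TT", "TT"]
height = [48.0, 52.0, 73.0, 77.0, 98.0, 102.0]

alpha, beta = fit_additive(genotypes, height)
print(alpha, beta)   # 50.0 25.0
```

Here β is read as the average change in the trait for each additional T allele, which is what "additive" means in this model.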
Prototypical “Manhattan plot” to visualize
associations
~100,000 - ~1,000,000 association tests
[Table: genotype (AA, Aa, aa) by diseased / non-diseased, one table per SNP]
[Figure: Manhattan plots, −log10(P) by chromosome (1-22, X), for Bipolar disorder, Coronary artery disease, Crohn’s disease, Hypertension, Rheumatoid arthritis, Type 1 diabetes and Type 2 diabetes. Sources: NATURE | Vol 447 | 7 June 2007; Science, 2007]
Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases −log10 of the trend test P value for quality-control-positive SNPs, excluding [...] Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10^-5 highlighted in green. All panels are truncated at [...]
Type I Error:
False Positives!
what is a p-value?
chance we attain a result at least as extreme as the observed one if there is no difference (H0)
Bonferroni “correction”:
[Figure: two histograms of p-values (x axis 0.0-1.0, y axis Frequency). Left: p-values under H0, runif(10000). Right: p-values of GWAS in Total Cholesterol, gwas$P.value]
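The flat histogram under H0 and the need for a correction can be reproduced with simulated p-values. This is a sketch under stated assumptions: the number of tests is set to 100,000 (the lower end of the range quoted earlier), the family-wise error rate to 0.05, and the p-values are random draws, not real GWAS results.

```python
import random

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

random.seed(0)
n_tests = 100_000
null_p = [random.random() for _ in range(n_tests)]   # under H0, p-values are uniform on [0, 1]

naive_hits = sum(p < 0.05 for p in null_p)            # roughly 5% of all tests, every one a false positive
threshold = bonferroni_threshold(0.05, n_tests)
corrected_hits = sum(p < threshold for p in null_p)

print(naive_hits, threshold, corrected_hits)
```

With no true signal at all, the uncorrected threshold still flags thousands of SNPs, while the Bonferroni threshold flags few or none.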
Which diseases show evidence of association?
[...]sent study cannot provide conclusive exclusion of any given gene. This is the consequence of several factors including: less-than-complete coverage of common variation genome-wide on the Affymetrix chip; poor coverage (by design) of rare variants, including many structural variants (thereby reducing power to detect rare, penetrant, alleles)25; [...]
[...] already allow us, for selected diseases, to highlight pathways and mechanisms of particular interest. Naturally, extensive resequencing and fine-mapping work, followed by functional studies will be required before such inferences can be translated into robust statements about the molecular and physiological mechanisms involved.
[Figure: quantile-quantile plots, observed test statistic against expected chi-squared value, one panel each for BD, CAD, CD, HT, RA, T1D and T2D]
Figure 3 | Quantile-quantile plots for seven genome-wide scans. For each of the seven disease collections, a quantile-quantile plot of the results of the trend test is shown in black for all SNPs that pass the standard project filters, have a minor allele frequency >1% and missing data rate <1%. SNPs that [...] 360,000 SNPs. SNPs at which the test statistic exceeds 30 are represented by triangles. Additional quantile-quantile plots, which also exclude all SNPs located in the regions of association listed in Table 3, are superimposed in blue (for BD, the exclusion of these SNPs has no visible effect on the plot, and [...]
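A quantile-quantile plot compares the observed test statistics with what the null hypothesis predicts, and the comparison is often summarised as a genomic inflation factor. The sketch below assumes 1-df chi-squared statistics recovered from two-sided p-values; the function names are illustrative, and the plotting positions (i + 0.5) / n are one common convention, not the only one.

```python
from statistics import NormalDist, median

def p_to_chi2(p):
    """Convert a two-sided p-value (0 < p <= 1) to the matching 1-df chi-squared statistic."""
    z = NormalDist().inv_cdf(p / 2)
    return z * z

def qq_points(p_values):
    """(expected, observed) chi-squared pairs, sorted, ready to plot against the diagonal."""
    n = len(p_values)
    observed = sorted(p_to_chi2(p) for p in p_values)
    expected = sorted(p_to_chi2((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))

def inflation_factor(p_values):
    """Median observed chi-squared divided by the null median (about 0.455 for 1 df)."""
    return median(p_to_chi2(p) for p in p_values) / p_to_chi2(0.5)

# Evenly spaced p-values mimic a perfectly null scan
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(inflation_factor(null_like), 2))   # 1.0
```

Points that lift off the diagonal only in the upper tail suggest true associations, whereas inflation across the whole range points to systematic bias such as population structure.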
Observational associations do not equal causation...
Confounding bias
What is a confounder?
Ice Cream -> Drowning? Confounder: Summer!
SNP -> Disease? Confounder: race/ethnicity
FTO -> Diabetes? Confounder: Body Mass
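Confounding by ancestry can be shown with two groups in which the SNP has no effect at all. The allele counts below are invented for illustration: within each group the odds ratio is exactly 1, but pooling the groups creates an apparent association because allele frequency and disease frequency both differ between them.

```python
def odds_ratio(table):
    """Odds ratio a vs A from (cases_A, cases_a, controls_A, controls_a), cross-product form."""
    cases_A, cases_a, controls_A, controls_a = table
    return (cases_a * controls_A) / (cases_A * controls_a)

# Hypothetical allele counts: (cases_A, cases_a, controls_A, controls_a)
group_1 = (20, 80, 10, 40)    # allele a is common, disease is common
group_2 = (40, 10, 160, 40)   # allele a is rare, disease is rare
pooled = tuple(g1 + g2 for g1, g2 in zip(group_1, group_2))

print(odds_ratio(group_1))   # 1.0
print(odds_ratio(group_2))   # 1.0
print(odds_ratio(pooled))    # 3.1875
```

Analysing each group separately, or adjusting for group membership in a regression, removes the spurious signal in this toy example.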
•allele frequency
•data manipulation/filtering
•chi-square
•logistic
•linear
http://pngu.mgh.harvard.edu/~purcell/plink/
Examples:
•beta-cell function
CDC,
•Complications arise due to
glucose in blood, hyperglycemia
Nature, 2/2007
[Figure: overlapping first pages of type 2 diabetes GWAS articles; author lists and affiliations are not legible]
Science, 6/2007
Legible abstract excerpts:
"The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of [...]"
"Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls. We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the [...]"
ARTICLES
A genome-wide association study
identifies novel risk loci for type 2 diabetes
Robert Sladek1,2,4, Ghislain Rocheleau1*, Johan Rung4*, Christian Dina5*, Lishuang Shen1, David Serre1,
Philippe Boutin5, Daniel Vincent4, Alexandre Belisle4, Samy Hadjadj6, Beverley Balkau7, Barbara Heude7,
Guillaume Charpentier8, Thomas J. Hudson4,9, Alexandre Montpetit4, Alexey V. Pshezhetsky10, Marc Prentki10,11,
Barry I. Posner2,12, David J. Balding13, David Meyre5, Constantin Polychronakos1,3 & Philippe Froguel5,14
Type 2 diabetes mellitus results from the interaction of environmental factors with a combination of genetic variants, most of
which were hitherto unknown. A systematic search for these variants was recently made possible by the development of
high-density arrays that permit the genotyping of hundreds of thousands of polymorphisms. We tested 392,935
single-nucleotide polymorphisms in a French case–control cohort. Markers with the most significant difference in genotype
frequencies between cases of type 2 diabetes and controls were fast-tracked for testing in a second cohort. This identified
four loci containing variants that confer type 2 diabetes risk, in addition to confirming the known association with the TCF7L2
gene. These loci include a non-synonymous polymorphism in the zinc transporter SLC30A8, which is expressed exclusively in
insulin-producing β-cells, and two linkage disequilibrium blocks that contain genes potentially involved in β-cell
development or function (IDE–KIF11–HHEX and EXT2–ALX4). These associations explain a substantial portion of disease risk
and constitute proof of principle for the genome-wide approach to the elucidation of complex genetic traits.
controls: non-obese
"[...]chondrial and other syndromic types of diabetes mellitus) have been elucidated, few variants leading to common T2DM have been clearly identified and individually confer only a small risk (odds ratio ≈ 1.1–1.25) of developing T2DM (ref. 1). Linkage studies have reported many T2DM-linked chromosomal regions and have identified putative, causative genetic variants in CAPN10 (ref. 2), ENPP1 (ref. 3), HNF4A (refs [...]"
"[...] initially studied diabetic patients with a body mass index (BMI) <30 kg m^-2. Control subjects were selected to have fasting blood glucose <5.7 mmol l^-1 in DESIR, a large prospective cohort for the study of insulin resistance in French subjects (ref. 22). Genotypes for each study subject were obtained using two plat[...]"
"[...] Human Hap300 chip, showing no T2DM association in stage 1 (P > 0.01) and separated by at least 100 kb. Using the first principal component as a covariate for ancestry differences between cases and controls, we tested for association between rs932206 and disease status. Our result suggests that this apparent association is largely [...]"
"[...] BMI on the association between marker and disease, as it is asymptotically equivalent to the Armitage trend test used to detect association in stages 1 and 2. None of the associations (Supplementary Table 7) was substantially changed by considering the effects of these covariates."
[Figure: one panel per chromosome (1-22, X), y axis 1-5; the chromosome 10 panel runs to 15]
Figure 1 | Graphical summary of stage 1 association results. T2DM association was determined for SNPs on the Human1 and Hap300 chips. The x axis represents the chromosome position from pter; the y axis shows −log10[pMAX], the P-value obtained by the MAX statistic, for each SNP (Note the different scale on the y axis of the chromosome 10 plot.). SNPs that passed the cutoff for a fast-tracked second stage are highlighted in red.
©2007 Nature Publishing Group
Sladek, 2007
NATURE | Vol 445 | 22 February 2007 ARTICLES
SNP | Chr | Position | Risk allele | Major allele | MAF (cases) | MAF (controls) | OR (het) | OR (hom) | PAR | λs | pMAX | pMAX (perm) | pMAX | pMAX (perm) | Gene
rs7903146 | 10 | 114,748,339 | T | C | 0.406 | 0.293 | 1.65 ± 0.19 | 2.77 ± 0.50 | 0.28 | 1.0546 | 1.5 × 10^-34 | <1.0 × 10^-7 | 3.2 × 10^-17 | <3.3 × 10^-10 | TCF7L2
rs13266634 | 8 | 118,253,964 | C | C | 0.254 | 0.301 | 1.18 ± 0.25 | 1.53 ± 0.31 | 0.24 | 1.0089 | 6.1 × 10^-8 | 5.0 × 10^-7 | 2.1 × 10^-5 | 1.8 × 10^-5 | SLC30A8
rs1111875 | 10 | 94,452,862 | G | G | 0.358 | 0.402 | 1.19 ± 0.19 | 1.44 ± 0.24 | 0.19 | 1.0069 | 3.0 × 10^-6 | 7.4 × 10^-6 | 9.1 × 10^-6 | 7.3 × 10^-6 | HHEX
rs7923837 | 10 | 94,471,897 | G | G | 0.335 | 0.377 | 1.22 ± 0.21 | 1.45 ± 0.25 | 0.20 | 1.0065 | 7.5 × 10^-6 | 2.2 × 10^-5 | 3.4 × 10^-6 | 2.5 × 10^-6 | HHEX
rs7480010 | 11 | 42,203,294 | G | A | 0.336 | 0.301 | 1.14 ± 0.13 | 1.40 ± 0.25 | 0.08 | 1.0041 | 1.1 × 10^-4 | 2.9 × 10^-4 | 1.5 × 10^-5 | 1.2 × 10^-5 | LOC387761
rs3740878 | 11 | 44,214,378 | A | A | 0.240 | 0.272 | 1.26 ± 0.29 | 1.46 ± 0.33 | 0.24 | 1.0046 | 1.2 × 10^-4 | 2.8 × 10^-4 | 1.8 × 10^-5 | 1.3 × 10^-5 | EXT2
rs11037909 | 11 | 44,212,190 | T | T | 0.240 | 0.271 | 1.27 ± 0.30 | 1.47 ± 0.33 | 0.25 | 1.0045 | 1.8 × 10^-4 | 4.5 × 10^-4 | 1.8 × 10^-5 | 1.3 × 10^-5 | EXT2
rs1113132 | 11 | 44,209,979 | C | C | 0.237 | 0.267 | 1.15 ± 0.27 | 1.36 ± 0.31 | 0.19 | 1.0044 | 3.3 × 10^-4 | 8.1 × 10^-4 | 3.7 × 10^-5 | 2.9 × 10^-5 | EXT2
Significant T2DM associations were confirmed for eight SNPs in five loci. Allele frequencies, odds ratios (with 95% confidence intervals) and PAR were calculated using only the stage 2 data. Allele frequencies in the controls were very close to those reported for the CEU set (European subjects genotyped in the HapMap project). Induced sibling recurrent risk ratios (λs) were estimated using stage 2 genotype counts for the control subjects and assuming a T2DM prevalence of 7% in the French population. hom, homozygous; het, heterozygous; major allele, the allele with the higher frequency in controls; pMAX, P-value of the MAX statistic from the χ2 distribution; pMAX (perm), P-value of the MAX statistic from the permutation-derived empirical distribution (pMAX and pMAX (perm) are adjusted for variance inflation); risk allele, the allele with higher frequency in cases compared with controls.
Sladek, 2007
Confirmed 8 SNPs with N ~ 1000
How would you interpret the p-values? Odds ratios?
Identification of four novel T2DM loci
Our fast-track stage 2 genotyping confirmed the reported association for rs7903146 (TCF7L2) on chromosome 10, and in addition identified significant associations for seven SNPs representing four new T2DM loci (Table 1). In all cases, the strongest association for the MAX statistic (see Methods) was obtained with the additive model.
The most significant of these corresponds to rs13266634, a non-synonymous SNP (R325W) in SLC30A8, located in a 33-kb linkage disequilibrium block on chromosome 8, containing only the 3′ end of this gene (Fig. 2a). SLC30A8 encodes a zinc transporter expressed solely in the secretory vesicles of β-cells and is thus implicated in the final stages of insulin biosynthesis, which involve co-crystallization [...]
[Figure: –log10[P] across two regions, panel a (SLC30A8) and panel b (IDE, KIF11, HHEX)]
Scaling up discovery by combining populations: meta-analyses
Meta-analysis of SNP rs10830963: Combining findings from multiple cohorts
"[...] data from the WTCCC, DGI and FUSION scans)10 (Supplementary Note). We found strong evidence that the minor G allele of rs10830963 was associated with increased risk of T2D (odds ratio = 1.09 (1.05–1.12), P = 3.3 × 10^-7; Fig. 2 and Supplementary Table 6 online). The possibility that the fasting glucose association might [...]"
Study ID | OR (95% CI) | Weight (%)
DGI | 1.12 (0.96, 1.30) | 4.61
FUSION | 1.20 (1.03, 1.39) | 4.89
WTCCC | 1.07 (0.95, 1.20) | 8.03
deCODE | 1.14 (1.03, 1.27) | 9.58
KORA | 1.00 (0.84, 1.19) | 3.53
Rotterdam | 1.17 (1.04, 1.30) | 8.75
CCC | 1.07 (0.88, 1.31) | 2.69
ADDITION/ELY | 1.16 (1.02, 1.33) | 6.04
Norfolk | 1.00 (0.90, 1.10) | 10.56
UKT2DGC | 1.03 (0.96, 1.10) | 23.18
OxGN/58BC | 0.91 (0.75, 1.10) | 2.85
FUSION Stage 2 | 1.15 (1.02, 1.30) | 7.41
METSIM | 1.16 (1.03, 1.30) | 7.90
Overall (I^2 = 26.6%, P = 0.176) | 1.09 (1.05, 1.12) | 100.00
Meta-analysis P value = 3.3 × 10^-7
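One common way to combine per-study odds ratios is inverse-variance, fixed-effect pooling on the log scale. The sketch below applies it to the values listed in the forest plot, recovering each standard error from the width of the 95% confidence interval. Whether the original analysis used exactly this method is an assumption; the function name `fixed_effect_meta` is illustrative.

```python
import math

# (study, OR, lower 95% CI, upper 95% CI), copied from the forest plot
studies = [
    ("DGI", 1.12, 0.96, 1.30), ("FUSION", 1.20, 1.03, 1.39),
    ("WTCCC", 1.07, 0.95, 1.20), ("deCODE", 1.14, 1.03, 1.27),
    ("KORA", 1.00, 0.84, 1.19), ("Rotterdam", 1.17, 1.04, 1.30),
    ("CCC", 1.07, 0.88, 1.31), ("ADDITION/ELY", 1.16, 1.02, 1.33),
    ("Norfolk", 1.00, 0.90, 1.10), ("UKT2DGC", 1.03, 0.96, 1.10),
    ("OxGN/58BC", 0.91, 0.75, 1.10), ("FUSION Stage 2", 1.15, 1.02, 1.30),
    ("METSIM", 1.16, 1.03, 1.30),
]

def fixed_effect_meta(rows, z=1.96):
    """Inverse-variance fixed-effect pooling of odds ratios on the log scale."""
    weights, weighted_logs = [], []
    for _, or_, lo, hi in rows:
        se = (math.log(hi) - math.log(lo)) / (2 * z)   # SE of log(OR), recovered from the CI width
        w = 1 / se ** 2
        weights.append(w)
        weighted_logs.append(w * math.log(or_))
    total = sum(weights)
    pooled_log = sum(weighted_logs) / total
    pooled_se = math.sqrt(1 / total)
    pooled_or = math.exp(pooled_log)
    ci = (math.exp(pooled_log - z * pooled_se), math.exp(pooled_log + z * pooled_se))
    percent_weights = [100 * w / total for w in weights]
    return pooled_or, ci, percent_weights

pooled_or, ci, percent_weights = fixed_effect_meta(studies)
print(round(pooled_or, 3), tuple(round(b, 3) for b in ci))
```

Because the inputs are rounded to two decimals, the result should land close to, not exactly on, the overall estimate shown in the plot. Larger studies with narrower intervals receive more weight, which is why no single small cohort dominates the pooled odds ratio.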
[Figure: Manhattan plots, –log10(P) by chromosome (1-22, X); top panel unconditional analysis, lower panel conditional analysis. Labelled loci: HHEX/IDE, KCNQ1 (2 signals), CDC123/CAMK1D, CHCHD9, KCNJ11, CDKAL1, CDKN2A/2B, CENTD2, MTNR1B, SLC30A8, ADAMTS9, IGF2BP2, HMGA2, ZFAND6, TP53INP1, BCL11A, PPAR, TSPAN8/LGR5, PRC1, WFS1, JAZF1, IRS1, ZBED3, HNF1A, FTO, THADA, HNF1B, DUSP9, KLF14, NOTCH2. Legend: suggestive statistical association (P < 1 × 10^-5); association in identified or established region (P < 1 × 10^-4)]
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-
analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those
taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and
should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously
established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered
conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10−5), whereas
secondary signals close to already confirmed T2D loci are shown in purple (P < 10−4).
"13 autosomal loci exceeded the threshold for genome-wide significance (P ranging from 2.8 × 10^-8 to 1.4 × 10^-22) with allele-specific odds [...]"
"[...] (r2 < 0.05), and conditional analyses (see below) establish these SNPs as independent (Fig. 2 and Supplementary Table 4). Further analysis [...]"
[Figure: regional association plots, stage 1, for the SLC30A8 region (rs3802177) and the CDKN2A/B region (rs10965250); − log10(P−value) on the left axis, recombination rate (cM/Mb) on the right axis, points coloured by r^2 bins (0.0 − 0.2 up to 0.8 − 1.0, or r^2 missing)]
● ●● ●
● ●
●
●●●
● ●
●
●
●●
●
●● ●
●
●
●●
●
●●●
●●
● ●
●●
●
●
●●
●
●●
●
●●●●●●●●
●
● ●
●
●
●●
●●
●●●
●
●●●●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
●●
●
●
●●
●●●
●●●●●
● ●●●
●
● ●
●
●●●● ●
●●
●
●
●●●
●●
●
●●
●●
●
●
●●
●
●●
●●
●
●●
●
●
●
●●
●
●
●●
● ●
●
●
●●
●●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●● ●
●●
●●
● ●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●●● ●●
●
●
● ●
●●●●
●
●
●●
●
●
●●
●
●●
●
●●
●
●
● ●
●●●●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●●
●●
●●
●●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●●
● ●●
●
●●
●●●
●●
●
●
●●
●
●
●
●●
●
●
●
●●
●●●●
●●
●
●●
●
●●
●●
●
●
●●
●
●●
●●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●●
● ●
●●
●●
●
●●
●●
●
● ●● ●
●●
●
●
●●
●●
●●●
● ●●●●
●●●●●
●
●
●●
●
●
●
●
●●●
●●
●●●
●●●●
●●
● ●
●●●●
●●● ●●●
●
●● ●
●
●●●
● ●
●●●
●
●
●●
●
●
●
●●
●
●
●●●●
●
●●●●
● ●
●●
●●●
●
●
●●
●
●●
●
●
●
●●
●
●
●●
●
● ●
●● ●
●●
●●
● ●
●
●
●●
●●
●
●●
●●●
●
●●
●
●
●
●
●●
●
●
●●●
●●●
●●
●●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●●
● ●
●
● ●● ● ● ●●
●
●
●●
●
●●
●● ●● ●
● ●●●●●●
●
●
●●
●
●
●
●
●●●
●●
●●●
●
● ●
●
●
●●●
●
●
●
●
●●●
●●●
●
●
●●
●
●●●●
●
●●
●
●●
●●
●●● ●●
●
●●
● ●●●
●●
●
●●
● ●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
●●
●
●
●●
●●
●●
●●●
●
●● ●
●●●●
●
●●●
●●
●●
● ●●
●●●●●
●● ●
●
●●
●
●●
●●
●
●
●●
●
●
●
●
●●
●●
●●
● ●
●●
●●
●●
●
●●
●
●
● ●
●●
●
●●●●●
●
● ●
●●●
●
●
●●
●
<− EIF3H <− SAMD12 KIAA1797 −> <− IFNA13 <− CDKN2B
PGCP <− TRPS1 <− EXT1 <− MLLT3 <− IFNA7 <− CDKN2A
In a gene...
CDC123/CAMK1D Region NotHHEX/IDE
in a gene...
Region
100 10 rs12779790 stage 1 rs12779790 100 rs5015480 stage 1 rs5015480 100
● r^2: 0.8 − 1.0 15 ● r^2: 0.8 − 1.0
recombination rate (c
recombination rate (c
●
80 8 ● r^2: 0.4 − 0.6 80 ● r^2: 0.4 − 0.6 ●
●
●
● 80
●
● r^2: 0.2 − 0.4 ● r^2: 0.2 − 0.4 ●●● ●
● ●
● ●●●
log10(P−value)
log10(P−value)
● ●●
● r^2: 0.0 − 0.2 ● r^2: 0.0 − 0.2 ●●
60 6 ● r^2 missing 60 10 ● r^2 missing ●
●
●
●
● 60
●● ●●
●
●
● ●
40 4 40 ●
●● ●
40
●
● ●
●
RESEARCH ARTICLE
Disease- and trait-associated genetic variants are rapidly being identified with genome-wide association studies (GWAS) and related strategies (1). To date, hundreds of GWAS have been conducted, spanning diverse diseases [...]
[...] enhancer elements (3–6) and enrichment within expression quantitative trait loci (eQTL) (3, 7, 8). Human regulatory DNA encompasses a variety of cis-regulatory elements within which the cooperative binding of transcription factors creates [...]
[...] In total, we identified 3,899,693 distinct DHS positions along the genome [...] gene bodies (fig. S1B); however, only 10.9% of intronic GWAS SNPs within DHSs are in strong LD (r^2 ≥ 0.8) with a coding SNP, indicating that the vast majority of noncoding genic variants are not simply tagging coding sequence. [...]
= $250M
$250M / ~2000 loci
= $125K/locus
Fighter jet
P=G+E
Type 2 Diabetes
Variants Infectious agents
Cancer
Nutrients
Alzheimer’s
Pollutants
for E!
Why?
$$\sigma^2_P = \sigma^2_G + \sigma^2_E$$
Heritability ($H^2$) is the range of phenotypic variability
attributed to genetic variability in a population
$$H^2 = \frac{\sigma^2_G}{\sigma^2_P}$$
Indicator of the proportion of phenotypic
differences attributed to G.
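The variance decomposition and heritability ratio can be sketched numerically. This is a minimal illustration, assuming P = G + E with no gene-environment covariance; the function name and the variance values are hypothetical and chosen only for the example.

```python
def broad_sense_heritability(var_g: float, var_e: float) -> float:
    """Return H^2 = var_g / var_p, where var_p = var_g + var_e.

    Assumes P = G + E with no covariance between G and E.
    """
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    var_p = var_g + var_e  # total phenotypic variance
    if var_p == 0:
        raise ValueError("total phenotypic variance is zero")
    return var_g / var_p


# Hypothetical variance components (illustration only, not estimates from data).
h2 = broad_sense_heritability(var_g=8.0, var_e=2.0)
print(f"H^2 = {h2:.2f}")
```

Whatever share is not attributed to G under this simple model is attributed to E, which is the motivation for the slides that follow.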
Height is an example of a heritable trait:
Nature Genetics, 2015
Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of [...]
[...] Specifically, the partitioning of observed variability into underlying genetic and environmental sources and the relative importance of additive and non-additive genetic variation are continually debated (1–5). Recent results from large-scale genome-wide association studies (GWAS) show that many genetic variants contribute to the variation in complex traits and that effect sizes are typically small (6,7). However, the sum of the variance explained by the detected variants is much [...]
?
E+ E-
diseased
non-diseased
Young, 2011
[...]e modelling
P < 0.05
[...]oblem is akin to – but less well [...]sed and more poorly understood than – [...]e testing. For example, consider the use [...]r regression to adjust the risk levels of [...]atments to the same background level [...] There can be many covariates, and [...]t of covariates can be in or out of the [...] With ten covariates, there are over 1000 [...] models. Consider a maze as a metaphor [...]elling (Figure 3). The red line traces the [...] path out of the maze. The path through [...]ze looks simple, once it is known.
Figure 3. The path through a complex process can appear quite simple once the path is defined. Which terms are included in a multiple linear regression model? Each turn in a maze is analogous to including or not a specific term in the evolving linear model. By keeping an eye on the p-value on the term selected to be at issue, one can work towards a suitably small p-value. © ktsdesign – Fotolia
[...] ways in the literature for dealing with model selection, so we propose a new, composite modelling [...]
2 The data cleaning team creates a modelling data set and a holdout set and [...]
JCE, 2015
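The "over 1000 models" figure for ten covariates follows from each covariate being either in or out of the model, giving 2^10 = 1024 subsets. A small sketch that counts them by enumeration; the covariate names are placeholders.

```python
from itertools import combinations


def count_models(n_covariates: int) -> int:
    """Number of covariate subsets when each covariate is either in or out."""
    return 2 ** n_covariates


def enumerate_models(covariates):
    """Yield every subset of the covariate list, from the empty set to the full set."""
    for k in range(len(covariates) + 1):
        yield from combinations(covariates, k)


# Placeholder covariate names x1..x10 (illustration only).
covariates = [f"x{i}" for i in range(1, 11)]
n_models = sum(1 for _ in enumerate_models(covariates))
print(n_models, count_models(10))
```

Each subset is one possible path through the maze, so reporting only the model with the smallest p-value hides how many paths were available.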
Example of fragmentation:
Is everything we eat associated with cancer?
non-replicated
inconsistent effects
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studie [...]
outliers are not shown (effect estimates >10).
non-standardized
Schoenfeld and Ioannidis, AJCN (2012)
Connecting Environmental Exposure with Disease:
Missing the “System” of Exposures?
?
E+ E-
diseased
non-diseased
[Figure: Manhattan-style plots; x-axes: Chromosome (1 to 22, X) and Environmental Category; y-axis: test statistic]
What specific environmental “loci” are associated with disease?
... but there is no “microarray” for environmental exposure...
Gold standard for breadth of human exposure information:
• Physical fitness and physical functioning
GWAS chip
[NHANES brochure excerpt]
[...] An advanced computer system using high- [...]
All participants visit the physician. Dietary interviews and body measurements are included for everyone. All but the very young have a blood sample taken and will have a dental screening. Depending upon the age of the participant, the rest of the examination includes tests and proce- [...]
[...] survey. Local media may feature stories about the survey. NHANES is designed to facilitate and encourage participation. Transportation is provided to and from the mobile center if necessary.
death)
The study team consists of a physician, medical and health technicians, as well as dietary and health interviewers. Many of the study staff are bilingual (English/Spanish).
Uses of the Data
Information from NHANES is made available through an extensive series of publications and articles in scientific and technical journals. For data users and researchers throughout the world, [...] data essential for the implementation and evaluation of program activities. The U.S. Department of Agriculture and NCHS cooperate in planning and reporting dietary and nutrition information from the survey.
NHANES’ partnership with the U.S. Environmental Protection Agency allows continued study of the many important environmental influences on our health.
1 http://www.cdc.gov/nchs/nhanes.htm
Gold standard for breadth of human exposure information:
National Health and Nutrition Examination Survey
Drugs
Infectious Agents
phthalates, bisphenol A
Physical Activity
Regression:
disease
adjusted for other risk factors
=
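False Discovery Rate Estimation:
The expected rate of false positives
$$\mathrm{FDR}(\alpha) = \frac{\#\,\text{false positives} \le \alpha}{\#\,\text{findings} \le \alpha}, \qquad \text{e.g. } \frac{50 \text{ false positives} \le 0.05}{100 \text{ findings} \le 0.05} = 0.5$$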
cases controls
“Shuffle” (permute) disease and non-diseased
participants
Re-run EWAS
FDR (p-value)
bisphenol A 1
PCB199 0.4
β-carotene 0.1
cotinine 0.2
... ...
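The shuffle-and-re-run procedure can be sketched as follows. This is a simplified illustration, not the published EWAS pipeline: it assumes an unadjusted Welch-style t statistic in place of the survey-weighted regression, thresholds on the statistic rather than on a p-value, and uses simulated exposures. All function names and data are hypothetical.

```python
import random
from statistics import mean, stdev


def fdr_ratio(n_false_positives: float, n_findings: int) -> float:
    """FDR = (# false positives <= alpha) / (# findings <= alpha)."""
    return n_false_positives / n_findings


def abs_t_stat(values, labels):
    """Absolute Welch-style t statistic comparing cases (1) with controls (0)."""
    cases = [v for v, d in zip(values, labels) if d == 1]
    controls = [v for v, d in zip(values, labels) if d == 0]
    se = (stdev(cases) ** 2 / len(cases) + stdev(controls) ** 2 / len(controls)) ** 0.5
    return abs(mean(cases) - mean(controls)) / se


def permutation_fdr(exposures, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR at a statistic threshold by permuting disease labels."""
    rng = random.Random(seed)
    observed = sum(abs_t_stat(x, labels) >= threshold for x in exposures.values())
    if observed == 0:
        return 0.0
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # "shuffle" diseased and non-diseased participants
        # Re-run the scan on data where no exposure is truly associated.
        null_counts.append(
            sum(abs_t_stat(x, shuffled) >= threshold for x in exposures.values())
        )
    return min(1.0, fdr_ratio(mean(null_counts), observed))


# Simulated data: 100 cases, 100 controls, 20 exposures, 3 with a true shift.
rng = random.Random(1)
labels = [1] * 100 + [0] * 100
exposures = {}
for j in range(20):
    shift = 1.0 if j < 3 else 0.0
    exposures[f"exposure_{j}"] = [rng.gauss(shift * d, 1.0) for d in labels]

fdr = permutation_fdr(exposures, labels, threshold=2.0)
print(f"estimated FDR at |t| >= 2.0: {fdr:.2f}")
```

The slide's worked example is the same ratio: 50 expected false positives out of 100 findings at 0.05 gives an FDR of 0.5.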
Novel Findings:
2005-2006
[Figure: EWAS Manhattan plot; y-axis: −log10(pvalue), 0 to 2; x-axis: environmental category. Labeled findings: heptachlor epoxide, PCB170, γ-tocopherol (vitamin E), β-carotene, vitamin D. Odds ratio annotations: OR=0.6,0.6; OR=1.8,1.6; OR=4.5,2.3; OR=3.2,1.8]
Interesting Patterns:
pesticides, PCBs
Environmental categories on the x-axis:
nutrients carotenoid
nutrients minerals
nutrients vitamin A
nutrients vitamin B
nutrients vitamin C
nutrients vitamin D
nutrients vitamin E
phytoestrogens
cotinine
hydrocarbons
volatile compounds
allergen test
viral infection
bacterial infection
latex
phenols
phthalates
polybrominated ethers
polyfluorochemicals
acrylamide
perchlorate
pcbs
dioxins
heavy metals
pesticides atrazine
pesticides chlorophenol
pesticides organochlorine
pesticides organophosphate
pesticides pyrethroid
dialkyl
furans dibenzofuran
What model is used to test for association?
Fasting Blood Glucose ≥ 126 mg/dL?
BMI, SES, ethnicity, age, sex
OR: Δ 1SD of exposure
N=500-2000 per cohort
Compare vs. GWAS?
PLoS ONE, 2010
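One way to read this slide is as a logistic regression of diabetes status (fasting blood glucose ≥ 126 mg/dL) on each exposure, adjusted for BMI, SES, ethnicity, age, and sex, with the odds ratio reported per 1 SD of exposure. The sketch below shows only the case definition and the rescaling of a per-unit log-odds coefficient to an OR per SD. The coefficient and exposure values are hypothetical, and the helper names are assumptions, not the authors' code.

```python
import math
from statistics import stdev

FBG_THRESHOLD_MG_DL = 126  # case definition shown on the slide


def has_t2d(fasting_glucose_mg_dl: float) -> bool:
    """Case status: fasting blood glucose at or above 126 mg/dL."""
    return fasting_glucose_mg_dl >= FBG_THRESHOLD_MG_DL


def odds_ratio_per_sd(beta_per_unit: float, exposure_values) -> float:
    """OR for a 1 SD increase in exposure: exp(beta * SD).

    beta_per_unit is the fitted log-odds change per raw unit of exposure.
    """
    return math.exp(beta_per_unit * stdev(exposure_values))


# Hypothetical exposure measurements and coefficient (illustration only).
exposure = [1.0, 2.0, 3.0, 4.0, 5.0]
beta = 0.2
print(f"OR per 1 SD: {odds_ratio_per_sd(beta, exposure):.2f}")
print(has_t2d(130.0), has_t2d(95.0))
```

Standardizing to 1 SD puts exposures measured in different units on a common scale, which is what makes a side-by-side comparison across hundreds of factors, or against GWAS effect sizes, meaningful.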
Exposome factors associated with serum lipids?
FDR < 5%
log10(HDL-C)
adjusted for BMI, SES, ethnicity, age, age^2, sex
N=1000-3000
IJE 2012.
EWAS in Triglycerides and LDL-C
organochlorine pesticides
carotenoids
vitamin E
vitamin A
8 factors
carotenoids
vitamin E
vitamin A
IJE 2012.
Effect Sizes For Validated Factors:
HDL-C
pollutants nutrient factors
survey  N  P-value  FDR  Effect (mg/dL)
% change = Δ 1 SD in Exposure
17 validated factors
IJE 2012.
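Because the lipid outcomes are modeled on a log10 scale, a regression coefficient per 1 SD of exposure can be expressed as a percent change in the outcome. A minimal sketch of that conversion, assuming the coefficient comes from a model of log10(HDL-C) on a standardized exposure; the coefficient value is hypothetical.

```python
def percent_change_per_sd(beta_log10: float) -> float:
    """Percent change in the outcome per 1 SD of exposure.

    beta_log10 is the change in log10(outcome) per 1 SD of exposure,
    so the multiplicative change in the outcome is 10 ** beta_log10.
    """
    return (10 ** beta_log10 - 1.0) * 100.0


# Hypothetical coefficient (illustration only).
beta = 0.01
print(f"{percent_change_per_sd(beta):.2f}% change per 1 SD of exposure")
```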
How do effect sizes compare between GWAS and EWAS?
GWAS
ARTICLES, NATURE | Vol 466 | 5 August 2010
Table 1 | Meta-analysis of plasma lipid concentrations in >100,000 individuals of European descent.
Columns: Locus, Chr, Lead SNP, Lead trait, Other traits, Alleles/MAF, Effect size, P, eQTL, CAD, Ethnic
[Table excerpt; loci shown include LDLRAP1, PABPC4, PCSK9, ANGPTL3, EVI5, SORT1, ZNF648, MOSC1, GALNT2, IRF2BP2, APOB, GCKR, ABCG5/8, RAB3GAP1, COBLL1, IRS1, RAF1, MSL2L1, KLHL8, SLC39A8, ARL15, MAP3K1, HMGCR, TIMD4, MYLIP, HFE, HLA, C6orf106, FRK, CITED2, LPA, DNAH11, NPC1L1, TYW1B, MLXIPL, KLF14, PPP1R3B, PINX1, NAT2, LPL, CYP7A1, TRPS1, TRIB1, PLEC1, TTC39B]
Teslovich, 2010
EWAS
survey  N  P-value  FDR  Effect (mg/dL)
EWAS uncovers persistent pollutants
in people with Type 2 Diabetes, Higher Lipids:
How are these factors linked with these diseases?
•organochlorine pesticides
•polychlorinated biphenyls
•dibenzofurans
•dioxins
•found all over the world
•persist in food chain
Porta et al, Environ Int 2008
•arteriosclerosis,
•T2D/insulin resistance
capacitors
adhesives
Lind et al, EHP, 2011
(Korea, Japan, Europe)
How can we study the elusive environment in larger scale for biomedical
discovery?
Opinion
VIEWPOINT
Studying the Elusive Environment in Large Scale
Chirag J. Patel, PhD, Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
John P. A. Ioannidis, MD, DSc, Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Stanford, California; Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California; Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California; and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.

It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.(1) Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.(1) Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.

To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).(2,3) The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al (4) screened more than 2000 chemicals in serum to discover endogenous exposures [...]

[...] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.

Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).

However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated [...]

[...] High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly at- [...] US federally funded gene expression experiment data be [...]ited in public repositories [...]

Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004
Panels: A Serum cotinine; B Serum total mercury; C Serum cadmium; D Serum trans-β-carotene. Total correlations shown on the panels: 37, 42, 68, 68.
Legend: Pollutants; Nutrients and vitamins; Infectious agents; Demographic attributes; Negative correlation; Positive cor[...]
Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (strong [...] The size of each node is proportional to the number of edges for a node, and the thickness of each edge indicates the magnitude of the correlation.

•evaluate new ‘omics technologies
high-throughput, non-targeted metabolomics
what causes what?
confounding
•data mining and informatics to tackle complexity
•longitudinal/linkable data & biorepositories
JAMA, 2014
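The correlation globes described in the figure caption keep only pairwise correlations whose absolute value exceeds 0.2 and size each node by its number of edges. A small sketch of that thresholding step, assuming plain Pearson correlations on toy data; the exposure values are invented, and the published figure may have used a different correlation estimator.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_edges(exposures, min_abs_r=0.2):
    """Return (name_a, name_b, r) for every pair with |r| above the cutoff."""
    edges = []
    for a, b in combinations(sorted(exposures), 2):
        r = pearson(exposures[a], exposures[b])
        if abs(r) > min_abs_r:
            edges.append((a, b, r))
    return edges


def node_degrees(exposures, edges):
    """Number of edges per exposure (the node size in the globe)."""
    degrees = {name: 0 for name in exposures}
    for a, b, _ in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Toy measurements for five hypothetical participants (illustration only).
exposures = {
    "cotinine": [1, 2, 3, 4, 5],
    "cadmium": [2, 4, 5, 4, 6],
    "beta_carotene": [5, 4, 3, 2, 1],
    "mercury": [3, 1, 3, 1, 3],
}
edges = correlation_edges(exposures)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:+.2f}")
print(node_degrees(exposures, edges))
```

Densely connected nodes illustrate the point made in the text: intervening on one exposure is hard to separate from the exposures it travels with.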
[Diagram: exposure (E) Laboratories connected to a Data Center]
•Data repository
•Data standards
http://grants.nih.gov/grants/guide/rfa-files/RFA-ES-15-010.html
Possibilities of discovery with the exposome:
How do we proceed?
[Slide background: Patel and Ioannidis, Viewpoint, “Studying the Elusive Environment in Large Scale,” with the Figure of correlation interdependency globes for NHANES 2003-2004]
•evaluate new ‘omics technologies
metabolomics
what causes what?
confounding
•data mining and informatics to tackle complexity
•longitudinal/linkable data & biorepositories
JAMA, 2014
758,000 individuals
>400 studies
>>1B datapoints (genotypes and phenotypes)
40K participants
http://nhanes.hms.harvard.edu
Opinion Viewpoint
VIEWPOINT
Studying the Elusive Environment in Large Scale

Chirag J. Patel, PhD
Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.

John P. A. Ioannidis, MD, DSc
Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Stanford, California; Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California; Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California; and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.

It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.1 Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.1 Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.

To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).2,3 The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly [...] For example, Wang et al4 screened more than 2000 chemicals in serum to discover endogenous exposures [...] that the US federally funded gene expression experiment data be [...] in public repositories [...]

[...] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.

Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).

However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated [...]

Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004
A Serum cotinine (37 total correlations)
B Serum total mercury (42 total correlations)
C Serum cadmium (68 total correlations)
D Serum trans-β-carotene (68 total correlations)
Legend: Pollutants; Nutrients and vitamins; Infectious agents; Demographic attributes; Negative correlation; Positive correlation
Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (strong correlations). The size of each node is proportional to the number of edges for a node, and the thickness of each edge indicates the magnitude of the correlation.

Slide annotations:
•evaluate new ‘omics technologies (metabolomics)
•data mining and informatics to tackle complexity
•longitudinal/linkable data & biorepositories
what causes what?
confounding
JAMA, 2014
CHD individuals?
Confounding bias:
mercury high HDL Ice cream and drowning deaths
ρ
β-carotene hydrocarbons
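The ice cream and drowning example can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a hypothetical confounder (temperature) that drives both variables, and all variable names and effect sizes are invented for the example rather than taken from the slides. It shows a sizable crude correlation that largely disappears once the confounder is regressed out of both variables.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(0)
n = 5000
# Hypothetical confounder: both outcomes depend on temperature only.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
drownings = [t + rng.gauss(0, 1) for t in temperature]

crude = pearson(ice_cream, drownings)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(drownings, temperature))
print(f"crude r = {crude:.2f}, adjusted r = {adjusted:.2f}")
```

In real exposure data the confounders are rarely this cleanly known or measured, which is part of why the slides treat independence of association as an open question.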
Independence of association:
?
Exposure Disease
Disease Risk
[high]
time
EWAS to search for
NHANES: 1999-2004
National Death Index linked mortality
IJE, 2013
All-cause mortality:
[Figure: EWAS of all-cause mortality. y-axis: -log10(pvalue), ticks 0 to 8; x-axis ticks from 0.4 to 2.8; threshold line: FDR < 5%; legend: replicated factor; labels (11) and (69).]
Numbered factors: 1 Physical Activity; 2 Does anyone smoke in home?; 3 Cadmium; 4 Cadmium, urine; 5 Past smoker; 6 Current smoker; 7 trans-lycopene
Sociodemographics: 1 age (10 year increment); 2 SES_1; 3 male; 4 SES_0; 5 black; 6 SES_2; 7 SES_3; 8 education_hs; 9 other_eth; 10 mexican; 11 occupation_blue_semi; 12 education_less_hs; 13 occupation_never; 14 occupation_blue_high; 15 occupation_white_semi; 16 other_hispanic
Annotations on the repeated figure: income (quintile 2); past smoker?; R2 ~ 2%
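The "FDR < 5%" threshold in the figure implies a false discovery rate procedure. The slides do not say which one was used, so the following is only a minimal sketch of the standard Benjamini-Hochberg step-up procedure, applied to made-up p-values.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    Benjamini-Hochberg step-up: find the largest rank k such that the
    k-th smallest p-value is <= (k / m) * alpha, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Made-up p-values for eight hypothetical exposure associations.
pvals = [0.0001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.74, 0.9]
print(benjamini_hochberg(pvals))
```

With these illustrative values only the two smallest p-values pass, because the third (0.02) exceeds its threshold of 3/8 × 0.05.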
Slide annotations: Independence of association: ρ; β-carotene; hydrocarbons; γ-tocopherol; exposure?

rs7903146 10 114,748,339 T C 0.406 0.293 1.65 ± 0.19 2.77 ± 0.50 0.28 1.0546 1.5 × 10^-34 <1.0 × 10^-7 3.2 × 10^-17 <3.3 × 10^-10 TCF7L2
rs13266634 8 118,253,964 C C 0.254 0.301 1.18 ± 0.25 1.53 ± 0.31 0.24 1.0089 6.1 × 10^-8 5.0 × 10^-7 2.1 × 10^-5 1.8 × 10^-5 SLC30A8
rs1111875 10 94,452,862 G G 0.358 0.402 1.19 ± 0.19 1.44 ± 0.24 0.19 1.0069 3.0 × 10^-6 7.4 × 10^-6 9.1 × 10^-6 7.3 × 10^-6 HHEX
rs7923837 10 94,471,897 G G 0.335 0.377 1.22 ± 0.21 1.45 ± 0.25 0.20 1.0065 7.5 × 10^-6 2.2 × 10^-5 3.4 × 10^-6 2.5 × 10^-6 HHEX
rs7480010 11 42,203,294 G A 0.336 0.301 1.14 ± 0.13 1.40 ± 0.25 0.08 1.0041 1.1 × 10^-4 2.9 × 10^-4 1.5 × 10^-5 1.2 × 10^-5 LOC387761
rs3740878 11 44,214,378 A A 0.240 0.272 1.26 ± 0.29 1.46 ± 0.33 0.24 1.0046 1.2 × 10^-4 2.8 × 10^-4 1.8 × 10^-5 1.3 × 10^-5 EXT2
rs11037909 11 44,212,190 T T 0.240 0.271 1.27 ± 0.30 1.47 ± 0.33 0.25 1.0045 1.8 × 10^-4 4.5 × 10^-4 1.8 × 10^-5 1.3 × 10^-5 EXT2
rs1113132 11 44,209,979 C C 0.237 0.267 1.15 ± 0.27 1.36 ± 0.31 0.19 1.0044 3.3 × 10^-4 8.1 × 10^-4 3.7 × 10^-5 2.9 × 10^-5 EXT2

[...] stage 2 genotype counts for the control subjects and assuming a T2DM prevalence of 7% in the French population. hom, homozygous; het, heterozygous; major allele, the allele with the higher frequency in controls; pMAX, P-value of the MAX statistic from the χ2 distribution; pMAX (perm), P-value of the MAX statistic from the permutation-derived empirical distribution (pMAX and pMAX (perm) are adjusted for variance inflation); risk allele, the allele with higher frequency in cases compared with controls.

Identification of four novel T2DM loci
Our fast-track stage 2 genotyping confirmed the reported association for rs7903146 (TCF7L2) on chromosome 10, and in addition identified significant associations for seven SNPs representing four new T2DM loci (Table 1). In all cases, the strongest association for the MAX statistic (see Methods) was obtained with the additive model.

The most significant of these corresponds to rs13266634, a non-synonymous SNP (R325W) in SLC30A8, located in a 33-kb linkage disequilibrium block on chromosome 8, containing only the 3′ end of this gene (Fig. 2a). SLC30A8 encodes a zinc transporter expressed solely in the secretory vesicles of β-cells and is thus implicated in the final stages of insulin biosynthesis, which involve co-crystallization [...]
[Figure: regional association plots, –log10[P] by SNP (individual rs numbers along the x-axis), with asterisks marking selected SNPs. Panel gene labels: SLC30A8; IDE, KIF11, HHEX; EXT2, ALX4; LOC387761.]
Slide annotation: In GWAS, [...] allows one to trace [...]
Sladek et al., Nature (2007)
Interdependencies of the exposome:
Correlation globes paint a complex view of exposure
Spearman ρ
“null ρ”
Red: positive ρ
Blue: negative ρ
thickness: |ρ|
Effective number of
variables:
JECH 2015
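A correlation globe, as described on these slides, draws an edge from one exposure of interest to every other exposure whose Spearman ρ passes a threshold. The sketch below is a toy illustration under assumptions: the exposure names and values are invented, the 0.2 cutoff is borrowed from the JAMA figure caption, and the function names are hypothetical rather than taken from the authors' code.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def globe_edges(exposures, focus, threshold=0.2):
    """Edges from `focus` to every other exposure with |rho| > threshold."""
    edges = {}
    for name, values in exposures.items():
        if name == focus:
            continue
        rho = spearman(exposures[focus], values)
        if abs(rho) > threshold:
            edges[name] = rho  # sign gives edge color, |rho| gives thickness
    return edges

# Toy measurements for six hypothetical participants.
exposures = {
    "cotinine": [1, 2, 3, 4, 5, 6],
    "cadmium": [2, 1, 4, 3, 6, 5],
    "beta_carotene": [6, 5, 4, 3, 2, 1],
    "unrelated": [3, 6, 1, 5, 2, 4],
}
edges = globe_edges(exposures, "cotinine")
print(edges)
```

Here the positive edge to cadmium and the negative edge to beta_carotene survive the cutoff, while the weakly correlated variable is dropped from the globe.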
Estimating the LD of the exposome:
Diabetes vs. death have distinct globes (POPs vs. smoking?)...
(“hubs”)
[Figure: EWAS plot of -log10(pvalue) by exposure category. Labels: C-reactive protein; 2-fluorene.]
Exposure categories (x-axis): nutrients carotenoid; nutrients minerals; nutrients vitamin A; nutrients vitamin B; nutrients vitamin C; nutrients vitamin D; nutrients vitamin E; phytoestrogens; cotinine; hydrocarbons; volatile compounds; allergen test; viral infection; bacterial infection; latex; phenols; phthalates; polybrominated ethers; polyfluorochemicals; acrylamide; heavy metals; perchlorate; pcbs; dioxins; pesticides atrazine; pesticides carbamate; pesticides chlorophenol; pesticides organochlorine; pesticides organophosphate; pesticides pyrethroid; dialkyl; furans dibenzofuran
• generalizable, comprehensive, transparent, and systematic study of environment
HDL-C: 1-10 mg/dL
T2D: ~2-3 OR
Body Measures
Glucose
Height Triglycerides
Systolic BP
Creatinine
Diastolic BP
Sodium
Pulse rate
Uric Acid
VO2 Max Aging
Telomere length
Inflammation
C-reactive protein
Liver function
Gamma glutamyltransferase
EWAS-derived phenotype-exposure association map:
A 2-D view of phenotype-exposure associations for re-classification
Exposure labels: PCB170; folate; β-carotene
Glucose
BMI
Height
Cholesterol
http://bit.ly.com/pemap
Creation of a phenotype-exposure association map:
A 2-D view of 83 phenotype by 252 exposure associations
252 exposures
83 phenotypes
Association Size: >0, <0
Color Key and Histogram: Value from -0.4 to 0.4
Alpha-carotene
Alcohol
Vitamin E as alpha-tocopherol
Beta-carotene
Caffeine
Calcium
Carbohydrate
Cholesterol
Copper
Beta-cryptoxanthin
Folic acid
Folate, DFE
Food folate
Dietary fiber
Iron
Energy
Lycopene
Lutein + zeaxanthin
MFA 16:1
MFA 18:1
MFA 20:1
Magnesium
Total monounsaturated fatty acids
Moisture
Niacin
PFA 18:2
PFA 18:3
PFA 20:4
PFA 22:5
PFA 22:6
Total polyunsaturated fatty acids
Phosphorus
Potassium
Protein
Retinol
SFA 4:0
SFA 6:0
SFA 8:0
SFA 10:0
SFA 12:0
SFA 14:0
SFA 16:0
SFA 18:0
Selenium
Total saturated fatty acids
Total sugars
Total fat
Theobromine
Vitamin A, RAE
Thiamin
Vitamin B12
Riboflavin
Vitamin B6
Vitamin C
Vitamin K
Zinc
No Salt
Ordinary Salt
a-Carotene
Vitamin B12, serum
trans-b-carotene
cis-b-carotene
b-cryptoxanthin
Folate, serum
g-tocopherol
Iron, Frozen Serum
Combined Lutein/zeaxanthin
trans-lycopene
Folate, RBC
Retinyl palmitate
Retinyl stearate
Retinol
Vitamin D
a-Tocopherol
Daidzein
o-Desmethylangolensin
Equol
Enterodiol
Enterolactone
Genistein
Estimated VO2max
Physical Activity
Does anyone smoke in home?
Total # of cigarettes smoked in home
Cotinine
Current Cigarette Smoker?
Age last smoked cigarettes regularly
# cigarettes smoked per day when quit
# cigarettes smoked per day now
# days smoked cigs during past 30 days
Avg # cigarettes/day during past 30 days
Smoked at least 100 cigarettes in life
Do you now smoke cigarettes...
number of days since quit
Used snuff at least 20 times in life
drink 5 in a day
drink per day
days 5 drinks in year
days drink in year
3-fluorene
2-fluorene
3-phenanthrene
1-phenanthrene
2-phenanthrene
1-pyrene
3-benzo[c] phenanthrene
3-benz[a] anthracene
Mono-n-butyl phthalate
Mono- phthalate
Mono-cyclohexyl phthalate
Mono-ethyl phthalate
Mono- phthalate
Mono--hexyl phthalate
Mono-isobutyl phthalate
Mono-n-methyl phthalate
Mono- phthalate
Mono-benzyl phthalate
Cadmium
Lead
Mercury, total
Barium, urine
Cadmium, urine
Cobalt, urine
Cesium, urine
Mercury, urine
Iodine, urine
Molybdenum, urine
Lead, urine
Platinum, urine
Antimony, urine
Thallium, urine
Tungsten, urine
Uranium, urine
Blood Benzene
Blood Ethylbenzene
Blood o-Xylene
Blood Styrene
Blood Trichloroethene
Blood Toluene
Blood m-/p-Xylene
1,2,3,7,8-pncdd
1,2,3,7,8,9-hxcdd
1,2,3,4,6,7,8-hpcdd
1,2,3,4,6,7,8,9-ocdd
2,3,7,8-tcdd
Beta-hexachlorocyclohexane
Gamma-hexachlorocyclohexane
Hexachlorobenzene
Heptachlor Epoxide
Mirex
Oxychlordane
p,p-DDE
Trans-nonachlor
exposures
2,5-dichlorophenol result
2,4,6-trichlorophenol result
Pentachlorophenol
Dimethylphosphate
Diethylphosphate
Dimethylthiophosphate
PCB66
PCB74
PCB99
PCB105
PCB118
PCB138 & 158
PCB146
PCB153
PCB156
PCB157
PCB167
PCB170
PCB172
PCB177
PCB178
PCB180
PCB183
PCB187
3,3,4,4,5,5-hxcb
3,3,4,4,5-pncb
3,4,4,5-tcb
Perfluoroheptanoic acid
Perfluorohexane sulfonic acid
Perfluorononanoic acid
Perfluorooctanoic acid
Perfluorooctane sulfonic acid
Perfluorooctane sulfonamide
2,3,7,8-tcdf
1,2,3,7,8-pncdf
2,3,4,7,8-pncdf
1,2,3,4,7,8-hxcdf
1,2,3,6,7,8-hxcdf
1,2,3,7,8,9-hxcdf
2,3,4,6,7,8-hxcdf
1,2,3,4,6,7,8-hpcdf
Measles
Toxoplasma
Hepatitis A Antibody
Hepatitis B core antibody
Hepatitis B Surface Antibody
Herpes II
Insulin
MCHC
Ferritin
Weight
Sodium
Albumin
Globulin
Chloride
Uric acid
Total Fat
Trunk Fat
PSA. total
Creatinine
Osmolality
Potassium
60 sec HR
Total BMD
Hematocrit
Head BMD
Phosphorus
Bicarbonate
Hemoglobin
Total protein
Triglycerides
C-peptide: SI
Total calcium
Total bilirubin
mean systolic
60 sec. pulse:
Homocysteine
Albumin, urine
mean diastolic
Protoporphyrin
Glucose, serum
LDL-cholesterol
Triceps Skinfold
Standing Height
Platelet count SI
Glucose, plasma
Total Cholesterol
Body Mass Index
Glycohemoglobin
Monocyte percent
Monocyte number
Basophils number
Upper Leg Length
Recumbent Length
Methylmalonic acid
Eosinophils number
Lumbar Pelvis BMD
Lymphocyte percent
Alkaline phosphatase
Mean platelet volume
Transferrin saturation
Trunk Lean excl BMC
http://bit.ly.com/pemap
EWAS-derived phenotype-exposure association map:
[Figure: the same phenotype-by-exposure heatmap as above, with Color Key and Histogram (Value from -0.4 to 0.4; Count from 0 to 150) and the same exposure and phenotype labels.]
Annotated groups: BMI, weight, BMD; renal function; nutrients; metabolic; blood parameters; hydrocarbons; pcbs
A 2-D view of connections between P and E
http://bit.ly.com/pemap
EWAS-derived phenotype-exposure association map:
Toward a phenotype-exposure association map:
(Re)-categorizing phenotypes with E
metabolic:Glucose, serum
metabolic:Glucose, plasma
metabolic:Glycohemoglobin
metabolic:C-peptide: SI
blood pressure:mean diastolic
liver:Alkaline phosphatase
metabolic traits
bone:Bone alkaline phosphatase
blood:Red cell distribution width
blood:Protoporphyrin
blood:Platelet count SI
cancer:PSA. total
blood pressure:60 sec HR
kidney:Albumin, urine
kidney:Sodium
kidney:Chloride
kidney:Osmolality
kidney function
nutrition:Methylmalonic acid
heart:Homocysteine
blood:Mean platelet volume
immunological:Eosinophils number
immunological:Basophils number
immunological:White blood cell count
immunological:Lymphocyte number
immunological:Segmented neutrophils number
immunological:Monocyte number
liver:Globulin
immunological:C-reactive protein
inflammation
blood pressure:mean systolic
body measures:Subscapular Skinfold
body measures:Trunk Fat
body measures:Total Fat
metabolic:Insulin
blood pressure:60 sec. pulse:
liver:Gamma glutamyl transferase
body measures:Thigh Circumference
body measures:Maximal Calf Circumference
body measures:Triceps Skinfold
body measures:Waist Circumference
body measures:Body Mass Index
body measures:Trunk Lean excl BMC
adiposity
body measures:Total Lean excl BMC
blood:Segmented neutrophils percent
body measures:Weight
blood:Red blood cell count
blood:Ferritin
kidney:Creatinine
kidney:Total calcium
kidney:Blood urea nitrogen
kidney:Uric acid
blood:Mean cell volume
blood:Mean cell hemoglobin
kidney:Potassium
blood:Hemoglobin
blood:Hematocrit
blood:TIBC, Frozen Serum
blood:MCHC
heart:Total Cholesterol
heart:LDL-cholesterol
heart:Triglycerides
bone:Lumbar Pelvis BMD
bone:Lumbar Spine BMD
bone:Total BMD
body measures:Upper Leg Length
body measures:Standing Height
bone:Head BMD
immunological:Monocyte percent
heart:Direct HDL-Cholesterol
liver:Total bilirubin
blood:Transferrin saturation
cancer:PSA, free
cancer:Prostate specific antigen ratio
liver:Lactate dehydrogenase LDH
body measures:Recumbent Length
body measures:Head Circumference
liver:Alanine aminotransferase ALT
liver:Aspartate aminotransferase AST
liver:Total protein
kidney:Phosphorus
immunological:Eosinophils percent
immunological:Lymphocyte percent
immunological:Basophils percent
kidney:Bicarbonate
liver:Albumin
7 6 5 4 3 2 1 0
Distance
Toward a phenotype-exposure association map:
(Re)-categorizing phenotypes with E
[Figure: the same phenotype dendrogram as above (Distance axis from 7 to 0), annotated “bad” cholesterol near heart:Total Cholesterol, heart:LDL-cholesterol, and heart:Triglycerides, and “good” cholesterol near heart:Direct HDL-Cholesterol.]
Toward a phenotype-exposure association map:
(Re)-categorizing phenotypes with E
[Figure: the same phenotype dendrogram as above (Distance axis from 7 to 0), annotated height + BMD near the bone BMD, Upper Leg Length, and Standing Height labels.]
H² vs. σ²_E
σ²_E?
Basophils percent
PSA. total
Eosinophils percent
Mean platelet volume
PSA, free
Sodium
Basophils number
60 sec HR
Prostate specific antigen ratio
mean diastolic
Head BMD
Monocyte percent
Eosinophils number
Recumbent Length
Segmented neutrophils percent
Lymphocyte percent
mean systolic
Monocyte number
Osmolality
MCHC
Platelet count SI
Lumbar Spine BMD
Red blood cell count
Glucose, plasma
Glucose, serum
Potassium
Total BMD
Upper Leg Length
60 sec. pulse:
Alanine aminotransferase ALT
Bicarbonate
Chloride
Globulin
Glycohemoglobin
Lumbar Pelvis BMD
Phosphorus
Aspartate aminotransferase AST
TIBC, Frozen Serum
Bone alkaline phosphatase
1 to 66 exposures identified for 81 phenotypes
High-throughput E standards:
P=GxE
...but what about interaction between these factors?
Bladder Cancer
smoke?
cancer
non-cancer
Analytically complex
• How do you select which G and E to test???
Bioinformatics. 2012
[Figure: grid of 10 genetic variants (rows 1-10) by 10 exposures (columns 1-10)]
10 exposures: vitamin C; vitamin E; radiation; smoke?; pesticide; ...
10 genetic variants: 1 rs13266634 (SLC30A8); 2 rs7903146 (TCF7L2); ...; 10 rs1807292 (PPARγ)
= 100 possible interactions
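The count of candidate interactions grows as the product of the two lists, which is the reason the slides ask how to select which G and E to test. A minimal sketch with placeholder names (the variant and exposure labels here are hypothetical, and the Bonferroni line is an illustration, not the authors' stated correction for this grid):

```python
from itertools import product

def interaction_pairs(variants, exposures):
    """All variant-by-exposure pairs that a G x E screen would test."""
    return list(product(variants, exposures))

# Placeholder labels standing in for the 10 variants and 10 exposures.
variants = [f"variant_{i}" for i in range(1, 11)]
exposures = [f"exposure_{j}" for j in range(1, 11)]

pairs = interaction_pairs(variants, exposures)
print(len(pairs))  # 10 x 10 pairs

# A naive Bonferroni threshold if every pair were tested at alpha = 0.05.
bonferroni_alpha = 0.05 / len(pairs)
print(bonferroni_alpha)
```

Scaling the same calculation to genome-wide variants and hundreds of exposures makes the per-test threshold very small, so prioritizing variants and exposures with prior marginal evidence is one way to keep the search tractable.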
[Figure: EWAS plot of -log10(pvalue) by exposure category. Labeled points: γ-tocopherol; β-carotene; heptachlor; PCB170.]
Exposure categories (x-axis): volatile compounds; allergen test; viral infection; bacterial infection; latex; phenols; phthalates; polybrominated ethers; polyfluorochemicals; acrylamide; heavy metals; perchlorate; pcbs; dioxins; pesticides atrazine; pesticides chlorophenol; pesticides organochlorine; pesticides organophosphate; pesticides pyrethroid; dialkyl; furans dibenzofuran
rs13266634 (SLC30A8)
[Figure: GWAS Manhattan plot, unconditional analysis; y axis: –log10(P), ticks 10 to 30. Labeled loci: KCNQ1 (2 signals*), CDC123/CAMK1D, CHCHD9, KCNJ11, CDKAL1, CDKN2A/2B, SLC30A8, CENTD2, MTNR1B, ADAMTS9, IGF2BP2, HMGA2, ZFAND6, TP53INP1, BCL11A, PPARG, TSPAN8/LGR5, PRC1, WFS1, JAZF1, IRS1, ZBED3, HNF1A, FTO, THADA, HNF1B, DUSP9, KLF14, NOTCH2]
WTCCC, Nature (2007)
Sladek et al., Nature (2007)
Prototype G-EWAS Methodology
GxE in association to T2D
18 GWAS loci:
rs10923931 (NOTCH2)
rs7903146 (TCF7L2)
rs13266634 (SLC30A8)
rs7901695 (TCF7L2)
rs2383208 (Unknown)
rs1260326 (GCKR)
rs780094 (GCKR)
rs2237895 (KCNQ1)
rs10811661 (Unknown)
rs4712523 (CDKAL1)
rs4607103 (Unknown)
rs1111875 (Unknown)
rs7578597 (THADA)
rs4402960 (IGF2BP2)
rs1801282 (PPARG)
rs12779790 (Unknown)
rs8050136 (FTO)
rs864745 (JAZF1)
Exposures: γ-tocopherol, cis-β-carotene, trans-β-carotene, PCB170, heptachlor
Logistic Regression (age, BMI, sex, race)
[Figure: logit(diabetes) against z(γ-tocopherol), one line per number of rs13266634 risk alleles: (0), (1), (2); numeric labels 4.4 and 17.8]
Bonferroni Correction: Number of Effective Tests1 ≅ 80; α = 0.05/80 = 0.0006
False Discovery Rate: Parametric Bootstrap of Null Model2
1.) Nyholt. AJHG 2004
rs13266634 (SLC30A8), marginal OR = 1.1
Per risk allele OR (95% CI) by trans-β-carotene level:
low (−1 SD): 1.8 [1.3, 2.6]
mean: 1.1 [0.79, 1.5]
high (+1 SD): 0.65 [0.4, 1.1]
p-value: 5e-05; N (cases): 1702 (164); FDR = 2%
Per risk allele OR (95% CI) by γ-tocopherol level:
low (−1 SD): 0.82 [0.52, 1.3]
mean: 1.1 [0.87, 1.5]
high (+1 SD): 1.6 [1.3, 2]
p-value: 0.0094; N (cases): 2925 (274); FDR = 18%
[Figure: forest plot; x axis: per risk allele OR, 0 to 2.5]
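One way to read these exposure-specific odds ratios is sketched below. It assumes a model of the form logit(diabetes) = β0 + βG·G + βE·z + βGxE·G·z + covariates, where G is the risk allele count and z the standardized exposure. The slide does not state this form, so it is an assumption. Under it, the per risk allele OR at exposure level z is exp(βG + βGxE·z). The coefficients are back-calculated from the rounded trans-β-carotene ORs, so the result is illustrative only.

```python
import math


def per_allele_or(beta_g: float, beta_gxe: float, z: float) -> float:
    """Per risk allele odds ratio at standardized exposure level z (assumed model)."""
    return math.exp(beta_g + beta_gxe * z)


# Rounded ORs reported on the slide for rs13266634 and trans-β-carotene.
or_mean = 1.1  # z = 0
or_low = 1.8   # z = -1

# At z = 0 the OR is exp(beta_g); at z = -1 it is exp(beta_g - beta_gxe).
beta_g = math.log(or_mean)
beta_gxe = math.log(or_mean) - math.log(or_low)

# OR implied at z = +1 by the two values above.
or_high_implied = per_allele_or(beta_g, beta_gxe, 1.0)
print(round(or_high_implied, 2))
```

The implied value at +1 SD is close to, but not identical to, the 0.65 reported on the slide. That is expected, since the inputs are rounded and the fitted model may differ from the assumed one.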
PheWAS:
Phenome-wide association study
Denny et al, Nature Biotech 2013
[Figure: PheWAS Manhattan plots; y axis: –log10(P); x axis: phenotypes grouped by category (partly legible: infectious, neoplastic, endocrine, metabolic, hematopoietic, psychiatric, neurologic, cardiovascular, pulmonary, digestive, genitourinary, dermatologic, musculoskeletal, injuries, symptoms). Panel c: MI GWAS SNP rs4977574 (CDKN2BAS). Panel d: RA GWAS SNP rs660895 (HLA-DRB1).]
Figure caption (fragments): ... for four SNPs. Each panel represents 1,358 phenotypes ... a particular SNP, using logistic regression assuming ... adjusted for age, sex, study site and the first three principal components ... are grouped along the x axis by categorization within ... hierarchy. The upper red lines indicate P = 4.6 × 10−6 (FDR = 0.1 ... blue lines indicate P = 0.05; dashed lines ... correction (P = 0.05/1,358). Diamonds encircling phenotype ... NHGRI Catalog associations. (a) PheWAS associations ... previously associated with hair and eye color, freckling and ... palsy. (b) PheWAS associations for rs2853676 in TERT, ... glioma. (c) PheWAS associations for rs4977574 near ...
... study are available at http://phewascatalog.org/.
Text excerpt (fragments): ... 10−12), acute myocardial infarction (OR = ... = 1.29, ... abdominal aortic aneurysm (OR ... with prior publications3, but also with other ... ular” phenotypes7 such as unstable angina, ...
Text excerpt: Our study replicated the association between rheumatoid arthritis and rs660895 near HLA-DRB1 (Fig. 3d; OR = 1.56, P = 6.7 × 10−8). This SNP was also strongly associated with type 1 diabetes (OR = 1.44, P = 7.1 × 10−8) and potentially associated with inflammatory ...
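The threshold lines described in the PheWAS figure caption can be placed on the –log10(P) axis with a short sketch. The three P values are read from the caption fragments (nominal 0.05, the FDR = 0.1 line at 4.6 × 10−6, and Bonferroni at 0.05/1,358). The dictionary labels are illustrative.

```python
import math

N_PHENOTYPES = 1358  # number of phenotypes per panel, from the caption

# P-value thresholds named in the caption.
thresholds = {
    "nominal": 0.05,
    "fdr_0.1": 4.6e-6,
    "bonferroni": 0.05 / N_PHENOTYPES,
}

# Height of each horizontal line on a -log10(P) axis.
heights = {label: -math.log10(p) for label, p in thresholds.items()}

for label, h in heights.items():
    print(f"{label}: -log10(P) = {h:.2f}")
```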
MWAS:
[Figure (www.nature.com/psp), panel a: MWAS plot for OMOP acute myocardial infarction 1; y axis: p_full, 1.0E-001 to 1.0E-012; x axis: drugs grouped by atc1_concept_name, atc3_concept_name, rxnorm_concept_name; color by atc1_concept_name; marker: GROUND_TRUTH]
Horizontal line: Bonferroni adjustment: P
ATC level 1 groups in the legend (some truncated): ALIMENTARY TRACT AND METABOLISM; ANTIINFECTIVES FOR SYSTEMIC USE; ANTIPARASITIC PRODUCTS, I... AND REPELLENTS; BLOOD AND BLOO...; CARDIOVASCULAR SYSTEM; DERMATOLOGICALS; ... SYSTEM AND SEX HORMONES; MUSCULO-SKELETAL SYSTEM; NERVOUS SYSTEM; RESPIRATORY SYSTEM; SENSORY ORGANS; SYSTEMIC HORMONAL PREPARATIONS, EXCLUDING SEX HORMONES AND INSULINS; NULL
ATC level 3 groups on the axis (some truncated): ANTIDIARRHEALS,INTES...; ANTIEMETICS AND ANTI...; DRUGS FOR ACID RELA...; DRUGS FOR FUNCTIONAL GASTROINTESTINAL DIS...; DRUGS USED IN DIABETES; LAXATIVES; ANTIBACTERIALS FOR SYSTEMIC USE; ANTIMYCOTICS FOR SYSTEMIC USE; ANTIVIRALS FOR SYSTE...; ANTHELMINTICS; ANTIPROTOZOALS; ANTIANEMIC PREPARATIONS; ANTITHROMBOTIC AGE...; UROLOGICALS; ANTIINFLAMMATORY AND ANTIRHEUMATIC PRODUCTS; MUSCLE RELAXANTS; TOPICAL PRODUCTS FOR JOINT AND MUSCULAR PAIN; ANALGESICS; ANESTHETICS; ANTIEPILEPTICS; ANTI-PARKINSON DRUGS; PSYCHOANALEPTICS; PSYCHOLEPTICS; ANTIHISTAMINES FOR SYSTEMIC USE; COUGH AND COLD PRE...; DRUGS FOR OBSTRUCTIVE AIRWAY...; NASAL PREPARATIONS; OPHTHALMOLOGICALS; NULL
Drugs labeled (in axis order): Sulfasalazine, Tetrahydrocannabinol, Sucralfate, Dicyclomine, Hyoscyamine, Acarbose, Sitagliptin, Lactulose, Clindamycin, Methenamine, Penicillin V, Ketoconazole, Nevirapine, Mebendazole, Tinidazole, Darbepoetin alfa, Epoetin Alfa, Dipyridamole, Flavoxate, Oxybutynin, Etodolac, Fenoprofen, Indomethacin, Ketorolac, Nabumetone, Oxaprozin, Sulindac, Metaxalone, Methocarbamol, Flurbiprofen, Ketoprofen, Piroxicam, Tolmetin, Diflunisal, Eletriptan, Frovatriptan, Naratriptan, Rizatriptan, Salicylsalicylic acid, Sumatriptan, Zolmitriptan, Prilocaine, Primidone, Bromocriptine, Desipramine, Imipramine, Nortriptyline, Clorazepate, Droperidol, Prochlorperazine, Ramelteon, Temazepam, Amylases, Endopeptidases, Lipase, Sodium phosphate, monobasic, Loratadine, Benzonatate, Salmeterol, Zafirlukast, Fluticasone, Acetazolamide, Bromfenac, Gatifloxacin, Ketotifen, Scopolamine
In conclusion:
on GWAS and EWAS
[Figure: scatter plot of −log10(pvalue) (y axis, ticks 0 to 3) by exposure category (x axis)]
X-axis categories: nutrients carotenoid, nutrients minerals, nutrients vitamin A, nutrients vitamin B, nutrients vitamin C, nutrients vitamin D, nutrients vitamin E, phytoestrogens, cotinine, hydrocarbons, volatile compounds, allergen test, viral infection, bacterial infection, latex, phenols, phthalates, polybrominated ethers, polyfluorochemicals, acrylamide, heavy metals, perchlorate, pcbs, dioxins, pesticides atrazine, pesticides chlorophenol, pesticides organochlorine, pesticides organophosphate, pesticides pyrethroid, dialkyl, furans dibenzofuran
Figure 1. GWAS Discoveries over Time
Data obtained from the Published GWAS Catalog (see Web Resources). Only the top SNPs representing loci with association p values < 5 × 10−8 are included, and so that multiple counting is avoided, SNPs identified for the same traits with LD r2 > 0.8 estimated from the entire HapMap samples are excluded.
Figure 2. Increase in Number of Loci Identified as a Function of Experimental Sample Size
(A) Selected quantitative traits.
(B) Selected diseases.
The coordinates are on the log scale. The complex traits were selected with the criteria that there were at least three GWAS papers published on each in journals with a 2010–2011 journal
medicine.
investigated by GWAS, multiple identified loci have genome-wide statistical significance, and thus it is likely that there are (many) other loci that have not been identified because of a lack of statistical significance (false negatives). Recently, researchers have developed and applied methods to quantify the proportion of phenotypic varia-
Thanks...
Chirag Lakhani Harvard HMS Stanford CDC/NCHS
Adam Brown Isaac Kohane
John Ioannidis
Ajay Yesupriya
Jenn Grandfield
Jian Yang
Paul Elliott
Sunny Alvear
Peter Visscher
Michal Preminger
Lund (Sweden)
Cochrane Jan Sundquist
Francesca Dominici
Chirag J Patel
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org