
Bioinformatics Approaches for the Post-GWAS Analysis of Disease Susceptibility Loci

A thesis submitted to The University of Manchester for the degree of
Master of Philosophy in the Faculty of Medical and Human Sciences

2011

Paul Martin

School of Medicine
Table of Contents
List of Figures 5
List of Tables 7
List of Supplementary Figures 8
List of Supplementary Tables 9
List of Abbreviations 11
Abstract 13
Declaration 14
Copyright Statement 14
Acknowledgements 15
1 Introduction 16
1.1 The International HapMap Project 17
1.2 Linkage Disequilibrium (LD) 18
1.3 Genome-Wide Association Studies (GWAS) 19
1.4 GWAS: Past, Present and Future 21
1.4.1 Affymetrix® GeneChip® 21
1.4.2 Illumina® BeadArray™ 22
1.5 GWAS Limitations 26
1.5.1 Common vs. Rare Variants Hypotheses 27
1.5.2 False-Positive Associations 27
1.5.3 Replication 28
1.5.4 Identification of Causal Variant 28
1.5.5 Effect Size 28
1.6 GWAS & post-GWAS Analysis Methods 29
1.6.1 Traditional Analysis 29
1.6.2 Epistasis Analysis 30
1.6.3 Molecular Pathway & Network Based Analysis 30
1.6.3.1 Assign SNPs to Genes 31
1.6.3.2 Give the Genes an Association Score 31
1.6.3.3 Map Genes onto a Pathway 32
1.6.3.4 Summary 32
1.6.4 Text Mining 38
1.6.5 Conclusions 39
1.7 Summary/Conclusion 40
2 Specific Aims 42
3 Methods 43
3.1 Review of Available Resources 43
3.1.1 The Taverna Workbench 43
3.1.2 Genome Resources 43
3.1.2.1 Ensembl 43
3.1.2.2 The UCSC Genome Browser 44
3.1.2.3 NCBI Entrez 44
3.1.2.4 Genome Resources Summary 45
3.1.3 Genome Recombination 45
3.1.3.1 LD Data 45
3.1.3.2 Recombination Hotspots 45
3.1.4 Pathway Databases and Analysis Methods 46
3.1.4.1 The KEGG Database 46
3.1.4.2 Biocarta 46
3.1.4.3 Gene Ontology (GO) 46
3.1.4.4 PANTHER 47

3.1.4.5 DAVID 47
3.1.4.6 InnateDB 48
3.1.4.7 Reactome 48
3.1.5 Protein-protein Interaction 49
3.2 Taverna Workflow 49
3.2.1 Main Workflow Overview 49
3.2.1.1 Parameter Processing 54
3.2.1.2 SNP Info & Region 55
3.2.1.3 LD Region 57
3.2.1.4 Associated Region 57
3.2.1.5 Associated Genes 58
3.2.1.6 Merge SNP Results 59
3.2.2 Additional Resources 59
3.2.2.1 Workflow Utility Services 59
3.2.2.2 Additional Utility Workflows 60
3.3 Disease Analysis – RA 62
3.3.1 Pathway Analysis 62
3.3.1.1 Introduction 62
3.3.1.2 Gene Ontology – BiNGO 64
3.3.1.3 DAVID 64
3.3.1.4 InnateDB 65
3.3.1.5 PANTHER 66
3.3.1.6 Reactome 66
3.3.2 Protein-protein Interaction 66
3.3.3 Comparison with Previous Publications 67
4 Results 68
4.1 Taverna Workflow SNP Gene Mapping 68
4.2 Disease Analysis – RA 73
4.2.1 Pathway Analysis 73
4.2.1.1 Gene Ontology – BiNGO 73
4.2.1.2 DAVID 76
4.2.1.3 InnateDB 79
4.2.1.4 PANTHER 82
4.2.1.5 Reactome 85
4.2.2 Protein-protein Interaction 86
4.2.3 Comparison with Previous Publications 91
5 Discussion 93
5.1 Taverna Workflow 93
5.1.1 Workflow Limitations 95
5.1.2 Workflow Summary 97
5.2 Disease Analysis – RA 97
5.2.1 Pathway Analysis 97
5.2.2 Protein-protein Interaction 102
5.2.3 Comparison with Previous Publications 109
5.3 Limitations 110
5.4 Further Work 111
5.5 Summary 112
6 References 114
7 Supplementary Results 124
7.1 Disease Analysis – RA 124
7.1.1 Pathway Analysis 124

7.1.1.1 Gene Ontology – BiNGO 124
7.1.1.2 DAVID 162
7.1.1.3 InnateDB 165
7.1.1.4 PANTHER 179
7.1.1.5 Reactome 192
7.1.2 Protein-protein Interaction 207
8 Supplementary CD 223

Word Count: 37,313

List of Figures
Figure 1: Genomic Coverage Resulting from Selecting Maximal Efficiency SNPs 19
Figure 2: Affymetrix® Genome-Wide Human SNP Assay 5.0/6.0 20
Figure 3: Illumina® Infinium™ II Assay 21
Figure 4: Photolithography Based Array Construction 22
Figure 5: Illumina® Array Formats 23
Figure 6: Illumina® BeadArray Decoding Process 23
Figure 7: Published Genome-Wide Associations 25
Figure 8: Illumina® Whole-Genome Genotyping Product Roadmap 26
Figure 9: Distribution of Odds Ratios for Common and Rare Variants 29
Figure 10: SNP to Gene Mapping Used by Raychaudhuri et al. 38
Figure 11: Overview of the Taverna Workflow 49
Figure 12: Parameters XML Document Schema 50
Figure 13a: Taverna Workflow Output XML Document Schema Overview 51
Figure 13b: Expanded SnpInfo XML Element Schema 51
Figure 13c: Expanded SnpRegion XML Element Schema 51
Figure 13d: Expanded LDRegion XML Element Schema 52
Figure 13e: Expanded HotspotRegion XML Element Schema 52
Figure 13f: Expanded AssociatedRegion XML Element Schema 53
Figure 14: Taverna Workflow Diagram to Define the SNP ‘AssociatedRegion’ 54
Figure 15: XML Document Excerpt Showing the Basic ‘AssociatedSNPs’ XML element 54
Figure 16: Nested Taverna Workflow to Retrieve SNPs from NCBI 56
Figure 17: Interim XML Document Schema 56
Figure 18: Nested Taverna Workflow to Define the ‘LDRegion’ 57
Figure 19: Nested Taverna Workflow to Define the ‘HotspotRegion’ and
‘AssociatedRegion’ 58
Figure 20: Nested Taverna Workflow to Retrieve All Genes within the Defined
‘AssociatedRegion’ 58
Figure 21: Make ‘AssociatedSNPs’ XML Workflow 61
Figure 22: Example BiNGO Network Map Visualised in Cytoscape 65
Figure 23: Example BiNGO Text Output as Viewed from Cytoscape 65
Figure 24: Example Reactome Output 67
Figure 25: Genomic Region Containing rs3890745 68
Figure 26: Distribution of GO Terms in the BiNGO Analyses 75
Figure 27: PANTHER Cellular Component Analysis Graph 83
Figure 28: PANTHER Protein Class Analysis Graph 84
Figure 29: PANTHER Pathway Analysis Graph 85
Figure 30: Protein-protein Interaction Network Produced from the Confirmed Gene
List 86
Figure 31: Protein-protein Interaction Network Produced from the Expanded Gene
List 88
Figure 32: CD28 Network Hub from the Extended Protein-protein Interaction Network
Produced from the Confirmed Gene List 90
Figure 33: PTPN11 Network Hub from the Extended Protein-protein Interaction
Network Produced from the Expanded Gene List 91
Figure 34: TNFAIP3-OLIG3 Region 93
Figure 35: The Defined Associated Region for rs3184504 94
Figure 36: T cell Priming Pathway 99
Figure 37: WTCCC GWAS Results for RA 101
Figure 38: The HLA Genetic Complex 103

Figure 39: Mechanisms Effecting Macrophage Activation, Inflammation, T cell
Development, T cell Proliferation and B cell Activation 106

List of Tables
Table 1: HapMap Sample Details 18
Table 2: Comparison of Mapping Methods for Selected Pathway Analysis Studies 31
Table 3: Comparison of Significant Genes Found by Selected Pathway Analysis Studies 34
Table 4: Comparison of Significant Pathways Found by Selected Pathway Analysis
Studies 35
Table 5: The Kyoto Encyclopaedia of Genes and Genomes (KEGG) Entry Points 46
Table 6: Additional Taverna Workflow Parameters 50
Table 7: Available Genome Assemblies for Each of the Four Main Genome Resources 60
Table 8: Five RA Publications from 2008 to 2010 Used to Create Associated SNP List 62
Table 9: Associated RA SNPs 63
Table 10: Files Produced From the Taverna Workflow Output 64
Table 11: Comparison of Genes Identified by the Taverna Workflow with the Most
‘Biologically Plausible’ Gene(s) 70
Table 12: Number of Genes Identified by the Taverna Workflow 72
Table 13: Gene Numbers by Gene Biotype 72
Table 14: Gene Numbers by Gene Status 72
Table 15: Gene Numbers by Associated Gene Database 72
Table 16: Number of Genes Without GO Annotations 73
Table 17: Most Significant BiNGO Results for the Confirmed Gene List 73
Table 18: Most Significant BiNGO Results for the Expanded Gene List 74
Table 19: GO Terms Showing Significance Across All Gene Lists 76
Table 20: Summary of Genes Mapped by DAVID 77
Table 21: DAVID Functional Classification Results of the Confirmed Gene List 77
Table 22: DAVID Functional Classification Results of the Expanded Gene List 78
Table 23: InnateDB Pathway Over-representation Analysis Summary 80
Table 24: Significant Pathways Identified by the InnateDB Pathway Analysis of the
Confirmed Gene List 80
Table 25: Significant Pathways Identified by the InnateDB Pathway Analysis of the
Expanded Gene List 81
Table 26: InnateDB TFBS Over-representation Analysis Summary 82
Table 27: Summary of IDs Mapped by PANTHER 82
Table 28: PANTHER Cellular Component Analysis Summary 82
Table 29: PANTHER Protein Class Analysis Summary 83
Table 30: PANTHER Pathway Analysis Summary 84
Table 31: Summary of IDs Mapped by Reactome 85
Table 32: Significant Reactome Events 86
Table 33: Protein-protein Interaction Summary 87
Table 34: Extended Protein-protein Interaction Summary 89
Table 35: Hub Analysis of the Extended Protein-protein Interaction Network Produced
from the Confirmed Gene List 89
Table 36: Hub Analysis of the Extended Protein-protein Interaction Network Produced
from the Expanded Gene List 89
Table 37: Comparison of the Genes Implicated by the Taverna Workflow with
Previously Published Findings 92

List of Supplementary Figures
Supplementary Figure 1a-c: BiNGO Network Produced Using the Confirmed Gene List 148
Supplementary Figure 2a-c: BiNGO Network Produced Using the Confirmed Gene List
after Exclusion of the MHC Region Genes 151
Supplementary Figure 3a-b: BiNGO Network Produced Using the Expanded Gene List 154
Supplementary Figure 4a-c: BiNGO Network Produced Using the Expanded Gene List
after Exclusion of the MHC Region Genes 156
Supplementary Figure 5: PANTHER Cellular Component Analysis Graph Full Results 190
Supplementary Figure 6: PANTHER Protein Class Analysis Graph Full Results 190
Supplementary Figure 7: PANTHER Pathway Analysis Graph Full Results 191
Supplementary Figure 8: Protein-protein Interaction Network Produced from the
Confirmed Gene List after Exclusion of the MHC Region Genes 207
Supplementary Figure 9: Protein-protein Interaction Network Produced from the
Expanded Gene List after Exclusion of the MHC Region Genes 207
Supplementary Figure 10: Extended Protein-protein Interaction Network Produced
from the Confirmed Gene List 209
Supplementary Figure 11: Extended Protein-protein Interaction Network Produced
from the Confirmed Gene List after Exclusion of the MHC Region Genes 210
Supplementary Figure 12: Extended Protein-protein Interaction Network Produced
from the Expanded Gene List 211
Supplementary Figure 13: Extended Protein-protein Interaction Network Produced
from the Confirmed Gene List after Exclusion of the MHC Region Genes 212

List of Supplementary Tables
Supplementary Table 1: Genes Unrecognised by BiNGO 124
Supplementary Table 2: Number of Genes without GO Annotations 127
Supplementary Table 3: Significant BiNGO Results for the Confirmed Gene List 129
Supplementary Table 4: Significant BiNGO Results for the Confirmed Gene List after
Exclusion of the MHC Region Genes 132
Supplementary Table 5: Significant BiNGO Results for the Expanded Gene List 134
Supplementary Table 6: Significant BiNGO Results for the Expanded Gene List after
Exclusion of the MHC Region Genes 141
Supplementary Table 7: GO Terms Showing Significance by Category 159
Supplementary Table 8: DAVID Functional Classification Results of the Confirmed
Gene List 162
Supplementary Table 9: DAVID Functional Classification Results of the Confirmed
Gene List after Exclusion of the MHC Region Genes 162
Supplementary Table 10: DAVID Functional Classification Results of the Expanded
Gene List 163
Supplementary Table 11: DAVID Functional Classification Results of the Expanded
Gene List after Exclusion of the MHC Region Genes 164
Supplementary Table 12: InnateDB Gene Summary 166
Supplementary Table 13: Ensembl Gene IDs Unrecognised by InnateDB 166
Supplementary Table 14: Ensembl Gene IDs with No InnateDB Pathways 168
Supplementary Table 15: Significant Pathways Identified by the InnateDB Analysis of
the Confirmed Gene List 174
Supplementary Table 16: Significant Pathways Identified by the InnateDB Pathway
Analysis of the Confirmed Gene List after Exclusion of MHC Region Genes 175
Supplementary Table 17: Significant Pathways Identified by the InnateDB Pathway
Analysis of the Expanded Gene List 176
Supplementary Table 18: Significant Pathways Identified by the InnateDB Pathway
Analysis of the Expanded Gene List after Exclusion of MHC Region Genes 177
Supplementary Table 19: PANTHER Cellular Component Analysis Full Results 180
Supplementary Table 20: PANTHER Protein Class Analysis Full Results 181
Supplementary Table 21: PANTHER Pathway Analysis Full Results 186
Supplementary Table 22: Ensembl Gene IDs Unrecognised by Reactome 192
Supplementary Table 23: Full Significant Reactome Events Using the Confirmed Gene
List 198
Supplementary Table 24: Full Significant Reactome Events Using the Confirmed Gene
List after Exclusion of the MHC Region Genes 199
Supplementary Table 25: Full Significant Reactome Events Using the Expanded Gene
List 201
Supplementary Table 26: Full Significant Reactome Events Using the Expanded Gene
List after Exclusion of the MHC Region Genes 203
Supplementary Table 27: Full Overlap of Reactome Events 205
Supplementary Table 28: Full Hub Analysis of the Extended Protein-protein
Interaction Network Produced from the Confirmed Gene List 213
Supplementary Table 29: Full Hub Analysis of the Extended Protein-protein
Interaction Network Produced from the Confirmed Gene List after Exclusion of
the MHC Region Genes 214
Supplementary Table 30: Full Hub Analysis of the Extended Protein-protein
Interaction Network Produced from the Expanded Gene List 215

Supplementary Table 31: Full Hub Analysis of the Extended Protein-protein
Interaction Network Produced from the Confirmed Gene List after Exclusion of
the MHC Region Genes 219

List of Abbreviations
1958BC 1958 birth cohort
APC antigen presenting cell
API application programming interface
AVP arginine vasopressin
BD bipolar disorder
BLK B lymphoid tyrosine kinase
CAD coronary artery disease
CAM cell adhesion molecule
CCP cyclic citrullinated peptide
CDCV common disease, common variant
CDRV common disease, rare variant
CNV copy number variation
DAG directed acyclic graph
DAVID Database for Annotation, Visualization and Integrated Discovery
DB database
DNA deoxyribonucleic acid
DT decorrelation test
ENCODE ENCyclopaedia Of DNA Elements
ENSG Ensembl Gene ID
FDR false discovery rate
GO gene ontology
GRAIL Gene Relationships Across Implicated Loci
GSEA gene set enrichment analysis
GUI graphical user interface
GWAS Genome Wide Association Study
GWASPA Genome Wide Association Study Pathway Analysis
HGNC HUGO Gene Nomenclature Committee
HLA human leukocyte antigen
HPRD Human Protein Reference Database
HT hypertension
IBD inflammatory bowel disease
Ig Immunoglobulin
IL interleukin
KB kilobase
KEGG Kyoto Encyclopaedia of Genes and Genomes
LCT linear combination test
LD Linkage Disequilibrium
lincRNA long intergenic non-coding ribonucleic acid
MAF minor allele frequency
MAP Mitogen-activated Protein
MHC major histocompatibility complex
miRNA micro ribonucleic acid
MMDB Molecular Modeling Database
MS multiple sclerosis
NBS UK blood donors
NCBI National Center for Biotechnology Information
OMIM Online Mendelian Inheritance in Man
OR odds ratio
PANTHER Protein ANalysis THrough Evolutionary Relationships
PPI Protein-protein Interaction

QT quadratic test
RA rheumatoid arthritis
RNA ribonucleic acid
rRNA ribosomal ribonucleic acid
SAM Sentrix Array Matrix
scRNA small cytoplasmic ribonucleic acid
SLE systemic lupus erythematosus
snoRNA small nucleolar ribonucleic acid
SNP Single Nucleotide Polymorphism
snRNA small nuclear ribonucleic acid
SOAP Simple Object Access Protocol
SQL Structured Query Language
T1D type 1 diabetes
T2D type 2 diabetes
TF transcription factor
TFBS transcription factor binding site
UCSC University of California Santa Cruz
URL Uniform Resource Locator
WTCCC Wellcome Trust Case Control Consortium
XML Extensible Markup Language

Abstract
Introduction: Genome Wide Association Studies (GWAS) have been used extensively to
identify common variations associated with disease and have enormous potential to
identify key pathways responsible for disease pathogenesis. This will in turn lead to insight
into common and disease-specific processes and could determine why disease pathogenesis
and treatment response differ between patient subgroups. Since this field is still in its
infancy and has no clear validated workflow, the aim of the project was to produce a
post-GWAS workflow which can be applied to known associations to implicate novel
pathways and genes. This involved assessing the available bioinformatic pathway tools,
designing an automated workflow to select candidate genes in associated regions, and
testing its effectiveness using rheumatoid arthritis (RA) as a model disease.

Methods: Using the Taverna workbench, a robust workflow has been developed to define
a region represented by an associated SNP by utilising existing knowledge of the region
such as linkage disequilibrium (LD) and recombination hotspots. Using RA as a model, the
workflow was used to identify the full extent of the associated regions and identify all the
genes implicated in these regions. Pathway enrichment and protein-protein interaction
analyses were performed to identify potential pathways or interactions associated with the
pathogenesis of the disease.

Results: Of the 58 SNPs associated with RA, the workflow successfully defined associated
regions for 55 SNPs and identified a total of 436 genes representing 54 associated loci. All
regions identified by the workflow contained the most biologically plausible genes with the
exception of five SNPs. The pathway enrichment analyses identified many immunological
pathways including antigen processing and presentation, immune regulation and signalling.
Protein-protein interaction analyses identified genes acting as hubs of interaction
implicating many additional genes.

Discussion: The Taverna workflow provides researchers with a simple, unbiased and
robust tool to assign genes to SNP association signals. Although the workflow identified
many of the same genes as those originally assigned by researchers, it also identified
potentially interesting candidates, such as PTPN11, in an unbiased manner. Additionally,
there is now evidence which implicates new loci, such as the IL6ST and ICOS genes.

The pathway analyses highlighted multiple pathways which confirm the involvement of
existing loci and explain many aspects of RA aetiology. Pathway and protein-protein
interaction analyses emphasise the importance of many molecules that are central to the
immune system and may therefore be involved in disease.

It is apparent that no one pathway database is the ideal source and results must be
combined to produce an accurate picture of the pathways involved in the disease.
Additionally, while further refinement and validation are necessary, this approach has
identified novel pathways and implicated additional genes which may contribute to RA
susceptibility or provide therapeutic targets.

Declaration
No portion of the work referred to in this thesis has been submitted in support of an
application for another degree or qualification of this or any other university or other
institute of learning.

Copyright Statement
i. The author of this thesis (including any appendices and/or schedules to this thesis)
owns certain copyright or related rights in it (the “Copyright”) and s/he has given The
University of Manchester certain rights to use such Copyright, including for
administrative purposes.
ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy,
may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as
amended) and regulations issued under it or, where appropriate, in accordance with
licensing agreements which the University has from time to time. This page must form
part of any such copies made.
iii. The ownership of certain Copyright, patents, designs, trade marks and other
intellectual property (the “Intellectual Property”) and any reproductions of copyright
works in the thesis, for example graphs and tables (“Reproductions”), which may be
described in this thesis, may not be owned by the author and may be owned by third
parties. Such Intellectual Property and Reproductions cannot and must not be made
available for use without the prior written permission of the owner(s) of the relevant
Intellectual Property and/or Reproductions.
iv. Further information on the conditions under which disclosure, publication and
commercialisation of this thesis, the Copyright and any Intellectual Property and/or
Reproductions described in it may take place is available in the University IP Policy (see
http://www.campus.manchester.ac.uk/medialibrary/policies/intellectual-property.pdf),
in any relevant Thesis restriction declarations deposited in the
University Library, The University Library’s regulations (see
http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s
policy on presentation of Theses.

Acknowledgements
I would like to acknowledge Arthritis Research UK for providing me with the funding and
opportunity to conduct this research and my supervisors, Dr Steve Eyre and Professor Andy
Brass, for their help and input throughout. I would also like to thank Dr Paul Fisher for his
help with the Taverna workbench, Professor Anne Barton and Dr Jo Pennock for their
advice and finally my wife Nicola for her help and support.

1 Introduction
The study of common disease genetics has its origins in linkage and candidate gene
approaches. Whilst linkage studies have been successful in the mapping of genes
underlying monogenic disorders (Buchwald et al. 1986; Gitschier et al. 1985), they have
had limited success translating to common, complex genetic diseases (Altmüller et al.
2001). With some notable exceptions in diseases such as inflammatory bowel disease (IBD)
(Hugot et al. 2001; Ogura et al. 2001; Rioux et al. 2001; Stoll et al. 2004), schizophrenia
(Stefansson et al. 2002), rheumatoid arthritis (RA) (Remmers et al. 2007) and type 1
diabetes (T1D) (Cox et al. 2001; Nistico et al. 1996) for genes with large effect sizes, linkage
studies have been grossly underpowered to detect the genetic effects now expected in
complex diseases and have consequently been unproductive. Likewise, candidate gene
approaches have met with limited success (Siontis, Patsopoulos, & Ioannidis 2010):
selecting genes from pathways thought to be functionally important for disease, each with
a low prior probability of being causal, and testing them in low-powered studies proved
an unrewarding study design.

Since 2007, concurrent advances in genotyping technology, study design and genetic
databases have driven great progress in complex disease genetics, most notably through
the introduction of Genome-Wide Association Studies (GWAS). GWAS have allowed
researchers to test thousands of markers, interrogating the entire genome for potential
disease association, in a hypothesis-free way that was previously impracticable. GWAS
have been tremendously successful and have implicated several novel disease associations
in many common complex diseases including Crohn’s disease (Duerr et al. 2006; Libioulle et
al. 2007), T1D (Wellcome Trust Case Control Consortium 2007), RA (Wellcome Trust Case
Control Consortium 2007) and multiple sclerosis (MS) (De Jager et al. 2009). GWAS are now
established as a robust, practical and cost-effective method for rapidly investigating
numerous markers, made possible by the completion of the Human Genome Project
(Lander et al. 2001) and the International HapMap Project (The International HapMap
Consortium 2003) along with advances in genotyping technology.

The genetic markers of choice for GWAS are single nucleotide polymorphisms (SNPs), which
are single base differences in the DNA. SNPs usually have two alleles; for example, one
allele can be an adenine and the other a cytosine. At a given SNP an individual is either
homozygous (carrying two identical alleles) or heterozygous (carrying one copy of each allele).

The frequency of each allele can vary between groups of individuals and it is thought that
common variations can increase an individual’s susceptibility to a disease. Association
studies compare the frequencies of each allele in case (individuals with the disease) and
control (‘normal’ individuals) samples to assess whether one allele is seen at different
frequencies in case cohorts compared to controls. If an allele occurs more or less
frequently in cases than in controls at a statistically significant level, the allele is said
to be associated with the disease.
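
As an illustration of this comparison, the sketch below applies a standard Pearson chi-square test (one degree of freedom) to a 2×2 table of allele counts. It is a minimal example under stated assumptions, not the analysis used by any study cited here: the function name `allelic_chi_square` and all counts are hypothetical, and real GWAS analyses also account for quality control, population structure and multiple testing.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Each argument is (count of allele 1, count of allele 2); every
    genotyped individual contributes two alleles to the table.
    Returns (chi-square statistic, p-value, odds ratio for allele 1).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_cases, row_controls, col_1, col_2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_1 * col_2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p_value, odds_ratio = allelic_chi_square((600, 1400), (500, 1500))
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}, OR = {odds_ratio:.2f}")
```

In this made-up example allele 1 has a frequency of 0.30 in cases and 0.25 in controls. Whether a difference of this kind counts as an association depends on the significance threshold chosen for the number of markers tested.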

1.1 The International HapMap Project


GWAS have been greatly facilitated by utilising information produced by phase I of the
International HapMap Project (The International HapMap Consortium 2003) completed in
2005 (The International HapMap Consortium 2005). The project’s aim is to “determine the
common patterns of DNA sequence variation in the human genome, by characterising
sequence variants, their frequencies, and correlations between them, in DNA samples from
populations with ancestry from parts of Africa, Asia and Europe” (The International
HapMap Consortium 2003) and research is ongoing to achieve this goal.

The International HapMap Project has been conducted in three phases (I, II & III):
approximately 1.3 million SNPs were genotyped for Phase I (Frazer et al. 2007) with an aim
to genotype at least one common SNP every 5 kilobases (KB) in samples from four
populations (The International HapMap Consortium 2005) (Table 1). Phase II genotyped an
additional 2.1 million SNPs in the same samples resulting in a SNP coverage of
approximately one per KB and included an estimated 25-35% of common SNPs (minor
allele frequency (MAF) ≥ 0.05) (Frazer et al. 2007); Phase III included additional samples
from the original four populations and samples from seven extra populations (Table 1)
genotyped on the Illumina 1M and Affymetrix 6.0 SNP chips. The latest release of HapMap
(rel27) contains, on average, 2,390,273 SNPs for each population (4,030,774 SNPs from
samples of European ancestry), generated by merging Phase I, II & III genotypes, and allows
researchers to identify SNPs and calculate the linkage disequilibrium (LD) between loci.

Table 1 HapMap sample details, comprising 1,301 samples from 11 populations. The population
description, abbreviation and sample numbers are shown (Phase III totals include samples from
Phase I & II).

Population Description                                                               Abbreviation  Phase I & II  Phase III
Utah residents with Northern and Western European ancestry from the CEPH collection  CEU           90            180
Han Chinese in Beijing, China                                                        CHB           45            90
Japanese in Tokyo, Japan                                                             JPT           45            91
Yoruba in Ibadan, Nigeria                                                            YRI           90            180
African ancestry in Southwest USA                                                    ASW           -             90
Chinese in Metropolitan Denver, Colorado                                             CHD           -             100
Gujarati Indians in Houston, Texas                                                   GIH           -             100
Luhya in Webuye, Kenya                                                               LWK           -             100
Mexican ancestry in Los Angeles, California                                          MEX           -             90
Maasai in Kinyawa, Kenya                                                             MKK           -             180
Toscans in Italy                                                                     TSI           -             100

1.2 Linkage Disequilibrium (LD)


Linkage Disequilibrium (LD) is the non-random association of alleles at linked loci (Jorde
2000). When two loci are inherited together more often than by chance they are said to be
in LD and the genetic association with disease at an untested SNP in LD with a genotyped
SNP can be predicted given the extent of LD between the two sites. LD can be measured by
two- or multi- locus methods with the former being the most common as it is
computationally less intensive to calculate.

The first commonly used two-locus measure was D, developed by Richard Lewontin in 1964.
Consider a pair of diallelic loci with alleles A/a and B/b at frequencies $P_A$, $P_a$, $P_B$ and
$P_b$; D measures the difference between the observed frequency of co-occurrence of alleles
A and B, $P_{AB}$, and the frequency of co-occurrence expected under linkage equilibrium,
$P_A P_B$, i.e. $D = P_{AB} - P_A P_B$. A method of scaling D was also proposed by Lewontin and
is defined as $D' = D / D_{max}$ (where, for positive D, $D_{max}$ is the smaller of $P_A P_b$ and
$P_a P_B$). A D’ value of 1 shows complete LD, whereas a value of < 1 indicates a disruption to
the ancestral LD and has no clear interpretation (Ardlie, Kruglyak, & Seielstad 2002).

The second common measure of LD is $r^2$ (sometimes referred to as $\Delta^2$), the square of
the correlation coefficient between two loci (Hill & Robertson 1968). It is derived from D
and defined as $r^2 = D^2 / (P_A P_a P_B P_b)$, where $P_A$, $P_a$, $P_B$ and $P_b$ are defined as
previously. Two loci in perfect LD have an $r^2$ value of 1, and $r^2$ values < 1 are easily
interpreted (Ardlie, Kruglyak, & Seielstad 2002); as such, $r^2$ is the measurement of choice
in the context of mapping (Ardlie, Kruglyak, & Seielstad 2002). When SNPs are in perfect
LD, knowing the genotypes of one SNP allows the genotypes of its correlates to be inferred directly.
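
The two-locus measures described above can be computed directly from allele and haplotype frequencies. The sketch below is illustrative only: the function name `ld_measures` and the example frequencies are hypothetical, the haplotype frequency is assumed to be known (in practice it is usually estimated from unphased genotypes), and D' is reported as an absolute value, with the usual alternative bound for $D_{max}$ applied when D is negative.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two diallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    q_a, q_b = 1 - p_a, 1 - p_b  # frequencies of alleles a and b
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * q_a * p_b * q_b)
    return d, d_prime, r2


# Hypothetical example: allele B only ever occurs on haplotypes carrying A,
# but A is much more common than B.
d, d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D = {d:.2f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

In this example D' is 1 because one of the four possible haplotypes is absent, yet $r^2$ is only about 0.25 because the allele frequencies at the two loci differ. This shows why $r^2$, rather than D', indicates how well one SNP predicts another.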

1.3 Genome-Wide Association Studies (GWAS)
Genome-Wide Association Studies (GWAS), as the name implies, aim to test for association
across the whole genome. Without information on how SNP alleles segregate together, all
SNPs would need to be genotyped to provide the required coverage. Given the
approximately 10 million common SNPs found in Caucasians alone, this would be not only
expensive but also technologically limiting. Pragmatically, GWAS are performed using a
subset of SNPs, so SNP selection is important: the subset must capture as much of the
genomic variation as possible, otherwise potential associations may be missed. For a
genotyped SNP to be associated it must either be the causal SNP or be in high LD with the
causal variant (Hirschhorn & Daly 2005).

For association analysis, the LD measure $r^2$ is considered informative as it is ‘inversely
proportional to the sample size that is required for detecting disease association given a
fixed genetic risk’ (Wang et al. 2005). As the $r^2$ between genotyped and causal SNPs
drops, so does the power of the study to detect a significant association (Wang et al.
2005). Modern GWAS utilise the LD correlations catalogued by the International HapMap
Project to maximise genomic coverage whilst minimising the number of SNP markers that
need to be genotyped, using a ‘tagging SNP’ approach. A tag SNP is a SNP which is
correlated with one or more neighbouring SNPs such that it is representative of them
(Teo, Small, & Kwiatkowski 2010). The amount of correlation required between the tag SNP
and the neighbouring SNPs may be varied to minimise SNP numbers whilst maximising
predictive power (or coverage) of other SNPs in the region (Figure 1). By consensus, an
$r^2$ value of ≥ 0.8 is considered sufficient to provide good coverage with minimal numbers
of SNPs and a relatively small loss in power (Barrett & Cardon 2006;
Carlson et al. 2003; Jorgenson & Witte 2006; Wang et al. 2005).
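
Two ideas from the paragraph above can be sketched in a few lines: the inverse relationship between $r^2$ and required sample size, and the selection of tag SNPs at a chosen $r^2$ threshold. The code is a simplified illustration, not the procedure used to design any particular array or study. The function names, SNP identifiers and $r^2$ values are hypothetical, and the selection shown is a simple greedy scheme chosen for brevity.

```python
import math


def inflated_sample_size(n_direct, r2):
    """Samples needed at a tag SNP to match the power of n_direct samples
    at the causal SNP, using the 1/r^2 relationship."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return math.ceil(n_direct / r2)


def greedy_tag_snps(snps, pairwise_r2, threshold=0.8):
    """Greedily choose tag SNPs so that every SNP is either a tag or has
    r^2 >= threshold with at least one tag.

    snps:        list of SNP identifiers (order is used to break ties)
    pairwise_r2: dict mapping frozenset({snp_i, snp_j}) -> r^2;
                 missing pairs are treated as r^2 = 0
    """
    def captured_by(snp):
        return {other for other in snps
                if other == snp
                or pairwise_r2.get(frozenset((snp, other)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that captures the most SNPs not yet tagged.
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical region of five SNPs forming two LD blocks.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
pairwise_r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.70,
    frozenset(("snp4", "snp5")): 0.90,
}
print(greedy_tag_snps(snps, pairwise_r2))   # two tags cover all five SNPs
print(inflated_sample_size(1000, 0.5))      # sample size doubles at r2 = 0.5
```

Lowering the threshold lets each tag represent more neighbours, so fewer SNPs need to be genotyped. The cost is a larger sample size to retain the same power at the untyped variants.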

Figure 1 Genomic coverage resulting from selecting maximal efficiency SNPs at various $r^2$ cut-offs. Figure
taken from Barrett & Cardon (2006) using data from phase II of the International HapMap Project (Frazer et
al. 2007).

Once the markers have been selected, they must all be assayed robustly and efficiently.
This is achieved using DNA microarray technology, in which probes assay the selected
SNPs. An overview of the procedure for the two main platforms is shown in Figures 2
and 3. Briefly, the DNA is fragmented, amplified in an unbiased manner and allowed
to hybridise to the probes. Following fluorescent labelling, the genotype of each assayed
SNP is then determined by measuring the relative fluorescence of each allele.
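
The final step, turning relative fluorescence into a genotype, can be illustrated with a deliberately simple threshold rule. This is a toy sketch only: the function name, the intensity values and all thresholds are hypothetical, and the calling algorithms supplied for the platforms described here are considerably more sophisticated than fixed cut-offs.

```python
def call_genotype(intensity_a, intensity_b,
                  min_signal=500.0, hom_max=0.2, het_min=0.35, het_max=0.65):
    """Call a genotype from the fluorescence measured for alleles A and B.

    The call is based on the fraction of total signal attributable to
    allele B; weak or ambiguous signals are returned as "NoCall".
    All thresholds are illustrative assumptions.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "NoCall"          # too little signal to trust
    b_fraction = intensity_b / total
    if b_fraction <= hom_max:
        return "AA"              # signal almost entirely from allele A
    if b_fraction >= 1 - hom_max:
        return "BB"              # signal almost entirely from allele B
    if het_min <= b_fraction <= het_max:
        return "AB"              # roughly equal signal from both alleles
    return "NoCall"              # falls between the expected clusters


# Hypothetical intensity pairs (allele A, allele B) for four samples.
for pair in [(1000, 50), (520, 480), (40, 900), (100, 100)]:
    print(pair, call_genotype(*pair))
```

Leaving ambiguous samples uncalled rather than forcing a genotype is a design choice: it lowers the call rate but avoids adding genotyping errors to the downstream association test.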

Figure 2 Affymetrix® Genome-Wide Human SNP Assay 5.0/6.0 (Affymetrix 2009). DNA is digested using
specific restriction enzymes, followed by specific adaptor ligation. A PCR is performed designed to
preferentially amplify fragments between 100-1,100bp and products from each restriction digest are
subsequently pooled and cleaned up. Samples are further fragmented using DNase I, fluorescently labelled
and allowed to hybridise to the probes. Fluorescence of each probe is measured and subsequent genotype
calling is performed using an appropriate calling algorithm.

Figure 3 Illumina® Infinium™ II Assay (modified from Illumina 2008a). Genomic DNA is isothermally amplified
in an unbiased fashion and enzymatically fragmented in a controlled reaction. The product is cleaned up and
allowed to hybridise to the BeadChip array. Following hybridisation an enzymatic single base extension is
carried out using fluorescently labelled nucleotides. The fluorescence of each bead is measured by the
BeadArray Reader and converted into genotypes using a custom algorithm.

1.4 GWAS: Past, Present and Future


The modern era of complex disease genetics, and of GWAS, began in 2007 with the Wellcome Trust Case
Control Consortium (WTCCC) study (Wellcome Trust Case Control Consortium 2007). This
pioneering study was the first to take advantage of the favourable genetic and
technological advances. The necessary technological advancement for GWAS came from
the development of array based assays. Two main genome-wide array based genotyping
methods currently exist: The Affymetrix® GeneChip® - based on a conventional two-
dimensional microarray, and the Illumina® BeadArray™ - based on a bead pool assembled
onto a patterned substrate (Oliphant et al. 2002).

1.4.1 Affymetrix® GeneChip®


The Affymetrix® GeneChip® is an ordered two-dimensional array which can assay
thousands of SNPs on a single chip. Each SNP is represented by a small area of the chip
(≈18 μm²) called a feature. Each feature contains millions of copies of the same DNA probe
(20-25 bases), synthesised directly onto a glass substrate by photolithography (Figure 4).
The first GeneChip® SNP arrays were the Affymetrix® GeneChip® Human Mapping 10K
Array (Affymetrix 2003), released in 2003, followed by the Human Mapping 100K Set
(Affymetrix 2004) in 2004, assaying over 10,000 and 100,000 SNPs respectively.

Figure 4 Photolithography based array construction (Lipshutz et al. 1999). A chip is
constructed of a solid substrate with covalently attached linker molecules protected by a
photolabile group. Light is directed through a mask (b) which removes the protective
group from linkers exposed to the light. Nucleotides protected with the same photolabile
groups are chemically coupled to the unprotected linkers. The process is repeated, using
the appropriate mask, allowing different DNA probes to be constructed at each site.

1.4.2 Illumina® BeadArray™


The principal component of the Illumina® BeadArray™ technology is based on small, 3μm
silica beads with thousands of identical, covalently attached oligonucleotide probes,
designed specifically for genotyping a unique SNP. Each bead type is then pooled to create
a ‘bead pool’, consisting of many thousands of different SNP detection beads, which can
then be assembled into an array. The BeadArray™ technology is available in two formats,
the Sentrix® Array Matrix (SAM) and the Sentrix® BeadChip (Figure 5), although the primary
format used in GWAS is the Sentrix® BeadChip. The Sentrix® BeadChip is a 2.5 x 8.25cm
silicon wafer in which wells, formed by photolithography and plasma etching, can accept
one bead from the bead pool. This produces a self-assembling, randomly ordered array and
as such a decoding process, consisting of a four colour ‘barcode’ (Figure 6), must be used
during the manufacturing process to map the specific location of each bead type (Fan et al.
2006).

Figure 5 Illumina® Array Formats (Illumina 2010a). The BeadArray™ technology is
available in two formats: an optical fibre based – the Sentrix® Array Matrix (SAM) and
a silicon wafer based – the Sentrix® BeadChip. Each format can accept the same beads
which are held in place by Van der Waals forces.
Figure 6 Illumina® BeadArray Decoding Process (Illumina 2009a). A four colour system is used to
detect the barcode of each bead on the BeadChip after manufacture to identify which SNP the probe
detects.

This allows millions of beads to assemble on the chip which provides high SNP number
coupled with sufficient coverage (≈20-fold) to genotype hundreds of thousands of SNPs.
This technology was first used in the Sentrix® Human-1 BeadChip, released in 2005
(Illumina 2005), which could genotype over 100,000 SNPs using the Infinium™ assay (Figure
3) (Gunderson et al. 2005).

Although researchers were now able to accurately and efficiently genotype over 100,000
SNPs, it was not until the International HapMap Project was finished that the true power of
array-based genotyping methods could be utilised for large GWAS. The first genome-wide
array was the Affymetrix® Human Mapping 500K Array Set released in 2005 (Affymetrix
2005). This was followed by the Illumina® HumanHap300 at the beginning of 2006 (Illumina
2006b) and the Illumina® HumanHap550 shortly after (Illumina 2006a). It was now possible
to study common human variation at a genome-wide level, and this has led to thousands of
publications in multiple disease types (Figure 7).

Initiating the whole GWAS field in 2007, the Wellcome Trust Case Control Consortium
(WTCCC) published their findings from 14,000 cases and 3,000 shared controls genotyped on
the Affymetrix® Human Mapping 500K Array Set (Wellcome Trust Case Control Consortium
2007). The cases comprised 2,000 samples from each of seven common diseases: bipolar
disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT),
rheumatoid arthritis (RA), type 1 diabetes (T1D) and type 2 diabetes (T2D); the controls
comprised 1,500 samples from the 1958 birth cohort (1958BC) and 1,500 UK blood donors (NBS). The
study identified 24 independent association signals at P < 5 × 10⁻⁷, of which 5 had been
reported previously (Wellcome Trust Case Control Consortium 2007).

Since the release of the first genome-wide arrays, technological advances coupled with the
ongoing work of the International HapMap Project have led to larger arrays with increased
SNP and sample numbers. The Affymetrix® Genome-Wide Human SNP Array 5.0, released
in early 2007 (Affymetrix 2007a) could assay the same content as the Human Mapping
500K Array Set plus 500,000 additional copy number variation (CNV) probes on a single
array. This was closely followed by the Affymetrix® Genome-Wide Human SNP Array 6.0
which could assay over 1.8 million markers for genetic variation (Affymetrix 2007b).

Figure 7 Published Genome-Wide Associations (Hindorff et al. 2010). Results of 779 published GWAS up to
March 2010 at p ≤ 5 × 10⁻⁸ for 148 traits.

Using an improved version of the Infinium™ assay, the Infinium™ II (Steemers et al. 2006),
Illumina® released the Human1M BeadChip offering over 1 million SNP probes in addition
to known and novel CNV sites (Illumina 2007). Further improvements to the BeadChip
technology saw the introduction of Infinium™ HD in the form of the Human1M-Duo and
Human610-Quad BeadChips in 2008 (Illumina 2008b), allowing two and four samples to be
genotyped simultaneously. With the new Infinium™ HD technology researchers were able
to add custom content to the Human1M-Duo+ and HumanHap550-Quad+ BeadChips.

The 1000 genomes project (1000 Genomes 2010), launched in 2008, aims to “produce a
catalogue of variants that are present at 1 percent or greater frequency in the human
population across most of the genome, and down to 0.5 percent or lower within genes”
(1000 Genomes 2008) using a next generation re-sequencing method. The Illumina®
HumanOmni1-Quad BeadChip, released in 2009 (Illumina 2009b), contains over 1 million
SNPs taken from all three HapMap phases, the initial 1000 genomes release and recent
publications, with the ability to genotype four samples on the same chip. Additionally,
Illumina® have released a ‘Product Roadmap’ (Figure 8), which aims to add further content
from the 1000 genomes project to the HumanOmni1-Quad to produce the HumanOmni5
BeadChip containing 5 million SNP probes.

Figure 8 Illumina® Whole-Genome Genotyping Product Roadmap (Illumina 2010b). Outlines the sources of
current and future Illumina® genotyping content including incorporating content from the 1000 genomes
project (1000 Genomes 2010).

1.5 GWAS Limitations


GWAS have improved our ability to detect genetic variations associated with disease;
however, almost without exception, the associations found have had very small effect sizes (OR
1.1-1.5). Whilst being unequivocally associated with disease, they do not
account for a great proportion of the genetic predisposition to disease. Indeed, for some
diseases no significant associations have been found, or those found have subsequently failed validation.
The failure of some GWAS to detect genetic associations could be due to their small effect
sizes and, as such, careful study design must be adopted to ensure an efficient, well
powered GWAS. For example, sample size, sample selection and population substructure
all need to be considered during study design. In addition, GWAS are efficient in
identifying common variations associated with disease but are relatively inefficient and
underpowered in identifying rare associated variations.

1.5.1 Common vs. Rare Variants Hypotheses


The common disease, common variant (CDCV) hypothesis proposes that the genetic risk for
common diseases is attributable to relatively few, high frequency (>1%) SNPs (Becker
2004). Although there is currently insufficient evidence to prove or disprove this
hypothesis, there have been examples in Alzheimer’s disease, deep venous thrombosis and
type II diabetes where common variants have been found to be causal (Reich & Lander
2001).

The alternative to the CDCV hypothesis is the common disease, rare variant (CDRV)
hypothesis which predicts that multiple, recent rare variations are more likely to contribute
to common diseases than the older, common variations (Pritchard 2001). Supporters of the
CDRV hypothesis argue that to reach such high frequencies, common variations are likely to
be older and therefore would be subject to negative selection; whereas low frequency
variations are rare because they have been selected against or are only a few generations
old and have not been subjected to selection pressures (Schork et al. 2009).

Due to their low frequency and high heterogeneity, rare variations are difficult to detect
using GWAS; however, Dickson et al. recently proposed the idea of ‘synthetic associations’
(Dickson et al. 2010). A synthetic association arises when the signal at a common variation identified
as disease-associated in a GWAS is actually caused by multiple rare variants which are
tagged by the common SNP due to a common genealogy. Although simulated data showed
this hypothesis to be feasible, there is currently only limited evidence of this in ‘real’ data
(Fellay et al. 2010).

1.5.2 False-Positive Associations


For any study producing millions of data points, limiting false-positives is essential. A false-
positive can be classified into one of three categories: statistical fluctuations, arising by
chance and resulting in low p-values; systematic biases due to study design; and technical
artefacts (Hirschhorn & Daly 2005). The first category can be limited by applying multiple
testing correction methods, such as Bonferroni correction, permutation testing or false discovery rate
(FDR) analysis. The main source of systematic bias is population stratification, which
can lead to spurious associations due to naturally occurring allele frequency differences
between populations and can be corrected by programs such as EIGENSTRAT (Price et al.
2006). False-positives due to technical artefacts are more difficult to correct for but are less
likely to occur in GWAS as both cases and controls are usually genotyped using the same
platform and subjected to stringent quality control criteria.
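
Two of the multiple testing corrections named above can be sketched in a few lines. The following is a minimal illustration, assuming a plain list of per-SNP p-values; the Benjamini-Hochberg step-up procedure is used here as one common form of FDR analysis, and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k for which p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every test ranked at or below that cut-off.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four SNPs.
example_p = [0.001, 0.012, 0.03, 0.2]
bonf_flags = bonferroni(example_p)
fdr_flags = benjamini_hochberg(example_p)
```

In this example the FDR procedure retains one more SNP than the Bonferroni correction, reflecting the fact that it controls the expected proportion of false discoveries rather than the chance of any single false-positive.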

1.5.3 Replication
Once a GWAS has identified SNPs associated with disease, these must be independently
validated to ensure a true association and eliminate the chance of false positives.
Replication involves genotyping the associated marker in independent, larger sample sizes
ideally using a different genotyping platform to fully ensure any false-positives are
removed. To ensure a successful GWAS, large sample sizes are required and this can often
cause problems when replicating if larger cohorts do not exist. The availability of a second
genotyping technology could be limiting due to expense; laboratories often invest in one
platform and as such technical limitations are not removed.

1.5.4 Identification of Causal Variant


As previously mentioned, the power of GWAS to find disease associations comes from their
tagging SNP design approach. An associated SNP could therefore be tagging multiple SNPs
within a region whereby any of these SNPs could be responsible for the association signal
and as such would require further validation and fine-mapping in an attempt to find the
causal variant. Another consequence of this approach is that the associated SNP could be
tagging a rare variant in unknown linkage with the associated SNP, or even an as yet
undetected change, which would require re-sequencing of the region to identify potential
causal candidates. Although sequencing costs have reduced dramatically, this could still be a
costly approach.

1.5.5 Effect Size


Common variation associations typically found by GWAS are characterised by odds ratios
(ORs) of between 1.2 and 1.5 (Bodmer & Bonilla 2008), although there are notable
exceptions such as the associations seen in RA and T1D with the MHC region (OR 5.21 and
18.52 respectively) (Wellcome Trust Case Control Consortium 2007). However, many rare
variations found to be associated with disease have larger effect sizes with ORs > 2 (Figure
9). Therefore common disease associations, while significant at a population level, may
contribute little to the overall genetic component of the disease and as such further
strategies will have to be developed to fully explain the genetic contribution to disease.
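
As a worked illustration of how such effect sizes are calculated, the sketch below computes an allelic odds ratio and an approximate 95% confidence interval (Woolf's logit method) from a 2 x 2 table of allele counts. The counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, (lower, upper)) from allele counts.

    All four counts must be greater than zero.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) by Woolf's method.
    se = math.sqrt(1 / case_risk + 1 / case_other
                   + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, (lower, upper)


# Hypothetical counts: risk allele seen 600 times in 2,000 case
# chromosomes and 500 times in 2,000 control chromosomes.
or_value, (ci_low, ci_high) = allelic_odds_ratio(600, 1400, 500, 1500)
```

With these counts the odds ratio is 9/7 (approximately 1.29), which falls within the range typically reported for common variants.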

Figure 9 Distribution of odds ratios for common and rare variants (Bodmer & Bonilla
2008). Included in the analysis were 61 rare variants and 217 common variants. Odds
ratios were obtained from the literature.

1.6 GWAS & post-GWAS Analysis Methods


1.6.1 Traditional Analysis
One of the most powerful and widely used analysis methods for GWAS is a single point, one degree
of freedom test of association such as the Cochran-Armitage test (McCarthy et al. 2008). In
single point analysis, the distribution of genotypes is compared between cases and controls
for each SNP independently and can be performed with or without covariate correction
(e.g. population stratification). This however does not incorporate information from SNPs
in LD which could potentially add genetic information and power to detect disease
association.
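
A minimal sketch of the single point test described above is given below, using the textbook form of the Cochran-Armitage trend test with additive weights (0, 1, 2) and no covariate correction. The genotype counts are hypothetical, and the function name is an assumption made for illustration.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Return (chi-square statistic, p-value) for a 1 d.f. trend test.

    cases, controls -- genotype counts ordered as (AA, Aa, aa)
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)          # all individuals
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * c for t, c in zip(weights, totals))
    sum_t2n = sum(t * t * c for t, c in zip(weights, totals))
    numerator = n * (n * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    if denominator == 0:
        raise ValueError("test undefined for these counts")
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP with a clear allele dosage trend.
chi2_trend, p_trend = cochran_armitage((10, 20, 30), (30, 20, 10))
# Hypothetical SNP with identical genotype distributions.
chi2_null, p_null = cochran_armitage((10, 20, 30), (10, 20, 30))
```

In a GWAS this calculation is repeated independently for every SNP, which is why the multiple testing burden described earlier arises.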

To overcome this limitation, research into multi-marker analysis methods such as
coalescence (Zollner & Pritchard 2005), haplotype-based (Morris 2006) and imputation
(Marchini et al. 2007) methods has provided a modest boost to power (McCarthy et al.
2008). As imputation techniques use data from the International HapMap Project as a
reference, these have been successful in discovering effects for HapMap SNPs not included
on the commercial arrays.

Another approach to overcome power limitations in GWAS is to combine study results in a
meta-analysis. A meta-analysis is a statistical approach for combining evidence across
studies which were designed to test the same research hypothesis (Cantor, Lange, &
Sinsheimer 2010), thus increasing the overall power of the study. This approach has been
used extensively in genetic studies of complex diseases to identify novel loci, for example by
Stahl et al. (2010) in RA.
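
One widely used way of combining evidence across studies is a fixed-effect, inverse-variance weighted meta-analysis. The sketch below assumes each study reports a log odds ratio and its standard error for the same SNP and allele; it is an illustration of the general technique rather than the method of any study cited here, and the input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Combine (log odds ratio, standard error) pairs across studies.

    Returns (pooled log OR, pooled standard error, two-sided p-value).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Two hypothetical studies reporting the same effect with the same precision.
single_study = fixed_effect_meta([(0.2, 0.1)])
two_studies = fixed_effect_meta([(0.2, 0.1), (0.2, 0.1)])
```

Pooling the two studies leaves the effect estimate unchanged but shrinks its standard error, which is the source of the gain in power.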

1.6.2 Epistasis Analysis


Most analysis methods used for GWAS assume each individual marker effect is additive, i.e.
if an individual has two risk factors, the overall risk for that individual is the sum of the two
individual risk factors. Epistatic interactions occur when the alleles of one locus affect the
alleles of another locus to produce a non-linear or multiplicative effect, i.e. if an individual
has two risk factors, the overall risk is the product of the two individual risk factors (Cantor,
Lange, & Sinsheimer 2010; Cordell 2002). Epistatic interactions found in humans include
the RET-EDNRB interaction in Hirschsprung disease (Carrasquillo et al. 2002), the IL4-IL13
promoter variants in asthma (Howard et al. 2002) and the ADRA2A-ADRB2 adrenergic
receptor subunit interaction in congestive heart failure (Small et al. 2002).

The importance of epistatic interactions has been accepted (Cantor, Lange, & Sinsheimer
2010) but they are difficult to detect despite the increasing computational resources and
improved statistical methods available. Testing for even a simple epistatic interaction
between every pair of markers from a 100K SNP array results in billions of comparisons and
therefore statistical techniques such as multiple linear or logistic regression are used
(Cantor, Lange, & Sinsheimer 2010; Cordell 2002) that aim to simplify the model whilst
reducing computational requirements.
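
The scale of the pairwise search can be checked with a short calculation. The sketch below counts the unordered SNP pairs on an array of exactly 100,000 markers (an assumed round figure) and the per-test threshold that a Bonferroni correction at 0.05 would then require.

```python
import math


def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, i.e. n choose 2."""
    return math.comb(n_snps, 2)


n_pairs = pairwise_tests(100_000)       # 4,999,950,000 pairs
pairwise_alpha = 0.05 / n_pairs         # roughly 1e-11 per test
```

Almost five billion tests, each requiring a p-value near 1 × 10⁻¹¹ to survive correction, illustrates why exhaustive two-locus scans are both computationally and statistically demanding.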

1.6.3 Molecular Pathway & Network Based Analysis


Although traditional analysis of GWAS data has been successful in identifying new variants
associated with disease (Barrett et al. 2009; Gregersen et al. 2009; Stahl et al. 2010;
Wellcome Trust Case Control Consortium 2007), over the last few years, researchers have
tried to develop new methods to analyse GWAS data to maximise its potential. GWAS
Pathway Analysis (GWASPA) attempts to test whether a molecular pathway is associated
with the disease by assigning SNPs to genes and then testing these genes for enrichment in
the selected pathway (Cantor, Lange, & Sinsheimer 2010). This approach combines
biological evidence with the statistical association and is therefore potentially a powerful
analysis strategy.

To achieve a successful pathway analysis, several experimental design issues need to be
addressed:
1. Assign SNPs to genes
2. Give the genes an association score
3. Map genes onto a pathway

1.6.3.1 Assign SNPs to Genes


Markers located within a gene, including intronic markers, are easily assigned to the
gene(s) in which they are located and additionally SNPs positioned in known regulatory
regions, such as promoters or enhancers, can also be assigned to the gene(s) which they
regulate. However, SNPs which lie in an intergenic region and are not located in a known
regulatory region are more difficult to define and must be assigned to genes using
predefined rules (Cantor, Lange, & Sinsheimer 2010) or alternatively discarded. Three main
methods have been used to assign SNPs to genes: multiple genes by distance, single gene
by hierarchy and LD-based methods.

Multi-gene by distance methods assign SNPs to genes if they lie within a defined distance
from a gene. Where SNPs map to multiple genes, all genes are taken forward for
subsequent analysis. This method was used by Wang, Li & Bucan (2007), Eleftherohorinou
et al. (2009) and Luo et al. (2010). For single-gene by hierarchy methods, where the same
SNP maps to multiple genes, a hierarchical approach is used to select a single gene based
on the SNP’s location relative to each gene. This method was used by Torkamani, Topol, &
Schork (2008). LD-based methods utilise information from the International HapMap
Project to find the extent of an association. Table 2 shows how many SNPs were
successfully mapped to genes by the selected publications when starting with the same
dataset from the WTCCC GWAS (Wellcome Trust Case Control Consortium 2007).
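
The multi-gene by distance rule can be sketched as follows. This is a minimal illustration, assuming a strand-aware window of 2KB upstream (5’) and 500bp downstream (3’) of each gene; the gene names and coordinates are hypothetical and the code is not taken from any of the cited studies.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int
    strand: str = "+"


def genes_for_snp(chrom, pos, genes, upstream=2000, downstream=500):
    """Return every gene whose extended window contains the SNP."""
    hits = []
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if gene.strand == "+":
            # 5' end is at gene.start on the forward strand.
            low, high = gene.start - upstream, gene.end + downstream
        else:
            # 5' end is at gene.end on the reverse strand.
            low, high = gene.start - downstream, gene.end + upstream
        if low <= pos <= high:
            hits.append(gene.name)
    return hits


# Hypothetical annotation with two neighbouring genes.
annotation = [
    Gene("GENE_A", "chr1", 10_000, 20_000, "+"),
    Gene("GENE_B", "chr1", 21_000, 30_000, "-"),
]
one_gene = genes_for_snp("chr1", 20_400, annotation)
two_genes = genes_for_snp("chr1", 20_500, annotation)
no_gene = genes_for_snp("chr1", 50_000, annotation)
```

A single-gene by hierarchy method would add a further step that chooses one gene from a result such as `two_genes`, whereas the multi-gene approach carries both forward.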

Table 2 Comparison of mapping methods for selected studies using the same WTCCC GWAS (Wellcome
Trust Case Control Consortium 2007) RA dataset.
Study                       Torkamani et al.                Eleftherohorinou et al.            Peng et al.  Luo et al.
SNP to gene mapping method  Single gene by hierarchy (5KB)  Multiple genes by distance (10KB)  ?            Multiple genes by distance (2KB 5’ & 500bp 3’)
Number of mapped SNPs       ?                               37,495 (within pathways)           459,653      459,653
Number of mapped genes      15,835                          1,368 (within pathways)            15,848       15,732
Number of pathways          ?                               84                                 465          465

1.6.3.2 Give the Genes an Association Score


To test for association, genes must be assigned a statistical probability which can be
achieved in a number of ways. The simplest approach is to assign each gene either the
minimum or maximum p-value of all SNPs which map to that gene without correction, as
used by Baranzini et al. (2009), Torkamani, Topol, & Schork (2008) and Wang, Li & Bucan
(2007). Peng et al. (2010) combined SNP p-values using various statistical methods, such as
the FDR method, and Eleftherohorinou et al. (2009) summed the Armitage trend test
statistic over all SNPs to develop a cumulative trend test statistic, CTpathway. However, these
methods assume marker independence, which could be violated due to LD. Recently, Luo
et al. (2010) developed three methods for combining dependent p-values: a linear
combination test (LCT), a quadratic test (QT) and a decorrelation test (DT), and applied
these methods to previously published GWAS data to identify pathways associated with
RA.
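
Two simple gene-level scores can be sketched for comparison: the minimum SNP p-value, and Fisher's combination test, which is a standard way of pooling p-values. Both are shown purely as illustrations; Fisher's method assumes the SNPs are independent, so it is subject to the LD caveat raised above. The p-values are hypothetical and must be greater than zero.

```python
import math


def min_p(p_values):
    """Gene score taken as the smallest SNP p-value, uncorrected."""
    return min(p_values)


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


# Hypothetical p-values for SNPs mapped to one gene.
gene_snps = [0.05, 0.05]
score_min = min_p(gene_snps)
score_fisher = fisher_combined(gene_snps)
```

Here two modest signals combine into a smaller gene-level p-value than either SNP alone, which would overstate the evidence if the two SNPs were in strong LD.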

1.6.3.3 Map Genes onto a Pathway


To test for pathway enrichment, genes must be assigned to pathways using a reliable, well
annotated pathway database and then analysed using an appropriate statistical method.
Table 2 summarises how many pathways were tested for the selected publications.
Alternatively, a text mining approach may be used to search for evidence of gene
relationships in available literature as used by Raychaudhuri et al. (2009a).

To test for pathway association, statistical methods, adapted from ones developed for
expression microarray analysis, or proprietary software can be used. Peng et al. (2010) and
Luo et al. (2010) combined gene p-values using similar methods to those used to obtain the
gene p-values. Gene set enrichment analysis (GSEA), a technique which takes defined sets
of genes (e.g. pathways) and tests whether the gene set is more likely to correlate with one
of two outcomes (e.g. phenotypes or disease states) (Mootha et al. 2003; Subramanian et
al. 2005), was modified by Wang, Li & Bucan (2007) to generate an enrichment score to
provide the overall significance of the pathway (Cantor, Lange, & Sinsheimer 2010).
HyperLasso (Hoggart et al. 2008), used by Eleftherohorinou et al. (2009); MetaCore, used
by Torkamani, Topol, & Schork (2008); and Cytoscape, used by Baranzini et al. (2009), are
software packages designed to identify genes and/or pathways primarily responsible for
the pathway effect using custom databases and statistical techniques.
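
The simplest form of pathway enrichment testing is an over-representation test based on the hypergeometric distribution. The sketch below illustrates that general idea only; it is not GSEA and not the procedure of any package named above, and the gene counts are hypothetical.

```python
import math


def enrichment_p(n_genes, n_in_pathway, n_significant, n_overlap):
    """Probability of seeing at least n_overlap pathway genes among the
    significant genes if they were drawn at random from all genes."""
    upper = min(n_in_pathway, n_significant)
    favourable = 0
    for i in range(n_overlap, upper + 1):
        # Ways to pick i pathway genes and the rest from outside the pathway.
        favourable += (math.comb(n_in_pathway, i)
                       * math.comb(n_genes - n_in_pathway, n_significant - i))
    return favourable / math.comb(n_genes, n_significant)


# Hypothetical example: 10 genes in total, 5 of them in the pathway,
# 5 significant genes, all of which fall in the pathway.
p_enriched = enrichment_p(10, 5, 5, 5)
# Requiring no overlap at all is always satisfied.
p_trivial = enrichment_p(10, 5, 5, 0)
```

Unlike this threshold-based test, GSEA-style methods use the full ranked list of gene scores, so the two approaches can give different answers on the same data.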

1.6.3.4 Summary
Many of these studies used the same dataset from the WTCCC GWAS (Wellcome
Trust Case Control Consortium 2007) and each method can therefore be evaluated by
comparing their results. Little difference was seen when comparing the number of genes
successfully mapped; however, it was not possible to compare the results to LD-based
methods. The most significant genes identified by each study are shown in Table 3 and the
most significant pathways are shown in Table 4. Although all of these analysis techniques
utilised the same starting dataset and attempted to achieve the same goal, mapping the
association signals onto biological pathways, once the extended HLA region is excluded
each identified a different set of associated genes and associated pathways. Therefore the
results of these analyses show the inconsistencies inherent in GWASPA and exemplify the
requirement for more robust, validated pipelines, techniques and workflows.

Table 3 Comparison of significant genes found by selected studies using the WTCCC GWAS (Wellcome Trust
Case Control Consortium 2007) RA dataset. Genes in bold have been identified by two or more studies and
genes in italics are known RA susceptibility loci. Results from Peng et al. are the combined results from both
statistical methods used and genes marked with an asterisk (*) were only significant using Fisher’s
combination test.
Study  Baranzini et al.  Peng et al.             Luo et al.
Genes  ABL1
       CBLB
       CD4
       GHR
       GRB2
       HLA-DPA1          HLA-DPA1 (<1E-20)       HLA-DPA1 (2.72E-11)
       HLA-DQA2          HLA-DQA2 (<1E-20)       HLA-DQA2 (4.84E-11)
       HLA-DRA           HLA-DRA (<1E-20)
       KDR
       MAPK1
       PIK3R1
       PRNP
       RET
       SELL
                         APOM (<1E-20)
                         BAT3 (<1E-20)           BAT3 (5.16E-07)
                         BAT4 (<1E-20)           BAT4 (<1E-17)
                         BBOX1 (<1E-20)
                         BTNL2 (<1E-20)          BTNL2 (1.55E-07)
                         C6orf10 (<1E-20)
                         CFB (<1E-20)
                         GPSM3 (<1E-20)          GPSM3 (5.20E-09)
                         HLA-DPB1 (<1E-20)       HLA-DPB1 (2.34E-11)
                         HLA-DQA1 (<1E-20)       HLA-DQA1 (1.49E-11)
                         HSPA1L (<1E-20)
                         ITPR3 (<1E-20)*
                         LOC642038 (<1E-20)
                         LOC731881 (<1E-20)      LOC731881 (<1E-17)
                         MAGI3 (<1E-20)
                         MMEL1 (<1E-20)
                         MSH5 (<1E-20)
                         NOTCH4 (<1E-20)
                         PRRT1 (<1E-20)
                         PSORS1C1 (<1E-20)
                         RDBP (<1E-20)           RDBP (<1E-17)
                         RSBN1 (<1E-20)
                         STXBP6 (<1E-20)*
                         TAP2 (<1E-20)
                         TNXB (<1E-20)
                         TRIM26 (<1E-20)
                         TRIM40 (<1E-20)
                         VARS2 (<1E-20)
                                                 AGPAT1 (3.68E-12)
                                                 AIF1 (8.22E-15)
                                                 CREBL1 (5.91E-09)
                                                 EHMT2 (7.01E-11)
                                                 HLA-DQB1 (6.55E-11)
                                                 MICA (5.82E-09)
                                                 PTPN22 (2.44E-15)
                                                 RPS18 (2.80E-06)
                                                 ZFP57 (3.78E-07)

Table 4 Comparison of significant pathways and associated p-values found by selected studies using the WTCCC GWAS (Wellcome Trust Case Control Consortium 2007) RA
dataset. Pathways in bold have been identified by two or more studies.
Pathway Torkamani et al. Baranzini et al. Eleftherohorinou et al. Peng et al. Luo et al.
3-Chloroacrylic acid degradation 0.00626734
Alternative complement pathway <1E-17
Actions of Nitric Oxide in the Heart Pathway 0.0101923
Antigen processing and presentation 1.07E-05 5.2E-11 (all); 1.5E-10 (MHC II) 4.82E-06 <1E-17
Attenuation of GPCR Signaling Pathway 0.00405584
Antigen-dependent B-cell activation pathway 8.79E-05
B Lymphocyte Cell Surface Molecules Pathway 0.0104378 4.89E-05
Bystander B Cell Activation Pathway 0.00274113 4.40E-05
CCR3 signaling in Eosinophils Pathway 0.00260024
cdc25 and chk1 Regulatory Pathway in response to DNA damage Pathway 0.0104378
Cell adhesion molecules (CAMs) 6.06E-06 1.13E-06 <1E-17
Cell Communication 0.00843743 8.15E-11
Cells and Molecules involved in local acute inflammatory response Pathway 0.00626734
ChREBP regulation by carbohydrates and cAMP Pathway 0.00900246
Complement and coagulation cascades pathway 5.94E-13
Complement pathway <1E-17
Cytokine-cytokine receptor interaction 0.00374261 <1E-17
Cytokines and Inflammatory Response Pathway 0.0131556 1.83E-07
dependent Protein Kinase Inhibits Signaling through the T Cell Receptor Pathway 0.00260024
dependent protein kinase, PKA Pathway 0.00666558
ECM-receptor interaction 9.44E-05
EGF Signaling Pathway 0.0131556
epsilon Pathway 0.00225447
Ether lipid metabolism pathway 3.26E-09
Focal adhesion 0.00103288 4.06E-07
Fructose and mannose metabolism 0.00475638
Glioma 0.00240015
Glycan structures - biosynthesis 1 0.00713575
Glycolysis/Gluconeogenesis 0.008516
IFN alpha signaling Pathway 0.00577606
Inhibition of Matrix Metalloproteinases Pathway 0.00274113
Insulin signaling Pathway 0.00544261
Pentose phosphate Pathway 0.00380961
Prion disease 0.00669113
RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage Pathway 0.0103261
Sulfur metabolism 0.000202263
T Cytotoxic Cell Surface Molecules Pathway 0.0104378
Gap junction pathway 2.08E-06
Glycerolipid metabolism pathway 1.10E-06
Glycerophospholipid metabolism pathway 7.32E-07
IL 5 signaling pathway 0.000103
Jak-STAT signaling pathway 4.40E-09 1.19E-10
Lysine degradation pathway 0.000109
MAPK signaling pathway <1E-17
Natural killer cell-mediated cytotoxicity pathway 1.66E-09
Th1/Th2 Differentiation Pathway 0.000435327 4.62E-07
The Role of Eosinophils in the Chemokine Network of Allergy Pathway 0.00666558 1.02E-05
Tight junction pathway 7.09E-10
Toll-like receptor signaling pathway 1.29E-08
Type I diabetes mellitus 3.16E-05 6.80E-10 5.26E-11 <1E-17
tyrosine phosphatase alpha Pathway 0.00669113
Natural killer cell mediated cytotoxicity 5.20E-05
Pattern recognition receptors - TLR9-IRF5 1.60E-06
Purine metabolism 3.20E-05
Signaling molecules and interaction 2.00E-02 0.0E+00 (APC: T cell); 8.3E-10 (Tc cell: target cell); 0.0E+00 (Th cell: B cell)
T-cell activation - All 1.40E-04
T-cell activation - Cytokines/receptors 2.50E-06
T-cell activation - Cytokines/receptors/Jak-STAT/suppressors 1.10E-04
Chronic myeloid leukemia 1.20E-02
Environmental information processing 2.44E-02
Human diseases 6.00E-05
Immune system 3.14E-03
Immune response _Function MEF2 in T lymphocytes 0.03
Immune response _Histamine signaling in dendritic cells 0.03
Immune response _MIF - the neuroendocrine-macrophage connector 0.001
Immune response _NF-AT signaling and leukocyte interactions 0.02
Immune response _PGE2 common pathways 0.03
Metabolic disorders 1.54E-04
T-cell receptor signaling pathway 2.00E-02
VEGF signaling pathway 0.02 4.26E-02
Cytoskeleton remodeling_Role PKA in cytoskeleton reorganization 0.05
Development_Angiotensin activation of Akt 0.004
Development_Angiotensin activation of ERK 0.03
Development_Cross-talk VEGF and angiopoietin 1 signaling 0.02
Development_EDNRB signaling 0.02
Regulation of lipid metabolism_PDGF activation of prostacyclin synthesis 0.01
Signal transduction_Calcium signaling 0.003
Signal transduction_cAMP signaling 0.01
Signal transduction_IP3 signaling 0.009
Transport_Membrane trafficking and signal transduction of G-alpha (i) heterotrimeric G-protein 0.04
Cell adhesion_Histamine H1 receptor signaling in the interruption of cell barrier integrity 0.03
1.6.4 Text Mining
Text mining is a process which involves interacting with an unstructured textual document
collection to extract useful information through the identification of interesting patterns
(Feldman & Sanger 2007). Text mining has the potential to uncover previously unknown or
distant relationships between query terms as it does not rely on prior knowledge.

Raychaudhuri et al. (2009a) developed a statistical method, called GRAIL (Gene
Relationships Among Implicated Loci), to find and evaluate disease associated gene relations. A set of
‘query’ SNPs is evaluated for gene relationships against a set of ‘seed’ SNPs, known to be
associated with the disease. GRAIL employs a text mining approach to search 250,000
PubMed abstracts for the greatest number of gene relationships (Raychaudhuri et al.
2009b). GRAIL assigns both the ‘seed’ and ‘query’ SNPs to genes using information on LD
and recombination hotspots from HapMap data (Figure 10). Firstly, the furthest SNP in
each direction in LD (r² > 0.5) with the query SNP is identified. Then, proceeding outwards, the
nearest recombination hotspot is found to define the disease region and all genes which
overlap this region are considered. This therefore attempts to incorporate the local genetic
architecture into a more biologically plausible algorithm.

[Figure 10 key: SNP with r² ≤ 0.5 with query; SNP with r² > 0.5 with query; query SNP; recombination hotspot; disease region; SNP mapping process; query SNP in disease region.]

Figure 10 SNP to gene mapping used by Raychaudhuri et al. (2010). The following process is carried
out for both directions: 1. the furthest SNP in LD (r² > 0.5) is identified. 2. region is extended
outwards to nearest recombination hotspot. 3. disease region is defined as the interval between the
two hotspots.
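
The three steps in Figure 10 can be sketched as follows. This is an illustrative reading of the description above, not GRAIL's own implementation; the function names, the fallback used when no hotspot exists on one side, and all positions and gene names are assumptions.

```python
import bisect


def disease_region(query_pos, ld_snps, hotspots, r2_threshold=0.5):
    """Return (start, end) of the region around a query SNP.

    ld_snps  -- list of (position, r2 with the query SNP)
    hotspots -- sorted list of recombination hotspot positions
    """
    # Step 1: furthest SNPs in LD with the query in each direction.
    in_ld = [pos for pos, r2 in ld_snps if r2 > r2_threshold]
    left = min(in_ld + [query_pos])
    right = max(in_ld + [query_pos])
    # Step 2: extend outwards to the nearest hotspot on each side.
    i = bisect.bisect_right(hotspots, left)
    j = bisect.bisect_left(hotspots, right)
    # Assumption: keep the LD boundary if no hotspot lies beyond it.
    start = hotspots[i - 1] if i > 0 else left
    end = hotspots[j] if j < len(hotspots) else right
    # Step 3: the region is the interval between the two hotspots.
    return start, end


def overlapping_genes(region, genes):
    """Return names of genes (name, start, end) overlapping the region."""
    start, end = region
    return [name for name, g_start, g_end in genes
            if g_start <= end and g_end >= start]


# Hypothetical data for one query SNP at position 1000.
snps = [(900, 0.9), (700, 0.6), (600, 0.3), (1200, 0.8), (1500, 0.4)]
hotspot_positions = [500, 650, 1300, 1600]
region = disease_region(1000, snps, hotspot_positions)
candidates = overlapping_genes(
    region,
    [("G1", 100, 640), ("G2", 640, 800), ("G3", 1250, 2000), ("G4", 1400, 1500)],
)
```

Because the region boundaries come from LD and recombination data rather than a fixed distance, the set of candidate genes varies with the local genetic architecture around each SNP.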

Once potentially implicated genes have been identified from the seed input, genes which
have been identified from the ‘query’ SNPs are ranked against all human genes for
relatedness. The ‘seed’ genes are then scored against every ‘query’ gene for relatedness
using PubMed and assigned a ptext statistic to show how significantly related the ‘seed’ and
‘query’ SNPs are. For example, this method was applied to a meta-analysis in RA
(Raychaudhuri et al. 2008) and identified 22 related loci of which seven were successfully
replicated (p < 0.0023) and six were nominally associated at a lower level (p < 0.05)
(Raychaudhuri et al. 2009b).

1.6.5 Conclusions
Biological pathway analysis has tremendous potential to leverage much more from existing
RA GWAS data, including:

1. Identifying genetically modified pathways that may predict different disease
outcomes.
2. Determining whether there is a requirement to impair all identified pathways to
develop RA, or just a selection.
3. Determining whether there are any focal molecules that are key to disease
development.
4. Identifying any RA disease specific pathways or pathways that play a role in general
autoimmune disease.
5. Determining whether pathway analysis may aid treatment response prediction.

However, GWASPA is still relatively new and, as such, all the methods discussed have
advantages and disadvantages. All methods have identified novel genes and pathways
associated with disease outcome; however, none have been subsequently validated with
the exception of the study conducted by Raychaudhuri et al. (2009b). The methods used to
assign SNPs to genes vary between analysis tools and all are imperfect, as gene
regulation is still not fully understood. However, the approach used by Raychaudhuri et al.
(2009a) in GRAIL attempts to define genetic regions based on an extended region where
regulatory SNPs are most likely to occur and, as such, represents the most useful definition.

A gene and its regulatory elements represent a large region consisting of multiple SNPs
that may contain multiple independent associations, potentially with opposing genetic
effects (Orozco et al. 2009), and therefore the methods used to combine the significance
of SNPs within a gene could be problematic. Many studies have taken either the most or
least associated SNP within a gene, effectively losing any other information supplied by
other variations in the gene region. Furthermore, methods which combine SNPs using
statistical techniques which assume independence are subject to the same pitfalls. The
ideal approach would seem to be to combine all SNPs within a gene, whilst simultaneously
taking factors such as LD and possible opposing genetic factors into account, such as the
methods explored by Luo et al. (2010).
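
As an illustration of a combination technique that assumes independence, Fisher's method is
sketched below. It is named here only as one widely used example and is not necessarily the
technique applied in the studies cited above; the function name and the p-values are
hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method, which assumes independent tests.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For an even number of
    degrees of freedom the upper tail has a closed form, so only the standard
    library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2 with 2k d.f. > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    tail = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail


# Hypothetical p-values for three SNPs within one gene.
snp_p_values = [0.01, 0.02, 0.03]
combined = fisher_combined_p(snp_p_values)
```

If the SNPs are in LD the individual tests are correlated, so a combined p-value obtained
in this way would be smaller than is justified, which is the pitfall described above.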

Pathway analyses are often reliant on testing for enrichment using a pathway database and
as such the effectiveness of the analysis performed is determined by the annotation and
completeness of the database. However, no pathway database is ideal or complete as new
associations are continually being discovered. The KEGG database (Kanehisa et al. 2006;
Kanehisa et al. 2010; Kanehisa & Goto 2000) is the most commonly used (Cantor, Lange, &
Sinsheimer 2010) but alternatively the Gene Ontology (GO) (Ashburner et al. 2000),
Biocarta, Database for Annotation, Visualisation and Integrated Discovery (DAVID) (Dennis,
Jr. et al. 2003; Huang, Sherman, & Lempicki 2009) and Protein ANalysis THrough
Evolutionary Relationships (PANTHER) (Thomas et al. 2003) databases can be used and
offer sets of structured vocabularies which can be used to describe genes in any organism
and infer biological relationships (Cantor, Lange, & Sinsheimer 2010). Elbers et al. (2009)
demonstrated the effect of using different databases by conducting a pathway analysis on
the same data, using different pathway databases and showed some discordance between
them. An alternative approach to using a pathway database to define relationships
between genes is the text mining approach used by Raychaudhuri et al. (2009a) which does
not rely on fully annotated pathways; however, as in all pathway analysis, it is still reliant
on prior published evidence.

Although pathway analysis methods have been successful in identifying loci implicated in
disease, these findings have not been validated, limitations remain which need to be
addressed, and no one method has fully solved all of them. A clear workflow is
required which fully utilises all available knowledge to identify the extent of each SNP
association and pull together data from the available biological resources to identify the
most likely pathways and genes involved.

1.7 Summary/Conclusion
Research into post-GWAS pathway analysis has already commenced but is still in its infancy,
with several limitations which need to be addressed to ensure it becomes a
robust method. Using the methods discussed, I aimed to produce a post-GWAS workflow
which can be applied to known associations to implicate novel pathways and genes which
contribute to the overall genetic effect of the disease. This can then be applied to
sub-phenotype the clinical entity of the disease by grouping according to which pathways are
disturbed. This has the potential to translate into targeted treatment in subgroups more
likely to respond to therapies directed against these pathways, and to identify key disease
specific genes/pathways, furthering knowledge of the disease specific aetiology.

The pipeline will not only easily translate a SNP list into a biological pathway utilising the
currently available software, it will also allow the incorporation of increased genetic data
resources, for example text mining and expression studies, to perform a more
comprehensive and sophisticated analysis. There is also the possibility of inclusion of
matrix/weighting analysis to fully exploit the available data.

2 Specific Aims
GWAS have identified over 30 robust SNP associations in rheumatoid arthritis (RA) which
map to multiple regions, implicating genes which may contribute to the pathogenesis of
the disease. The associations individually have a small effect size but in combination, or as
a pathway, may well contribute much more towards the genetic component of a disease.
As illustrated previously, multiple approaches exist for the analysis of biological pathways
using SNP association data. These approaches produce conflicting results, with no clear
picture emerging of a post-GWAS gold standard pathway analysis pipeline.

The specific aims were therefore as follows:-

1. Review the genomic resources and pathway databases available and determine
their suitability. It will also be necessary to explore the best way to query these
resources and manage the flow of data from various formats.
2. Develop a robust Taverna workflow to define a region represented by an
associated SNP, which will allow researchers to query multiple SNPs
simultaneously. This workflow will utilise existing knowledge of the region such as
linkage disequilibrium and recombination hotspots to define a region and identify
all the genes within it. This will allow researchers to identify new genes or confirm
existing ones which could contribute to the disease; additionally, these genes can
be used in available pathway databases to search for enrichment and identify
potential disease pathways.
3. Using RA as a model, the new workflow will be used to identify all genes using the
well established loci, which will then be analysed for any functional impact on the
disease using the range of bioinformatics resources highlighted in aim 1.
4. The results from the bioinformatics analysis will be assessed and the pathways
identified will be compared with those found in previous publications.

3 Methods
3.1 Review of Available Resources
3.1.1 The Taverna Workbench
The Taverna workbench (Hull et al. 2006) is a tool for building and executing workflows to
retrieve and process information from many biological resources such as the National
Center for Biotechnology Information (NCBI) Entrez (Baxevanis 2008; Benson et al. 2010),
BioMart (Smedley et al. 2009) and KEGG (Kanehisa et al. 2006; Kanehisa et al. 2010;
Kanehisa & Goto 2000). It allows users to integrate multiple resources and produce robust
workflows to carry out a wide range of tasks. These workflows can also be shared amongst
the scientific community.

A Taverna workflow is built on multiple ‘services’, with each service designed to carry out a
specific task and produce output to feed into the next service. Since a particular workflow
could use services from multiple resources, the output of one service cannot always feed
directly into the next. For example, if service ‘A’ returns a string of nucleotide bases and
service ‘B’, which it feeds into, expects a FASTA formatted input requiring a header line, the
workflow would fail. To solve this, Taverna has services which allow various ways to process
outputs such as text and extensible mark-up language (XML) and also the ability to add
‘beanshells’: custom scripts based on the Java programming language to carry out data
manipulation which cannot be performed by the built-in services. This provides flexibility
when designing workflows and, as Taverna is not dependent on the user’s operating system,
it is currently the best tool to design and implement complex workflows.
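
The kind of format conversion described above can be sketched as follows. This is an
illustration only: Taverna beanshells are written in a Java based scripting language, Python is
used here for brevity, and the function name, header text and example sequence are hypothetical.

```python
def to_fasta(sequence, header="query_sequence", width=60):
    """Wrap a bare nucleotide string in FASTA format.

    Returns a header line starting with '>' followed by the sequence broken
    into lines of at most `width` characters.
    """
    cleaned = "".join(sequence.split()).upper()  # drop whitespace, normalise case
    if not cleaned:
        raise ValueError("empty sequence")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Output of hypothetical service 'A': a bare string of 68 bases.
raw_output_from_service_a = "acgt" * 17
# Input expected by hypothetical service 'B': a FASTA record.
fasta_input_for_service_b = to_fasta(raw_output_from_service_a, header="example")
```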

3.1.2 Genome Resources


There are currently three main genome resources which can be used to obtain information
relating to SNPs, genes, proteins etc. These are the Ensembl genome browser (Hubbard et
al. 2009), the University of California Santa Cruz (UCSC) Genome Browser (Kent et al. 2002)
and NCBI Entrez (Sayers et al. 2010). All provide at least one form of computational access
to their resources as well as a web based graphical user interface (GUI).

3.1.2.1 Ensembl
The Ensembl genome browser (Hubbard et al. 2009) is primarily a sequence and gene
annotation database containing both automated and manually annotated genes but also
contains information on SNPs and regulatory elements. There are two programmatic ways
of accessing the data, both providing an application programming interface (API) to an
underlying MySQL database. The first is a Perl based, object orientated API which allows
scripts to query the database and provides high-level access to the underlying data tables.
The second is BioMart, a data management system which provides portals to a wide range
of web services and can be used as a user or programme driven bulk export tool (Smedley
et al. 2009). The Ensembl BioMart web services are fully integrated into the Taverna
workbench to allow easy access to the resource. However, there is currently no way, via
Taverna, to specify or easily obtain the genome build which BioMart connects to. This may
cause problems when accessing multiple resources which are based on different genome
builds as co-ordinates can vary significantly.

3.1.2.2 The UCSC Genome Browser


The UCSC Genome Browser (Kent et al. 2002) is much like the Ensembl genome browser
but contains additional resources such as recombination hotspots and data from the
ENCyclopaedia Of DNA Elements (ENCODE) international consortium (Birney et al. 2007;
ENCODE Project Consortium 2004). The UCSC genome browser provides public access to a
mirrored MySQL database but, unlike Ensembl, does not provide an easy to use API and,
as such, it is much harder to run complex queries. Attempts to solve this are the UCSC table
browser, which provides a basic interface to the database allowing simple queries, and
Galaxy (Taylor et al. 2007) which provides a web interface allowing users to carry out more
complex queries. There is currently no integration of UCSC in the Taverna workbench and,
although useful resources are available, it would be difficult to use this resource extensively.
However, the public UCSC database currently offers access to the four latest genome builds
(for humans) but unfortunately offers no simple web service which allows programmatic
access to convert co-ordinates between them.

3.1.2.3 NCBI Entrez


NCBI Entrez (Sayers et al. 2010) is not a genome browser like the Ensembl or the UCSC
genome browsers (although recently a sequence viewer has been developed which is
similar); it is a web based GUI which primarily functions as a portal to the GenBank (Benson
et al. 2010) database but also allows access to other NCBI hosted databases such as dbSNP
(Sherry et al. 2001), PubMed and the Molecular Modeling Database (MMDB) (Wang et al.
2007). NCBI provides a set of programming utilities, Entrez Utilities, which provide
programmatic access to the Entrez system via static Uniform Resource Locators (URLs).
Query information is posted to the relevant Entrez Utilities URL and processed by a server
side programme which returns an XML response. A web service for the Entrez Utilities is
also provided via the Simple Object Access Protocol (SOAP). This web service is fully
integrated into the Taverna workbench which allows workflows to search and retrieve
information from all the available resources. Much like the Ensembl BioMart service, Entrez
Utilities does not allow the user to specify the genome build, but the resultant XML data does
contain this information and therefore could be used to convert between genomic builds.
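
A minimal sketch of the URL based route is given below. It only builds the request address
for the ESummary utility and does not contact NCBI; it illustrates the static URL access
described above rather than the SOAP service used through Taverna. The function name is
hypothetical, and the base address is the one in use at the time this sketch was written.

```python
from urllib.parse import urlencode

# Assumed base address of the NCBI Entrez Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esummary_url(rs_ids, db="snp"):
    """Build an ESummary request URL for a list of dbSNP rs identifiers."""
    numeric_ids = []
    for rs in rs_ids:
        # dbSNP record identifiers are the numeric part of the rs ID.
        numeric_ids.append(rs[2:] if rs.lower().startswith("rs") else rs)
    query = urlencode({"db": db, "id": ",".join(numeric_ids)}, safe=",")
    return EUTILS_BASE + "esummary.fcgi?" + query


url = esummary_url(["rs6910071", "rs2476601"])
```

Retrieving this address would return an XML document summary for each SNP, which could then
be parsed for the position and genome build.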

3.1.2.4 Genome Resources Summary


All three genome resources share data to a certain degree and therefore any one resource
could be used. Each resource has pros as well as cons with relation to data retrieval and
processing and as such it is important to utilise each resource to its full potential. However,
there are also data which are unique to each resource, and therefore care must be taken when
accessing and processing information from resources which are likely to be problematic.

3.1.3 Genome Recombination

3.1.3.1 LD Data
The International HapMap Project (The International HapMap Consortium 2003) is the
largest resource dedicated to identifying common human variation patterns and provides a
web based GUI to access the data. For bulk data download, a service based on the BioMart
interface, HapMart, is provided and has recently been integrated into the Taverna
workbench but does not provide access to LD information. LD information is only accessible
through the web based GUI via the use of ‘plugins’. These ‘plugins’ are server side
programmes which can be accessed via a static URL, like the NCBI Entrez Utilities resource;
however, no SOAP interface is supplied. There is currently no integration for the HapMap
plugins in the Taverna workbench, so they must be accessed via the standard ‘net’ services.

3.1.3.2 Recombination Hotspots


Several programmes exist which calculate recombination rates and subsequent potential
recombination hotspots from SNP genotype data, for example, Hotspotter (Li & Stephens
2003) and LDhot (McVean et al. 2004). However, producing a workflow which requires
installation of one of these programmes will affect the workflow’s portability and use.

The UCSC Genome Browser (Kent et al. 2002) allows access to pre-computed
recombination hotspot co-ordinates based on the NCBI34/hg17 genome build and would
therefore need to be mapped to the current genome build. Since there is currently no web
resource to convert co-ordinates between builds, this will have to be carried out
manually and supplied as a file to the Taverna workbench. This will also affect the
portability of the workflow but to a much lesser extent than the previous option.

3.1.4 Pathway Databases and Analysis Methods

3.1.4.1 The KEGG Database


The Kyoto Encyclopaedia of Genes and Genomes (KEGG) (Kanehisa et al. 2006; Kanehisa et
al. 2010; Kanehisa & Goto 2000) is an integrated database divided into three categories
each containing multiple entry points (Table 5). The ‘systems information’ category
contains information on functional aspects of biological systems and includes the
PATHWAY entry point. The KEGG PATHWAY database currently contains a total of 111,218
pathway maps (Kanehisa Laboratories 2010) which have been manually created from
published literature and existing, manually drawn diagrams. KEGG provides
programmatic access to the databases through a SOAP API which has been integrated into
the Taverna workbench.

Table 5 The Kyoto Encyclopaedia of Genes and Genomes (KEGG) entry points.
Category               Entry Point
Systems information    KEGG PATHWAY
                       KEGG BRITE
                       KEGG Atlas
                       KEGG DISEASE
                       KEGG DRUG
Genomic information    KEGG ORTHOLOGY
                       KEGG GENES
                       KEGG GENOME
                       KEGG Organisms
Chemical information   KEGG LIGAND
                       KEGG GLYCAN
                       KEGG PLANT

3.1.4.2 Biocarta
Biocarta is a commercial company offering molecular biology supplies including polyclonal
antibodies and flow cytometry immune function kits. It also allows access to approximately
300 pathway diagrams submitted and maintained by experts in the particular field. All
Biocarta data are supplied as diagrams, with no programmatic access and no bulk
download facility. Due to this, it will not be possible to build this database into the
workflow and it will have to be accessed manually.

3.1.4.3 Gene Ontology (GO)


The Gene Ontology (GO) project (Ashburner et al. 2000) aims to describe the roles of genes
and gene products using a structured and precisely defined vocabulary consistent across
species and databases. The vocabulary covers three domains: cellular component,
molecular function and biological process, and is built up of a number of terms, each having
defined relationships to one or more other terms. Using these terms it is possible to
construct a directed acyclic graph (DAG) and infer relationships between genes at multiple
levels. The GO project provides AmiGO: a web based set of tools for browsing the GO
database (Carbon et al. 2009). It is also possible to retrieve GO terms for a gene via the
Biomoby services built into the Taverna workbench.
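
The way a DAG allows relationships between genes to be inferred at multiple levels can be
sketched as follows. The term names, gene names and annotations are placeholders invented for
illustration and are not real GO identifiers.

```python
# Hypothetical ontology fragment: each term maps to its parent terms.
# A term may have more than one parent, which makes this a DAG, not a tree.
PARENTS = {
    "term_root": [],
    "term_A": ["term_root"],
    "term_B": ["term_root"],
    "term_C": ["term_A", "term_B"],
    "term_D": ["term_C"],
}

# Hypothetical direct annotations of two genes.
GENE_TERMS = {"gene1": {"term_D"}, "gene2": {"term_B"}}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_terms(gene_a, gene_b):
    """Terms that annotate both genes, directly or through an ancestor."""
    def expand(gene):
        terms = set()
        for term in GENE_TERMS[gene]:
            terms |= {term} | ancestors(term)
        return terms
    return expand(gene_a) & expand(gene_b)


common = shared_terms("gene1", "gene2")
```

Here the two genes share no direct annotation, but are related through the more general
terms that both annotations inherit.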

3.1.4.4 PANTHER
The Protein ANalysis THrough Evolutionary Relationships (PANTHER) database (Thomas et
al. 2003) is a resource which classifies genes by their functions. The GO database
(Ashburner et al. 2000) contains thousands of terms for each domain and produces DAGs
several levels deep. Although this provides a high level of detail, a simpler approach could
be beneficial in some applications (Thomas et al. 2003). The PANTHER database attempts
to achieve this by using approximately 250 terms per domain, arranged no more than
three levels deep. Access to PANTHER is via a web based GUI; however, no
programmatic access is available and, as such, the PANTHER database is not easily accessible
via the Taverna workbench.

3.1.4.5 DAVID
The Database for Annotation, Visualization and Integrated Discovery (DAVID) (Dennis, Jr. et
al. 2003; Huang, Sherman, & Lempicki 2009) is a set of data mining tools which allow the
annotation and analysis of large gene lists. DAVID uses information from multiple
databases, which are updated weekly, such as GenBank (Benson et al. 2010), Online
Mendelian Inheritance in Man (OMIM) (McKusick-Nathans Institute of Genetic Medicine &
National Center for Biotechnology Information 2010) and GO (Carbon et al. 2009) and
comprises four analysis tools: functional annotation, gene functional classification, gene
ID conversion and gene name batch viewer.

The main tool in the DAVID suite, the functional annotation tool, allows three output
options: table, chart and clustering. The functional annotation table allows users to explore
all available annotations for each gene, such as GO terms and KEGG pathways. Gene-term
enrichment analysis is shown by the functional annotation chart option and allows users to
identify overrepresented terms for the given gene list. An enrichment p-value is calculated
for each term allowing statistical significance to be examined. The third output option,
functional annotation clustering, groups genes into clusters by measuring the relationships
among the annotation terms (Huang, Sherman, & Lempicki 2009).
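
As a rough illustration of how an enrichment p-value of this kind can be obtained, the sketch
below computes a plain upper-tail hypergeometric probability. DAVID's own statistic may
differ from this calculation, and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, term_size, list_size, overlap):
    """Probability of seeing at least `overlap` genes annotated with a term
    in a list of `list_size` genes drawn at random from a population of
    `population_size` genes, `term_size` of which carry the annotation."""
    max_overlap = min(term_size, list_size)
    favourable = sum(
        comb(term_size, i) * comb(population_size - term_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / comb(population_size, list_size)


# Hypothetical counts: 20,000 genes in the background, 100 annotated with the
# term, and 4 of those found in a list of 30 genes.
p_value = enrichment_p_value(20000, 100, 30, 4)
```

A small p-value indicates that the term is seen in the gene list more often than would be
expected by chance given the chosen background.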

Gene functional classification groups related genes together, concentrating on biological
networks, using a set of fuzzy clustering techniques based on annotation terms. DAVID then
allows investigators to study the clusters, in which the relationships are easier to view and
assimilate. To ensure the best possible analysis, it is essential that DAVID maps the gene
identifiers supplied comprehensively across the various databases used. DAVID has been
designed to collate a diverse range of gene identifiers and map them to internal DAVID
identifiers; however, ≥ 20% of input gene identifiers fail to map successfully to an internal
DAVID identifier (Huang, Sherman, & Lempicki 2009). The gene ID conversion tool helps to
solve this deficit. The final tool, gene name batch viewer, allows users to view further gene
information and provides external links to allow further exploration of each gene.

DAVID allows multiple functional databases to be queried simultaneously, reducing the
need to retrieve and assimilate data from multiple resources. Primary access to DAVID is
via a web based GUI which allows the user to upload one or more gene lists and analyse
them together or independently. There is no Taverna workbench integration but a URL
based API is available to access DAVID resources programmatically and could therefore be
integrated into a Taverna workflow using the standard ‘net’ services.

3.1.4.6 InnateDB
InnateDB (Gardy et al. 2009) provides access to a manually curated database of genes,
proteins, interactions and signalling pathways involved in the innate immune response. The
InnateDB resource also integrates information from other major interaction and pathway
databases to help improve coverage. InnateDB currently contains a total of 137,982 human
interactions of which 11,375 have been manually curated (InnateDB 2011). InnateDB allows
interactions to be viewed graphically in Cytoscape (Shannon et al. 2003) using the Cerebral
(Barsky et al. 2007) and CyOog (Royer et al. 2008) plugins which allow interactions to be
viewed using subcellular localisation annotation and power graph analysis respectively.
Access to the InnateDB resource is via a web based GUI although a bulk download option is
available. Due to this, it will not be possible to build InnateDB into the workflow and it will
have to be accessed manually.

3.1.4.7 Reactome
Reactome (Matthews et al. 2009; Vastrik et al. 2007) is a peer reviewed, manually curated
pathway database. It is open source and cross referenced to many other bioinformatic
resources such as Ensembl, UCSC, UniProt and NCBI (Vastrik et al. 2007). The current
release (version 36) of the Reactome database contains annotations on 1116 pathways,
4247 reactions, 3958 complexes and 5234 proteins for human (Reactome 2011). Further
information about each pathway can be viewed using the Reactome identifier. Access to
the Reactome database is via the web based GUI. A bulk download option as well as a SOAP
interface are provided but these do not allow access to Reactome’s pathway
over-representation analysis tool. Due to this, it will not be possible to build Reactome into
the workflow and it will have to be accessed manually.

3.1.5 Protein-protein Interaction


The Human Protein Reference Database (HPRD) (Peri et al. 2003) contains protein-protein
interactions (PPIs) for human genes based on experimental evidence from the literature.
Protein-protein interactions are essential as most proteins carry out their function through
a number of interactions with other molecules (Mathivanan et al. 2006). PPIs are a valuable
resource to study these networks and the current release of the HPRD contains 39,240 PPIs
for 9,616 genes. Access to the HPRD is via a web interface or by flat text download. There is
no programmatic access and, as such, the HPRD will have to be accessed manually by
downloading the database.

3.2 Taverna Workflow


3.2.1 Main Workflow Overview
The Taverna workflow employs a method similar to the one used by Raychaudhuri et al.
(2009a) (Figure 10) with minor modifications and enhanced flexibility. Firstly, the SNP
location is ascertained. HapMap information is then used to define an LD region, which is
extended further in both directions to include the nearest recombination hotspot. This
method defines the boundary of the associated region, which can then be used to search
the genome databases for all overlapping genes. Figure 11 shows an overview of this
process and also identifies the potential sources of information for each step.
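
The last two stages of this process can be sketched as below, assuming the LD region has
already been determined. This is not the workflow's own code: the function names, the tuple
layouts and all co-ordinates are hypothetical, and hotspots that overlap the LD region itself
are simply ignored in this sketch.

```python
def extend_to_hotspots(ld_start, ld_end, hotspots):
    """Extend an LD region outwards to include the nearest hotspot on each side.

    `hotspots` is a list of (start, end) intervals on the same chromosome.
    If no hotspot lies beyond a boundary, that boundary is left unchanged.
    """
    upstream = [h for h in hotspots if h[1] < ld_start]
    downstream = [h for h in hotspots if h[0] > ld_end]
    region_start = max(upstream, key=lambda h: h[1])[0] if upstream else ld_start
    region_end = min(downstream, key=lambda h: h[0])[1] if downstream else ld_end
    return region_start, region_end


def overlapping_genes(region_start, region_end, genes):
    """Return names of genes whose (name, start, end) interval overlaps the region."""
    return [name for name, g_start, g_end in genes
            if g_start <= region_end and g_end >= region_start]


# Hypothetical co-ordinates on one chromosome.
hotspots = [(100, 200), (600, 700), (2500, 2600), (4000, 4100)]
genes = [("GENE_A", 50, 650), ("GENE_B", 2550, 3000), ("GENE_C", 3000, 3500)]

region = extend_to_hotspots(1000, 2000, hotspots)
hits = overlapping_genes(region[0], region[1], genes)
```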

[Figure 11: flow diagram of the four workflow stages, each with its potential data sources.]
SNP Region - Identify SNP position
    Sources: NCBI dbSNP, Ensembl, UCSC, HapMap
LD Region - Find furthest SNP upstream and downstream > LD cutoff
    Sources: HapMap
Associated Region - Extend region to include nearest recombination hotspots in both directions
    Sources: UCSC, Local File
Associated Genes - Find all overlapping genes
    Sources: Ensembl, UCSC

Figure 11 Overview of the Taverna workflow. A description of each section is given as well as potential
resources for the data.

The workflow utilises the Extensible Markup Language (XML) throughout, starting with a
simple XML schema which is built upon during the runtime of the workflow to provide a
structured, well formed document which can be easily read by humans as well as machines.
The input and output schemas differ because workflow parameters, such as the r2 cut-off, may
be specified in the input to replace the default values (Table 6) used during workflow
execution. These parameters are removed from the XML at the beginning of the workflow and do
not take part in the final schema. The input XML schema is shown in Figure 12 and the final
output XML schema is shown in Figure 13a-f, coloured to reflect when the particular
information is added.

Table 6 Additional Taverna workflow parameters. The Taverna workflow parameter name, description and
default value are shown.
Parameter Name       Parameter Description                                   Default Value
flankingKB           Maximum distance of LD in KB                            250
hapmapPlugin         HapMap web plugin to obtain LD data                     LDPhase3Dumper
hapmapPopulation     HapMap population to base LD data on                    CEU
hapmapVersion        HapMap version to use                                   Hapmap27_B36
hapmapBuild          Genome assembly version which HapMap data is based on   36
hapmapBaseURL        Base URL of HapMap web interface                        http://hapmap.ncbi.nlm.nih.gov/cgi-perl/gbrowse
rsqCutoff            Minimum r2 cutoff for SNP in LD                         0.5
alternateGeneServer  Specify to use Ensembl BioMart server to obtain gene    false
                     information instead of biomart.org

Figure 12 Parameters XML document schema. Defines an XML document suitable for the
main Taverna workflow. Each box represents an XML element node showing the content
allowed.

Figure 13a Taverna workflow output XML document schema overview. Each element on the bottom row is
expanded further in Figure 13b-f. Element headers are coloured according to the section of the workflow
where they are created (see Figure 14 for workflow sections).

Figure 13b Expanded SnpInfo XML element schema. See Figure 13a for figure legend.

Figure 13c Expanded SnpRegion XML element schema. See Figure 13a for figure legend.

Figure 13d Expanded LDRegion XML element schema. See Figure 13a for figure legend.

Figure 13e Expanded HotspotRegion XML element schema. See Figure 13a for figure legend.

Figure 13f Expanded AssociatedRegion XML element schema. See Figure 13a for figure legend.

The workflow is broadly split into six sections (Figure 14):-

1. Parameter Processing (grey) – Workflow parameters, such as r2 cut-off, are parsed
out of the supplied XML input, replacing any default parameters.
2. SNP Info & Region (red) – Defines the location of all associated SNPs and adds
basic annotation such as alleles and flanking sequence.
3. LD Region (green) – Retrieves the region defined by LD cut-off.
4. Associated Region (blue) – Extends the LD Region in both directions to include the
nearest recombination hotspots.
5. Associated Genes (orange) – Retrieves all genes overlapping the Associated Region
and stores location and annotation information.
6. Merge SNP Results (grey) – Merges all SNP results and formats the XML output
document.

With the exception of the first section, each section uses the previous section’s results to
carry out its specific task. Each section is discussed below.

Figure 14 Taverna workflow diagram to define the SNP ‘AssociatedRegion’.
Custom Taverna beanshell scripts are shown as dark orange boxes; nested
workflows as pink boxes and constants as blue boxes. Boxes with a red
border are involved in the conversion of co-ordinates between genome
assemblies. Nested workflows have been collapsed for clarity and selected
ones shown in full in Figure 16, Figure 18 and Figure 19 – Figure 21.

3.2.1.1 Parameter Processing


The input XML document (Figure 12) contains a list of ‘Parameter’ elements which are used
to change the behaviour of several parts of the workflow. The ‘Parameter’ element has an
attribute called ‘name’ to identify which parameter must be replaced. The parameter value
consists of all text between the opening and closing ‘Parameter’ tags.

<AssociatedSNPs>
<AssociatedSNP rsId="rs6910071" pValue="1.00E-299"/>
<AssociatedSNP rsId="rs6457620" pValue="3.60E-186"/>
<AssociatedSNP rsId="rs2476601" pValue="2.30E-98"/>
</AssociatedSNPs>

Figure 15 XML document excerpt showing the basic ‘AssociatedSNPs’ XML
element. Any number of ‘AssociatedSNP’ elements can be supplied with an optional
pValue attribute.

Only one ‘Parameter’ element is required, named “associatedSNPs”, which stores a basic
‘AssociatedSNPs’ XML document (Figure 15). This basic XML document is then added to
throughout the course of the workflow’s runtime to generate a document conforming to
the schema shown in Figure 13a-f. Several optional parameters, such as r2 cut-off and
HapMap version, can also be supplied and replace the default values at this point (Table 6).
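
The parameter handling described in this section can be sketched as follows. The workflow
itself uses Taverna services and beanshells; this Python version is an illustration only. The
root element name ‘Parameters’ and the function name are assumptions, as the full input
schema (Figure 12) is not reproduced here, while the default values are those of Table 6.

```python
import xml.etree.ElementTree as ET

# Default values taken from Table 6.
DEFAULTS = {
    "flankingKB": "250",
    "hapmapPlugin": "LDPhase3Dumper",
    "hapmapPopulation": "CEU",
    "hapmapVersion": "Hapmap27_B36",
    "hapmapBuild": "36",
    "hapmapBaseURL": "http://hapmap.ncbi.nlm.nih.gov/cgi-perl/gbrowse",
    "rsqCutoff": "0.5",
    "alternateGeneServer": "false",
}

# Assumed input layout: a hypothetical 'Parameters' root holding 'Parameter' elements.
EXAMPLE_INPUT = """
<Parameters>
  <Parameter name="rsqCutoff">0.8</Parameter>
  <Parameter name="associatedSNPs">
    <AssociatedSNPs>
      <AssociatedSNP rsId="rs6910071" pValue="1.00E-299"/>
      <AssociatedSNP rsId="rs2476601" pValue="2.30E-98"/>
    </AssociatedSNPs>
  </Parameter>
</Parameters>
"""


def process_parameters(xml_text):
    """Split the input into workflow settings and the AssociatedSNPs element."""
    root = ET.fromstring(xml_text.strip())
    settings = dict(DEFAULTS)  # start from the defaults, then override
    associated_snps = None
    for param in root.iter("Parameter"):
        name = param.get("name")
        if name == "associatedSNPs":
            associated_snps = param.find("AssociatedSNPs")
        else:
            settings[name] = (param.text or "").strip()
    if associated_snps is None:
        raise ValueError("the 'associatedSNPs' parameter is required")
    return settings, associated_snps


settings, snps = process_parameters(EXAMPLE_INPUT)
```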

3.2.1.2 SNP Info & Region


This section retrieves information about each SNP such as chromosome, position and
additional annotation information. As shown in Figure 11, all three genomic resources can
be used to obtain this information and, as previously discussed, each resource has
individual attributes which make it more suited to different tasks. NCBI’s dbSNP
provides access to the most up to date information at all times and has been fully
integrated into the Taverna workbench through NCBI’s SOAP based eUtils API collection. It
returns an information rich XML document containing positional information as well as SNP
annotation information which can be easily extracted using the Taverna workbench XML
services. Occasionally, as more information is discovered, two or more SNPs can be merged
into a single SNP ID, which can be problematic if a researcher does not know or have the
updated ID. This is automatically identified by eUtils, which returns the document referring
to the latest ID with a complete list of the merge history, ensuring the correct information is
used.

To save time later in the workflow, it is also useful at this point to annotate whether the
SNP has been included in the HapMap project, as no LD information will be available for
missing SNPs. It is therefore necessary to query HapMap for the SNP in question. This is
achieved using the HapMart service provided by HapMap; no result indicates the SNP
has not been included in the HapMap project and no LD information is available. If this is
the case, the ‘LD Region’ is defined as the associated SNP’s position and the workflow
continues as normal to define the hotspot region.

The nested workflow responsible for this section is shown in Figure 16 and begins with
retrieving the SNP IDs from the ‘AssociatedSNPs’ XML document and removing any duplicates.
The NCBI SNP XML document summaries (ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.1.xsd)
are retrieved for all SNPs and transformed to match an intermediate schema (Figure 17) using
the Taverna workbench ‘Transform_XML’ service. This removes all information not required by
the workflow, such as individual SNP submissions and non-reference location information, and
reformats the retained information to aid subsequent processing. The transformed XML
document is merged with the ‘AssociatedSNPs’ XML document (Figure 16 ‘mergeXML’) and the
up to date SNP IDs are passed to the ‘checkHapMap’ nested workflow to determine whether each
SNP is part of the HapMap project. The ‘isHapMap’ attribute of the ‘SnpInfo’ element is
updated as necessary and the amended ‘AssociatedSNPs’ document is returned along with the
genomic build of dbSNP.

Figure 16 Nested Taverna workflow to retrieve SNPs from NCBI. Magenta, purple and green
boxes are standard Taverna services. Further nested workflows are shown in the large beige
boxes.
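
The first step of this nested workflow, retrieving the SNP IDs and removing duplicates, can
be sketched as below using the ‘AssociatedSNPs’ layout of Figure 15. The function name is
hypothetical, and the duplicated entry in the example is added purely for illustration.

```python
import xml.etree.ElementTree as ET

# Layout follows Figure 15; the repeated rs6910071 entry is an invented duplicate.
ASSOCIATED_SNPS_XML = """
<AssociatedSNPs>
  <AssociatedSNP rsId="rs6910071" pValue="1.00E-299"/>
  <AssociatedSNP rsId="rs6457620" pValue="3.60E-186"/>
  <AssociatedSNP rsId="rs6910071"/>
  <AssociatedSNP rsId="rs2476601" pValue="2.30E-98"/>
</AssociatedSNPs>
"""


def unique_rs_ids(xml_text):
    """Return the rsId of every AssociatedSNP element, without duplicates,
    keeping the order in which the IDs first appear."""
    root = ET.fromstring(xml_text.strip())
    ids = [snp.get("rsId") for snp in root.iter("AssociatedSNP")]
    return list(dict.fromkeys(ids))


rs_ids = unique_rs_ids(ASSOCIATED_SNPS_XML)
```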

Figure 17 Interim XML document schema. This schema is used to
temporarily store dbSNP information before integration into the output XML
format (Figure 13b and Figure 13c).

3.2.1.3 LD Region
The third section retrieves LD information from the HapMap project and identifies the furthest
SNPs upstream and downstream of the associated SNP that are in LD with it above the supplied
r² cut-off (default cut-off is 0.5). A Taverna workbench beanshell initially adds the required
‘LDRegion’ element to each ‘AssociatedSNP’ document with minimal information, using the
associated SNP location as the initial default LD Region co-ordinates. A nested workflow
(Figure 18) downloads the LD information from the HapMap project and adds it to the relevant
‘HapMap’ element. This is then processed to find the true extent of the LD region and the
‘LDRegion’ element is updated with the correct co-ordinates. Additionally, the locations of
both the 5’ and 3’ SNPs are stored (Figure 13d). To enhance speed and reduce the final file
size, only the header lines and the LD information referring to the associated SNP are retained
in the output; LD information for all other SNP pairs is discarded.

Figure 18 Nested Taverna workflow to define the ‘LDRegion’. LD data is downloaded from HapMap
and processed to define the extent of SNPs in LD with the supplied SNP.
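
The LD Region definition described in this section can be illustrated with a short Python
sketch. It is not the beanshell used in the workflow: the record layout (two positions, two SNP
IDs and an r² value), the name `ld_region` and the example SNP IDs are assumptions made for
illustration, and r² is compared with a strict "greater than", as the text suggests.

```python
from typing import Iterable, NamedTuple, Tuple


class LDRecord(NamedTuple):
    """One pairwise LD measurement (hypothetical layout, for illustration only)."""
    pos1: int
    pos2: int
    snp1: str
    snp2: str
    r2: float


def ld_region(records: Iterable[LDRecord], snp_id: str, snp_pos: int,
              r2_cutoff: float = 0.5) -> Tuple[int, int]:
    """Return (start, end) spanned by SNPs in LD with snp_id above the cut-off."""
    # Default region: the associated SNP's own position.
    start = end = snp_pos
    for rec in records:
        if rec.r2 <= r2_cutoff:
            continue  # not in sufficient LD
        if rec.snp1 == snp_id:
            partner_pos = rec.pos2
        elif rec.snp2 == snp_id:
            partner_pos = rec.pos1
        else:
            continue  # this pair does not involve the associated SNP
        start = min(start, partner_pos)
        end = max(end, partner_pos)
    return start, end


records = [
    LDRecord(100, 500, "rsA", "rsX", 0.9),
    LDRecord(500, 900, "rsX", "rsB", 0.6),
    LDRecord(500, 1500, "rsX", "rsC", 0.2),  # below the cut-off, ignored
    LDRecord(100, 900, "rsA", "rsB", 1.0),   # does not involve rsX, ignored
]
print(ld_region(records, "rsX", 500))
```

With no qualifying records the function returns the SNP position for both co-ordinates, which
mirrors the default behaviour described for SNPs that are absent from HapMap.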

3.2.1.4 Associated Region


This section retrieves the nearest recombination hotspot in both directions from the LD
Region defined previously and extends the LD Region to include these hotspots. As shown in
Figure 11, information on recombination hotspot locations may be obtained from two sources: the
UCSC Genome Browser or a local file. Initially, the locations of the recombination hotspots
were stored in a local XML file, which could be easily searched to find the nearest
recombination hotspots. A local file was chosen for its ease of access and interpretation and
because the hotspot locations are static. However, it was decided that this reduced the overall
portability of the workflow and that the benefit of maximising portability outweighed the
benefits of using a local file. It was therefore necessary to use the public UCSC MySQL
database to retrieve recombination hotspot locations. A Taverna workbench beanshell has been
written which handles all structured query language (SQL) queries and adds the ‘HotspotRegion’
and ‘AssociatedRegion’ elements (Figure 19 & Figure 20) as necessary.
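
The extension of the LD Region to the nearest flanking hotspots can be sketched as below. This
is a simplified Python illustration under stated assumptions: the hotspots are taken to be
(start, end) pairs already retrieved for the relevant chromosome, hotspots overlapping the LD
Region are ignored, and the LD Region boundary is kept when no hotspot exists on one side. The
function name `associated_region` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome


def associated_region(ld_region: Interval, hotspots: List[Interval]) -> Interval:
    """Extend the LD region to include the nearest hotspot on each side."""
    ld_start, ld_end = ld_region
    # Hotspots lying entirely before / after the LD region.
    upstream = [h for h in hotspots if h[1] <= ld_start]
    downstream = [h for h in hotspots if h[0] >= ld_end]
    # Nearest upstream hotspot has the largest end; nearest downstream the smallest start.
    start = max(upstream, key=lambda h: h[1])[0] if upstream else ld_start
    end = min(downstream, key=lambda h: h[0])[1] if downstream else ld_end
    return start, end


hotspots = [(1000, 2000), (4000, 5000), (9000, 9500), (12000, 13000)]
print(associated_region((6000, 8000), hotspots))
```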

Figure 19 Nested Taverna workflow to define the ‘HotspotRegion’ and ‘AssociatedRegion’. One Taverna
beanshell is responsible for downloading and processing the information from UCSC to define the nearest
recombination hotspots.

3.2.1.5 Associated Genes


The main objective of the workflow is to obtain all genes implicated by the associated SNP to
allow subsequent analysis. The ‘Associated Genes’ section of the workflow uses the start and
end co-ordinates of the previously defined ‘Associated Region’ to find all genes within and
overlapping this region. Two resources are available to accomplish this: Ensembl and the UCSC
Genome Browser (Figure 11); both resources contain the same information but differ in the
approach and ease of retrieving it. As discussed previously, there is no integration of the
UCSC Genome Browser into the Taverna workbench and as such all requests would have to be made
via its public MySQL database using SQL, which may prove problematic. Ensembl, however, is
fully integrated into the Taverna workbench via the BioMart services and as such may be
implemented much more easily and cleanly. It was therefore decided to use Ensembl for this
section of the workflow.

Figure 20 Nested Taverna workflow to retrieve all genes within the defined ‘AssociatedRegion’.
A nested workflow queries a BioMart service and the main nested workflow integrates this data
with the output XML document.

Firstly, the ‘Associated Region’ positional information is extracted from the input XML
document, which is then passed to the BioMart service to obtain all overlapping genes for
this region. The retrieved gene information is reformatted to resemble the final XML
schema for the ‘AssociatedGenes’ element and integrated into the XML document (Figure
20), thus completing all information retrieval for a SNP.
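
In the workflow the overlap query is delegated to BioMart. The underlying test for a gene lying
"within or overlapping" a region is a simple interval comparison, sketched here in Python with
hypothetical gene names and co-ordinates; `genes_in_region` is an illustrative name, not part
of the workflow.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int


def genes_in_region(genes: List[Gene], region_start: int, region_end: int) -> List[str]:
    """Return names of genes that lie within or overlap [region_start, region_end]."""
    # Two closed intervals overlap when each one starts before the other ends.
    return [g.name for g in genes
            if g.start <= region_end and g.end >= region_start]


genes = [Gene("geneA", 100, 300), Gene("geneB", 250, 900), Gene("geneC", 1200, 1500)]
print(genes_in_region(genes, 280, 1000))
```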

3.2.1.6 Merge SNP Results


To speed up the execution time of the Taverna workflow, an approach similar to multi-
threading is utilised by the Taverna workbench. This allows subsequent processing to take
place for a particular SNP without having to wait for all SNPs to be processed first. For
example, if ten SNPs are supplied to the workflow and one requires more time to retrieve and
process LD information, processing the SNPs strictly in step would prevent the other nine from
completing until the slowest had finished. Using this approach, the other nine SNPs progress
through the workflow independently, leaving less for the workflow to perform towards the end
of its execution.

As such, an individual XML document is produced for each SNP, and these must be merged
to produce the final output (Figure 14 ‘MergeResults’) at the end of the workflow.
A Taverna workbench beanshell has been written to accomplish this. Merging the results
into a single document does mean that no results can be viewed until all SNPs are processed
and the merge is complete, but it makes subsequent processing much easier.
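
The merge step can be sketched in Python as follows. This is an illustration only, not the
beanshell itself: it assumes each per-SNP document has an ‘AssociatedSNPs’ root whose children
can be copied unchanged into a single output root, and the `id` attribute used in the example
is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def merge_results(documents: Iterable[str], root_tag: str = "AssociatedSNPs") -> str:
    """Combine per-SNP XML documents into one document under a single root."""
    merged = ET.Element(root_tag)
    for text in documents:
        root = ET.fromstring(text)
        # Move every child of the per-SNP root under the merged root, in order.
        merged.extend(list(root))
    return ET.tostring(merged, encoding="unicode")


per_snp = [
    '<AssociatedSNPs><AssociatedSNP id="rs1"/></AssociatedSNPs>',
    '<AssociatedSNPs><AssociatedSNP id="rs2"/></AssociatedSNPs>',
]
merged_xml = merge_results(per_snp)
print(merged_xml)
```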

3.2.2 Additional Resources

3.2.2.1 Workflow Utility Services


As all three genomic resources, in addition to the HapMap project, have been used in the
workflow it was necessary to include ‘utility’ services at several points throughout due to
differences in genome assemblies. The first ‘working draft’ of the human genome (Lander
et al. 2001), produced by hierarchical shotgun sequencing, had approximately 150,000 gaps
and additionally, many smaller segments had unknown order and orientation and therefore
required further work to produce a ‘finished’ genome. As techniques have improved, such
as mapping, sequencing and alignment algorithms, this map has been refined to increase
genomic coverage and decrease the number of gaps. All genome resources map their
features to the genome map, assigning them to a chromosome and position. After
substantial modifications to a genome have taken place a new genome assembly is
released, for example, the latest GRCh37 genome assembly replaced the NCBI36 assembly
in March 2009 and contains ~350 gaps. Features mapped to NCBI36 may therefore have
substantially different locations in GRCh37, so co-ordinates must be converted to the correct
assembly whenever data are moved between resources that use different assemblies.

To convert co-ordinates between genome assemblies, a modified version of the UCSC Genome
Browser’s liftOver tool has been written and implemented in a Taverna beanshell. The UCSC
liftOver tool uses a ‘chain’ file, specific to the assembly conversion to be carried out, which
allows the programme to calculate the new positions. The Taverna ‘liftover’ workflow also
requires this information; a separate workflow retrieves the correct file(s) from the UCSC
Genome Browser’s download site and converts them from plain text to a machine-readable,
searchable XML document, which is supplied to the liftover workflow with additional
parameters to perform the conversion. After the conversion has taken place, a new
‘Location’ element is added, storing the new assembly information. Table 7 shows the
current, easily accessible genome assemblies for each of the four resources used and Figure
14 highlights the various points in the workflow where a conversion must take place. To
ensure the workflow remains robust and to increase speed, genome assembly information
is retrieved from each genome resource (where appropriate) and used to retrieve the
appropriate chain file.
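
The principle of a chain-based conversion can be sketched in Python as below. This is a heavily
simplified illustration and not the workflow's implementation: it assumes a single chain on the
forward strand, with a header line followed by "size dt dq" block lines and a final "size"
line, and it returns nothing for positions that fall in a gap. The function names and the
example chain are hypothetical.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (old_start, old_end, new_start), 0-based half-open


def parse_chain(lines: List[str]) -> List[Block]:
    """Turn one chain (header plus block lines) into a list of aligned blocks."""
    header = lines[0].split()
    old_pos, new_pos = int(header[5]), int(header[10])  # start of each alignment
    blocks = []
    for line in lines[1:]:
        fields = [int(x) for x in line.split()]
        if not fields:
            continue
        size = fields[0]
        blocks.append((old_pos, old_pos + size, new_pos))
        if len(fields) == 3:
            # Skip the ungapped block plus the gap on each assembly.
            old_pos += size + fields[1]
            new_pos += size + fields[2]
    return blocks


def lift(position: int, blocks: List[Block]) -> Optional[int]:
    """Map a position on the old assembly to the new one, or None if unmapped."""
    for old_start, old_end, new_start in blocks:
        if old_start <= position < old_end:
            return new_start + (position - old_start)
    return None


chain = [
    "chain 1000 chr1 1000 + 100 160 chr1 1200 + 300 365 1",
    "20 10 15",
    "30",
]
blocks = parse_chain(chain)
print(lift(105, blocks), lift(125, blocks), lift(140, blocks))
```

Positions that fall in a gap between blocks cannot be converted, which is one way a region such
as an ‘LDRegion’ can fail to map between assemblies.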

Table 7 Available genome assemblies for each of the four main genome resources. The default assembly is
shown and any additional ones available. The Genome Reference Consortium (GRC) assembly naming is
shown with the UCSC name in brackets.
                         Ensembl          HapMap           NCBI dbSNP       UCSC¹
Default Assembly         GRCh37 (hg19)    NCBI36 (hg18)    GRCh37 (hg19)    GRCh37 (hg19)
Additional Assemblies: NCBI36 (hg18), NCBI35 (hg17)
¹ Recombination hotspot information is only available on genome assembly NCBI35 (hg17).

3.2.2.2 Additional Utility Workflows


Since XML is a largely unknown or unfamiliar language to many research scientists it may
prove difficult to execute the workflow and/or interpret the results. To try and ensure
maximum usability of the workflow two additional utility workflows have been developed:
‘Make AssociatedSNPs XML’ and ‘AssociatedSNPs XML to Gene List’.

The ‘Make AssociatedSNPs XML’ workflow (Figure 21) allows users to supply a list of SNP
IDs, each with an optional p-value, and an optional list of parameter name-value pairs, and
produces the correctly formatted input XML required by the main workflow. The second
utility workflow, ‘AssociatedSNPs XML to Gene List’, processes the XML output of the main
workflow and outputs a list of genes with optional additional information such as SNP ID
and p-value. Due to the high impact of the HLA region there is also the option to remove
any genes in this region. This produces an output amenable to the pathway analysis
programmes whilst still allowing access to the complete information obtained by the
workflow.

Figure 21 ‘Make AssociatedSNPs XML’ workflow: additional utility workflow to take a list of
associated SNPs and produce an XML file suitable for the main Taverna workflow (Figure 12).

3.3 Disease Analysis – RA
3.3.1 Pathway Analysis

3.3.1.1 Introduction
To identify potentially influential genes or pathways for RA, a gene list must first be produced
representing the known SNP associations. Associated SNPs were defined as the most strongly
associated SNPs reported, attaining a p-value ≤ 1 × 10⁻⁴, taken from five RA articles published
between October 2008 and August 2010 (Table 8). The list was divided into highly associated
(p-value < 1 × 10⁻⁷) and nominally associated (p-value ≥ 1 × 10⁻⁷ and < 1 × 10⁻⁴) SNPs, giving
30 highly associated SNPs, which expand to 58 SNPs when the nominally associated SNPs are
included (Table 9).
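
The division into the two SNP sets amounts to two threshold comparisons, sketched here in
Python. The function name and the labels ‘confirmed’ and ‘expanded’ follow the naming used for
the result files; the code itself is an illustration rather than part of the workflow.

```python
from typing import Optional

CONFIRMED_CUTOFF = 1e-7  # highly associated: p < 1 x 10^-7
EXPANDED_CUTOFF = 1e-4   # nominally associated: 1 x 10^-7 <= p < 1 x 10^-4


def classify_snp(p_value: float) -> Optional[str]:
    """Assign a SNP to the confirmed or expanded set from its best p-value."""
    if p_value < CONFIRMED_CUTOFF:
        return "confirmed"
    if p_value < EXPANDED_CUTOFF:
        return "expanded"
    return None  # not associated strongly enough to be included


for p in (1.00e-16, 1.10e-07, 2.48e-05, 2.0e-03):
    print(p, classify_snp(p))
```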

Table 8 Five RA publications from 2008 to 2010 used to create associated SNP list. The publication label is
used to identify which publication has been used in further tables.
Publication Label     Reference
Stahl 2010            Stahl E. et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nature Genetics. 42(6):508-14.
Plant 2010            Plant D. et al. (2010) Investigation of potential non-HLA rheumatoid arthritis susceptibility loci in a European cohort increases the evidence for nine markers. Annals of the rheumatic diseases. 69:1548-53.
Raychaudhuri 2009     Raychaudhuri S. et al. (2009) Genetic variants at CD28, PRDM1 and CD2/CD58 are associated with rheumatoid arthritis risk. Nature Genetics. 41(12):1313-8.
Gregersen 2009        Gregersen P. et al. (2009) REL, encoding a member of the NF-κB family of transcription factors, is a newly defined risk locus for rheumatoid arthritis. Nature Genetics. 41(7):820-3.
Raychaudhuri 2008     Raychaudhuri S. et al. (2008) Common variants at CD40 and other loci confer risk of rheumatoid arthritis. Nature Genetics. 40(10):1216-23.

Both lists were used as the input to the main Taverna workflow described above, producing
two XML documents: confirmed_RA_results.xml and expanded_RA_results.xml. These files
were in turn used as input to the ‘AssociatedSNPs XML to Gene List’ workflow to produce
various files for pathway enrichment analysis (Table 10). After initial analysis of pathway
enrichment results, it was found that genes in the major histocompatibility complex (MHC)
had a large influence on certain pathways as the region is extremely rich in genes with a
strong immune function which are unlikely to all be causal in RA. It was therefore decided
to produce two further gene group lists to exclude genes in the MHC region, defined by the
location: chromosome 6, between positions 28,477,797 and 33,448,354, to observe the
overall impact of the MHC region genes. For clarity of presentation, where removing the
MHC region genes had no substantial effect, these results are provided as supplementary
information (chapter 7, pages 124-223).
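
The MHC exclusion can be sketched in Python using the region boundaries given above. The gene
records in the example are hypothetical, and treating any overlap with the region as being "in
the MHC region" is an assumption, since the text does not state how genes spanning a boundary
were handled.

```python
from typing import List, Tuple

GeneRecord = Tuple[str, str, int, int]  # (name, chromosome, start, end)

MHC_CHROM = "6"
MHC_START = 28_477_797
MHC_END = 33_448_354


def in_mhc(chrom: str, start: int, end: int) -> bool:
    """True if the interval overlaps the MHC region as defined in the text."""
    return chrom == MHC_CHROM and start <= MHC_END and end >= MHC_START


def remove_mhc_genes(genes: List[GeneRecord]) -> List[GeneRecord]:
    """Return the gene list with all genes overlapping the MHC region removed."""
    return [g for g in genes if not in_mhc(g[1], g[2], g[3])]


genes = [
    ("geneA", "6", 30_000_000, 30_010_000),  # inside the MHC region
    ("geneB", "6", 40_000_000, 40_005_000),  # chromosome 6, outside the region
    ("geneC", "1", 30_000_000, 30_010_000),  # same positions, different chromosome
]
print([g[0] for g in remove_mhc_genes(genes)])
```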

Table 9 Associated RA SNPs. Data taken from five RA publications between 2008 and 2010 (Table 8).
Confirmed loci are defined as having an ‘Associated P-value’ < 1 × 10⁻⁷, while expanded loci are
defined as ≥ 1 × 10⁻⁷ and < 1 × 10⁻⁴. The most highly associated p-value across all publications
appears in bold type.
SNP   Candidate Genes   Publication Label (Stahl 2010, Plant 2010, Raychaudhuri 2009, Gregersen 2009, Raychaudhuri 2008)   Associated P-value
rs6910071 HLA-DRB1 (*0401 tag) 1.00E-299 1.00E-299
rs6457620 HLA-DRB1*04 3.60E-186 3.60E-186
rs2476601 PTPN22 9.10E-74 2.30E-98 5.70E-42¹ 2.30E-98
rs874040 RBPJ 1.00E-16 1.00E-16
rs6920220 TNFAIP3 8.90E-13 1.08E-15 1.50E-09 1.08E-15
rs7574865 STAT4 2.90E-07 2.99E-15 2.99E-15
rs11676922 AFF3 1.00E-14 1.00E-14
rs13031237 REL 7.90E-07 3.08E-14 3.08E-14
rs2900180 TRAF1,C5 6.50E-13 6.50E-13
rs13017599 REL 2.60E-12 2.60E-12
rs10760130 TRAF1,C5 6.03E-12 6.03E-12
rs6859219 ANKRD55 9.60E-12 9.60E-12
rs706778 IL2RA 1.40E-11 1.40E-11
rs3093023 CCR6 1.50E-11 1.50E-11
rs10488631 IRF5 4.20E-11 4.20E-11
rs951005 CCL21 3.90E-10 3.90E-10
rs5029937 TNFAIP3 7.50E-08 4.62E-10 4.62E-10
rs934734 SPRED2 5.30E-10 5.30E-10
rs4810485 CD40 2.80E-09 6.07E-10 8.20E-09 6.07E-10
rs11586238 CD2,CD58 1.00E-05 1.00E-09 1.00E-09
rs1980422 CD28 5.20E-05 1.30E-09 1.30E-09
rs212389 TAGAP 2.70E-09² 2.70E-09
rs1678542 KIF5A,PIP4K2C 2.00E-04 3.64E-09 8.80E-08 3.64E-09
rs2736340 BLK 1.50E-05 5.69E-09 5.69E-09
rs231735 CTLA4 6.25E-09 6.25E-09
rs3087243 CTLA4 1.20E-08 4.08E-08 6.00E-04 1.20E-08
rs548234 PRDM1 9.70E-05 2.10E-08 2.10E-08
rs26232 GIN1,C5orf30 4.10E-08 4.10E-08
rs13315591 PXK,FAM107A,DNASE1L3 4.60E-08 4.60E-08
rs6822844 IL2,IL21 7.00E-04 5.89E-08 5.89E-08
rs3890745 MMEL1,TNFRSF14 3.60E-06 1.10E-07 1.10E-07
rs7155603 BATF 1.10E-07 1.10E-07
rs1160542 AFF3 1.15E-07 1.15E-07
rs4750316 PRKCQ 2.00E-06 1.54E-07 4.40E-06 1.54E-07
rs3761847 TRAF1,C5 2.10E-07 2.10E-07
rs13207033 TNFAIP3 2.52E-07 2.52E-07
rs2812378 CCL21 1.00E-04 7.79E-07 2.80E-07 2.80E-07
rs394581 TAGAP 6.00E-04 3.80E-07 3.80E-07
rs10919563 PTPRC 2.00E-04 6.70E-07 6.70E-07
rs13119723 IL2,IL21 6.80E-07 6.80E-07
rs2872507 ORMDL3,IKZF3 9.40E-07 9.40E-07
rs743777 IL2RB 1.44E-06 1.44E-06
rs840016 CD247 1.60E-06 1.60E-06
rs10865035 AFF3 2.00E-06 2.00E-06
rs11203203 UBASH3A 3.80E-06 3.80E-06
rs540386 RAG1,TRAF6 3.00E-04 3.90E-06 3.90E-06
rs42041 CDK6 4.00E-06 4.00E-06
rs3184504 SH2B3 6.00E-06 6.00E-06
rs7543174 IL6R 1.20E-05 1.20E-05
rs2793108 ZEB1 1.40E-05 1.40E-05
rs12746613 FCGR2A 4.00E-04 1.50E-05 1.50E-05
rs8045689 NFATC2IP 2.40E-05 2.40E-05
rs2104286 IL2RA 2.00E-03 2.48E-05 2.48E-05
rs1167223 OLIG3,TNFAIP3 3.80E-05 3.80E-05
rs892188 ICAM1,ICAM3 4.30E-05 4.30E-05
rs7234029 PTPN2 4.40E-05 4.40E-05
rs5754217 UBE2L3 4.80E-05 4.80E-05
rs4535211 PLCL2 8.90E-05 8.90E-05
¹ rs6679677 reported, which is in r² = 1 with rs2476601. ² Taken from supplementary table 3; used in conditional analysis.

3.3.1.2 Gene Ontology – BiNGO

The BiNGO tool was used to test for over-representation of the full human gene ontology (GO)
dataset among the gene lists using the hypergeometric test and the Benjamini & Hochberg FDR
correction method. Associated gene names, obtained by the Taverna workflow, for all four gene
lists (Table 10) were supplied to BiNGO. Network maps were produced (Figure 22) which were
visualised in the Cytoscape programme to view the relationships between GO terms. A results
table was also produced, using a p-value cut-off of 0.05, showing the GO terms over-represented,
the genes which are represented and statistical information such as p-value (Figure 23).

Table 10 Files produced from the Taverna workflow output. The filename for each file is given
and can be found on the supplementary media in the Results section.
SNP Set             Content Type             Filename
Confirmed           Gene Names               confirmed_genes.txt
Confirmed           Gene Names & p-values    confirmed_genes_pvals.txt
Confirmed           ENSG IDs                 confirmed_ensgs.txt
Confirmed           ENSG IDs & p-values      confirmed_ensgs_pvals.txt
Confirmed no HLA    Gene Names               confirmed_genes_no_HLA.txt
Confirmed no HLA    Gene Names & p-values    confirmed_genes_pvals_no_HLA.txt
Confirmed no HLA    ENSG IDs                 confirmed_ensgs_no_HLA.txt
Confirmed no HLA    ENSG IDs & p-values      confirmed_ensgs_pvals_no_HLA.txt
Expanded            Gene Names               expanded_genes.txt
Expanded            Gene Names & p-values    expanded_genes_pvals.txt
Expanded            ENSG IDs                 expanded_ensgs.txt
Expanded            ENSG IDs & p-values      expanded_ensgs_pvals.txt
Expanded no HLA     Gene Names               expanded_genes_no_HLA.txt
Expanded no HLA     Gene Names & p-values    expanded_genes_pvals_no_HLA.txt
Expanded no HLA     ENSG IDs                 expanded_ensgs_no_HLA.txt
Expanded no HLA     ENSG IDs & p-values      expanded_ensgs_pvals_no_HLA.txt

Content Type             Description
Gene Names               Associated gene names
Gene Names & p-values    Tab delimited file of associated gene names with p-value of associated SNP
ENSG IDs                 Ensembl gene ENSG IDs
ENSG IDs & p-values      Tab delimited file of gene ENSG IDs with p-value of associated SNP

3.3.1.3 DAVID
Ensembl gene ID lists were uploaded into the Database for Annotation, Visualization and
Integrated Discovery (DAVID) 6.7 and any unrecognised IDs were removed from the analysis. Each
gene list was initially analysed for gene functional classification using medium stringency and
default options. The functional annotation analysis was performed using the default annotation
sources which, among others, allowed access to the KEGG and Biocarta pathway databases. The
functional annotation chart was used to assess which, if any, functional annotation categories
were statistically enriched in each of the gene lists. Functional annotation clustering was then
performed on each list to identify enriched annotation clusters using medium stringency and
default options. Results were exported for subsequent analysis.

Figure 22 Example BiNGO network map visualised in Cytoscape. GO terms are coloured according to
their significance.

Figure 23 Example BiNGO text output as viewed from Cytoscape. Information may be exported as a tab-
delimited file.

3.3.1.4 InnateDB
Ensembl gene IDs were uploaded into InnateDB and analysed using the ‘Pathways’ and
‘Predicted TF Interactions’ options. A pathway over-representation analysis was performed
using the hypergeometric algorithm and p-values were corrected using the recommended,
less conservative Benjamini & Hochberg FDR method. Statistically significantly over-represented
pathways were viewed using the Cerebral Cytoscape plugin. A transcription factor binding site
(TFBS) over-representation analysis was also conducted using the same algorithm and
correction method.
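
The hypergeometric over-representation test and Benjamini & Hochberg correction named here and
in the BiNGO analysis can be sketched in Python as follows. This is a generic illustration of
the two calculations, not the code used by BiNGO or InnateDB; the function names and the example
numbers are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 5 of 20 listed genes annotated, 50 of 1000 in the background.
print(hypergeom_pvalue(5, 20, 50, 1000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A term would then be reported as over-represented when its adjusted p-value falls at or below
the chosen cut-off, for example 0.05.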

3.3.1.5 PANTHER
The ‘compare gene lists’ tool was used to search the PANTHER database (version 7) for cellular
components, protein classes and pathways over-represented in the gene lists. A bar chart
showing the percentage of genes in each list was also produced, giving a visual overview of all
categories analysed. Significant pathways were viewed using the PANTHER database’s internal
viewer, which could be used to implicate further genes in the pathway.

3.3.1.6 Reactome
An over-representation pathway analysis was performed for each gene list with Reactome
version 35, using the Ensembl gene IDs as input. Results were exported as a tab-delimited file
(Figure 24) and further investigation was carried out using the Reactome identifier.

3.3.2 Protein-protein Interaction


The Human Protein Reference Database (HPRD) release 9 041310 was downloaded from the HPRD
download page and contained 39,240 protein-protein interactions, representing 9,616 unique
genes. A Perl script was written to process these interactions and produce a file containing
protein-protein interactions for each supplied gene list, where both interacting proteins were
present in the list, which could be used to create a network map.

Figure 24 Example Reactome output

Additionally, an extended interaction network map was produced for genes with fewer than
two interactions with other genes in the list. For example, if geneA and geneB are present in
the gene list, geneA interacts with geneC and geneC interacts with geneB, then the
geneA-C-B interaction is included. However, if geneC interacted with geneD and geneD
interacted with geneB, the interaction would not be included. Network maps were
constructed and visualised in Cytoscape.
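
The direct and extended networks described above can be sketched in Python as follows. The
original processing was performed with a Perl script that is not reproduced here; this sketch
is an independent illustration. It assumes interactions are supplied as gene-name pairs, and it
applies the one-intermediate rule to any listed gene with fewer than two direct interactions,
regardless of how many interactions the gene at the other end has.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def build_networks(interactions: Iterable[Tuple[str, str]],
                   gene_list: Iterable[str]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (direct, extended) edge sets for the supplied gene list."""
    genes = set(gene_list)
    neighbours = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Direct network: both interacting partners are in the gene list.
    direct = {frozenset((a, b)) for a in genes for b in neighbours[a] if b in genes}

    # Extended network: link poorly connected genes through one intermediate.
    extended = set(direct)
    for a in genes:
        if len(neighbours[a] & genes) >= 2:
            continue
        for c in neighbours[a] - genes:            # intermediate not in the list
            for b in (neighbours[c] & genes) - {a}:
                extended.add(frozenset((a, c)))
                extended.add(frozenset((c, b)))
    return direct, extended


interactions = [("geneA", "geneC"), ("geneC", "geneB"),
                ("geneA", "geneE"), ("geneE", "geneD"), ("geneD", "geneB")]
direct, extended = build_networks(interactions, ["geneA", "geneB"])
print(sorted(sorted(edge) for edge in extended))
```

In this example the geneA-geneC-geneB path is kept, while the longer path through geneE and
geneD is not, matching the rule described in the text.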

3.3.3 Comparison with Previous Publications


Genes which were identified as significantly associated with RA in previously published
pathway analysis studies (Table 3) were initially compared to genes identified by the
Taverna workflow and subsequently to genes identified by the extended protein-protein
interaction network. The comparison was performed using a custom Perl script and genes
were annotated to show whether they are found in the MHC region. Pathways identified by
previous studies were compared to those identified in this study by visual inspection, as
similar pathways are often named inconsistently, which a scripted comparison would be
unable to detect.

4 Results
4.1 Taverna Workflow SNP Gene Mapping
Genes obtained by the Taverna workflow using the methods described were compared to
the genes assigned by the ‘most biologically plausible’ method (Table 11). Of the 58 SNPs
associated with RA, the workflow was unable to identify any genes for three SNPs: one from the
30 confirmed SNPs and two from the expanded SNP list. The SNP from the confirmed SNP set,
rs548234, was originally assigned to the gene PRDM1; upon further investigation it was
found that the region defined for this SNP by the workflow was located approximately 1KB
downstream of the gene’s 3’ end. For the other two SNPs, rs3890745 (MMEL1) and
rs12746613 (FCGR2A), the workflow failed to identify a region as it was unable to map the
‘LDRegion’ to the required genome assemblies. These SNPs have been removed from
subsequent analyses as it is possible they lie in repetitive or incomplete regions of
the genome and as such any annotation obtained may be inaccurate. For example, the
genomic region (Figure 25) containing rs3890745 contains a gap with unfinished sequence
approximately 80KB from the MMEL1 gene, between the contigs AL139246 and AL592464,
which could cause problems during assembly co-ordinate conversion.

Figure 25 Genomic region containing rs3890745 (red line). A gap between the contigs AL139246 and
AL592464 is shown (hashed lines) approximately 80KB from the MMEL1 gene.

Of the remaining confirmed SNPs, the workflow did not identify the biologically assigned
gene for two (rs6910071, assigned to HLA-DRB1, and rs6920220, assigned to TNFAIP3). For
rs11586238 it identified CD2 and six other genes but not CD58. In the expanded SNP set, the
workflow did not identify the biologically assigned genes for three SNPs: rs13207033,
rs2793108 and rs1167223. Two of these SNPs (rs13207033 and rs1167223) map to 6q23, the same
region as the confirmed SNP rs6920220 assigned to the TNFAIP3 gene. For the remaining SNP,
rs2793108, the workflow defined a region almost 127KB upstream of the biologically assigned
ZEB1 gene.

Overall, a total of 235 genes were identified for the 30 confirmed SNPs and a further 200 genes
for the additional 28 SNPs in the expanded set, of which 37 lie within the MHC region as
previously defined (Table 12). Approximately 70% are known genes and 53% have HUGO
Gene Nomenclature Committee (HGNC) symbols, with the remainder having mostly
(approximately 38%) clone-based gene names. Most genes (approximately 52%) are protein
coding and approximately 14% represent RNA genes, such as miRNA or snRNA genes. Full
breakdowns of each gene attribute are shown in Table 13 - Table 15.

Table 11 Comparison of genes identified by the Taverna workflow with the most biologically plausible gene(s) from the RA publications (Table 8). ‘Full’ indicates all genes identified in the
publications were identified by the Taverna workflow, ‘Partial’ states at least one but not all were identified and ‘No’ means no genes were identified.
SNP Set   SNP   Candidate Genes   P-value   Number of Genes   Pathway Genes   Contains Candidates
Confirmed rs6910071 HLA-DRB1 (*0401 tag) 1.00E-299 9 XXbac-BPG154L12.4, HNRNPA1P2, AL662796.2, HLA-DRA, AL662796.1, NOTCH4, C6orf10, BTNL2, U6.939 No
rs6457620 HLA-DRB1*04 3.60E-186 30 HLA-DRB9, HLA-DRB6, AL713966.1, XXbac-BPG254F23.5, XXbac-BPG254F23.6, AL662789.1, XXbac-BPG254F23.7, HLA-DQB3, HLA-DRA, AL662796.1, HLA-DRB5, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DQA2, HLA-DQB2, HLA-DOB, TAP2, XXbac-BPG246D15.9, U1.110, AL669918.2, XXbac-BPG246D15.8, AL669918.1, HLA-Z, PSMB8, PSMB9, TAP1, HLA-DMB, XXbac-BPG181M17.5, AL645941.1 Full
rs2476601 PTPN22 2.30E-98 15 RP11-473L1.1, RP4-730K3.3, RP5-1073O3.2, RP5-1073O3.5, RP5-1073O3.7, RP4-590F24.2, MAGI3, PHTF1, RSBN1, Full
PTPN22, BCL2L15, AP4B1, DCLRE1B, HIPK1, OLFML3
rs874040 RBPJ 1.00E-16 3 AC097714.1, RBPJ, RP11-324H7.1 Full
rs6920220 TNFAIP3 1.08E-15 1 RP11-95M15.1 No
rs7574865 STAT4 2.99E-15 3 AC067945.4, STAT4, U6.690 Full
rs11676922 AFF3 1.00E-14 4 AC092667.2, AC104782.3, AFF3, LONRF2 Full
rs13031237 REL 3.08E-14 22 AC010733.2, AC010733.4, AC010733.7, AC010733.8, AC010733.5, AC016747.3, AC016894.1, AC016727.3, REL, PUS10, Full
AC010733.1, PEX13, KIAA1841, C2orf74, AHSA2, USP34, XPO1, 5S_rRNA.269, AC011245.1, SNORA70B, U4.19, U6.534
rs2900180 TRAF1, C5 6.50E-13 9 RP11-525G7.2, RP11-477J21.5, GSN, PHF19, TRAF1, C5, CEP110, RAB14, AL137068.1 Full
rs13017599 REL 2.60E-12 22 AC010733.2, AC010733.4, AC010733.7, AC010733.8, AC010733.5, AC016747.3, AC016894.1, AC016727.3, REL, PUS10, Full
AC010733.1, PEX13, KIAA1841, C2orf74, AHSA2, USP34, XPO1, 5S_rRNA.269, AC011245.1, SNORA70B, U4.19, U6.534
rs10760130 TRAF1, C5 6.03E-12 9 RP11-525G7.2, RP11-477J21.5, GSN, PHF19, TRAF1, C5, CEP110, RAB14, AL137068.1 Full
rs6859219 ANKRD55 9.60E-12 4 RPL17P22, ANKRD55, 5S_rRNA.315, U6.1320 Full
rs706778 IL2RA 1.40E-11 9 RP11-536K7.5, RP11-414H17.2, RP11-414H17.7, RP11-414H17.6, IL2RA, RBM17, PFKFB3, 7SK.63, SNORA14.1 Full
rs3093023 CCR6 1.50E-11 9 AL133458.1, RP1-167A14.2, RP11-514O12.4, RNASET2, FGFR1OP, CCR6, AL121935.1, GPR31, TCP10L2 Full
rs10488631 IRF5 4.20E-11 17 AC011005.1, KCP, RP11-309L24.6, RP11-128A6.2, TPI1P2, RP11-128A6.3, RP11-286H14.2, RP11-286H14.1, RP11-286H14.3, AC011005.2, RP11-286H14.4, IRF5, TNPO3, TSPAN33, AC025594.1, AC025594.2, snoU13.218 Full
rs951005 CCL21 3.90E-10 9 RP11-392A14.1, GLULP4, YWHAZP6, RP11-392A14.7, CCL21, RP11-195F19.10, C9orf144, AL589645.1, KIAA1045 Full
rs5029937 TNFAIP3 4.62E-10 6 RP11-356I2.2, RP11-356I2.4, RP11-10J5.1, RP11-240M16.1, RP11-240M16.2, TNFAIP3 Full
rs934734 SPRED2 5.30E-10 4 AC012370.2, AC074391.1, AC012370.3, SPRED2 Full
rs4810485 CD40 6.07E-10 4 RPL13P2, SLC12A5, NCOA5, CD40 Full
rs11586238 CD2, CD58 1.00E-09 7 GAPDHP64, NEFHP1, IGSF3, C1orf137, CD2, AL355794.1, MIR320B1 Partial
rs1980422 CD28 1.30E-09 7 AC125238.2, AC125238.3, AC125238.4, AC010138.3, CD28, AC125238.1, U6.532 Full
rs212389 TAGAP 2.70E-09 7 yR211F11.2, RP1-111C20.3, RP11-13P5.1, C6orf99, RSPH3, TAGAP, U6.922 Full
rs1678542 KIF5A, PIP4K2C 3.64E-09 22 AC025165.1, INHBE, GLI1, ARHGAP9, MARS, DDIT3, MBD6, DCTN2, KIF5A, AC022506.1, PIP4K2C, DTX3, ARHGEF25, SLC26A10, B4GALNT1, OS9, AGAP2, AC025165.8, RP11-571M6.1, MIR616, snoU13.70, U6.1269 Full
rs2736340 BLK 5.69E-09 7 RP11-148O21.4, NCRNA00208, C8orf12, FAM167A, BLK, RP11-148O21.3, RP11-148O21.2 Full
rs231735 CTLA4 6.25E-09 9 AC125238.2, AC125238.3, AC125238.4, AC010138.3, CD28, CTLA4, ICOS, AC125238.1, U6.532 Full
rs3087243 CTLA4 1.20E-08 9 AC125238.2, AC125238.3, AC125238.4, AC010138.3, CD28, CTLA4, ICOS, AC125238.1, U6.532 Full
rs548234 PRDM1 2.10E-08 0 No
rs26232 GIN1, C5orf30 4.10E-08 6 CTC-425H14.1, CTC-503K11.2, PAM, GIN1, PPIP5K2, C5orf30 Full
rs13315591 PXK, FAM107A, DNASE1L3 4.60E-08 16 AC098479.1, RP11-456N14.2, RP11-359I18.1, RP11-475O23.2, RP11-475O23.3, FLNB, DNASE1L3, ABHD6, RPP14, RP11-80H18.3, PXK, PDHB, KCTD6, ACOX2, FAM107A, FAM3D Full
rs6822844 IL2, IL21 5.89E-08 5 AC053545.3, KIAA1109, ADAD1, IL2, IL21 Full
Expanded rs3890745¹ MMEL1, TNFRSF14 1.10E-07 0 No
rs7155603 BATF 1.10E-07 1 BATF Full
rs1160542 AFF3 1.15E-07 4 AC092667.2, AC104782.3, AFF3, LONRF2 Full
rs4750316 PRKCQ 1.54E-07 2 AL137145.1, PRKCQ Full
rs3761847 TRAF1, C5 2.10E-07 9 RP11-525G7.2, RP11-477J21.5, GSN, PHF19, TRAF1, C5, CEP110, RAB14, AL137068.1 Full
rs13207033 TNFAIP3 2.52E-07 1 RP11-95M15.1 No
rs2812378 CCL21 2.80E-07 2 CCL21, RP11-195F19.10 Full
rs394581 TAGAP 3.80E-07 7 yR211F11.2, RP1-111C20.3, RP11-13P5.1, C6orf99, RSPH3, TAGAP, U6.922 Full
rs10919563 PTPRC 6.70E-07 12 RP11-553K8.2, RP11-553K8.5, RP11-553K8.3, RP11-31E23.1, RP11-16L9.1, RP11-16L9.2, RP11-16L9.3, RP11-16L9.4, Full
RP11-382E9.1, PTPRC, MIR181B1, MIR181A1
rs13119723 IL2, IL21 6.80E-07 6 AC097533.1, AC053545.3, KIAA1109, ADAD1, IL2, IL21 Full
rs2872507 ORMDL3, IKZF3 9.40E-07 18 AC079199.1, NEUROD2, AC087491.2, PPP1R1B, STARD3, TCAP, PNMT, PGAP3, ERBB2, AC079199.2, C17orf37, GRB7, IKZF3, ZPBP2, GSDMB, ORMDL3, AC090844.1, GSDMA Full
rs743777 IL2RB 1.44E-06 12 LL22NC01-81G9.3, RP5-1170K4.7, RP1-151B14.6, C22orf33, TST, MPST, KCTD17, TMPRSS6, IL2RB, AL022314.1, Full
Y_RNA.817, 7SK.47
rs840016 CD247 1.60E-06 3 RP11-104L21.2, POU2F1, CD247 Full
rs10865035 AFF3 2.00E-06 4 AC092667.2, AC104782.3, AFF3, LONRF2 Full
rs11203203 UBASH3A 3.80E-06 3 TMPRSS3, UBASH3A, U6.1202 Full
rs540386 RAG1, TRAF6 3.90E-06 7 AC061999.1, CTD-2119L1.1, PRR5L, TRAF6, RAG1, RAG2, C11orf74 Full
rs42041 CDK6 4.00E-06 5 AC004128.1, AC002454.1, FAM133B, CDK6, U6.453 Full
rs3184504 SH2B3 6.00E-06 28 CUX2, FAM109A, SH2B3, ATXN2, AC137055.2, BRAP, ACAD10, ALDH2, MAPKAPK5, TMEM116, ERP29, NAA25, TRAFD1, Full
C12orf51, RPL6, PTPN11, AC137055.1, C12orf47, RP3-462E2.1, ADAM1, AC003029.4, RP3-521E19.1, AC002979.1,
Y_RNA.202, Y_RNA.127, 7SK.58, 5S_rRNA.508, U7.53
rs7543174 IL6R 1.20E-05 10 RP11-350G8.5, RP11-350G8.7, RP11-61L14.2, RP11-61L14.6, IL6R, SHE, TDRD10, UBE2Q1, CHRNB2, ADAR Full
rs2793108 T1D, ZEB1 1.40E-05 2 ZNF438, RP11-330O11.2 No
rs12746613¹ FCGR2A 1.50E-05 0 No
rs8045689 NFATC2IP 2.40E-05 35 AC138894.3, AC145285.3, AC109460.2, RP11-57A19.2, RP11-1348G14.1, SBK1, RP11-57A19.3, EIF3CL, AC138894.2, Full
NPIPL1, CLN3, AC138894.1, IL27, NUPR1, CCDC101, SULT1A2, SULT1A1, RP11-1348G14.2, AC145285.1, EIF3C,
AC145285.2, ATXN2L, TUFM, SH2B1, ATP2A1, RABEP2, CD19, NFATC2IP, SPNS1, LAT, AC109460.1, snoU13.207,
snoU13.193, snoU13.36, SNORA43.4
rs2104286 IL2RA 2.48E-05 9 RP11-536K7.5, RP11-414H17.2, RP11-414H17.7, RP11-414H17.6, IL2RA, RBM17, PFKFB3, 7SK.63, SNORA14.1 Full
rs1167223 OLIG3, TNFAIP3 3.80E-05 1 RP11-95M15.1 No
rs892188 ICAM1, ICAM3 4.30E-05 15 S1PR2, MRPL4, ICAM1, ICAM4, ICAM5, ZGLP1, FDX1L, RAVER1, ICAM3, TYK2, CDC37, PDE4A, KEAP1, S1PR5, MIR1181 Full
rs7234029 PTPN2 4.40E-05 2 PTPN2, Y_RNA.294 Full
rs5754217 UBE2L3 4.80E-05 41 BCRP2, AP000550.1, KB-1592A4.15, KB-1592A4.13, KB-1183D5.11, KB-1592A4.14, KB-1183D5.9, POM121L8P, BCRP6, Full
AP000552.2, NCRNA00281, KB-1183D5.14, KB-1183D5.15, KB-1183D5.16, TMEM191C, PI4KAP2, POM121L7, GGT2,
AP000552.1, RIMBP3B, HIC2, RIMBP3C, UBE2L3, YDJC, CCDC116, SDF2L1, PPIL2, YPEL1, MIR301B, MIR130B, 7SK.19,
7SK.117, AP000552.3, AP000557.1, AP000553.1, SCARNA18.3, SCARNA17.1, snoU13.456, snoU13.466, SCARNA17.2,
SCARNA18.4
rs4535211 PLCL2 8.90E-05 3 AC091491.2, PLCL2, AC090644.1 Full
¹ The workflow failed to identify a region as it was unable to map the ‘LDRegion’ to the required genome assemblies.

Table 12 Number of genes identified by the Taverna workflow. Both gene name and Ensembl gene ID
(ENSG ID) numbers are shown.
              Confirmed   Confirmed no HLA   Expanded   Expanded no HLA
Gene Names    235         198                435        398
ENSG IDs      235         198                436        399

Table 13 Gene numbers by gene biotype as defined by Ensembl. The biotype refers to the class of
gene, for example, if a protein is produced (protein coding), the gene is processed but produces
no functional product (processed transcript) or the gene produces a functional product such as a
micro RNA (miRNA).
Gene Biotype            Confirmed       Confirmed no HLA   Expanded        Expanded no HLA
Protein Coding          120   51.06%    101   51.01%       233   53.44%    214   53.63%
Processed Transcript    45    19.15%    36    18.18%       71    16.28%    62    15.54%
Pseudogene              40    17.02%    35    17.68%       61    13.99%    56    14.04%
misc RNA                1     0.43%     1     0.51%        9     2.06%     9     2.26%
snRNA                   9     3.83%     7     3.54%        12    2.75%     10    2.51%
snoRNA                  4     1.70%     4     2.02%        14    3.21%     14    3.51%
miRNA                   6     2.55%     5     2.53%        14    3.21%     13    3.26%
rRNA                    2     0.85%     2     1.01%        3     0.69%     3     0.75%
scRNA Pseudogene        3     1.28%     3     1.52%        8     1.83%     8     2.01%
lincRNA                 5     2.13%     4     2.02%        11    2.52%     10    2.51%
Total                   235   100%      198   100%         436   100%      399   100%

Table 14 Gene numbers by gene status. This indicates the validity of the gene, for example,
known genes have a sequence match external to Ensembl in the same species, whereas novel genes
only have a sequence match to other species.
Status      Confirmed       Confirmed no HLA   Expanded        Expanded no HLA
KNOWN       165   70.21%    141   71.21%       299   68.58%    275   68.92%
NOVEL       57    24.26%    47    23.74%       117   26.83%    107   26.82%
PUTATIVE    13    5.53%     10    5.05%        20    4.59%     17    4.26%
Total       235   100%      198   100%         436   100%      399   100%

Table 15 Gene numbers by associated gene database. HGNC Symbols represent unique approved
abbreviations for the gene based on a standardised nomenclature. Clone-based names, however, are
named after the clone ID and can therefore be ambiguously named. Clone-based names often
represent novel, uncharacterised or pseudogenes.
Associated Gene DB       Confirmed       Confirmed no HLA   Expanded        Expanded no HLA
HGNC Symbol              125   53.19%    104   52.53%       240   55.05%    219   54.89%
Clone-based (Vega)       71    30.21%    64    32.32%       110   25.23%    103   25.81%
Clone-based (Ensembl)    24    10.21%    17    8.59%        49    11.24%    42    10.53%
RFAM gene name           15    6.38%     13    6.57%        37    8.49%     35    8.77%
Total                    235   100%      198   100%         436   100%      399   100%

4.2 Disease Analysis – RA
4.2.1 Pathway Analysis

4.2.1.1 Gene Ontology – BiNGO


Of the 235 genes obtained from the confirmed SNP list, BiNGO retrieved GO annotations
for 97 genes (41.3%) (Supplementary Table 1). The number of genes with no GO annotations
nearly doubled in the expanded list compared with the confirmed list (243 vs. 138). However,
as the total number of genes followed the same pattern, the proportion of genes with GO
annotations increased by approximately three percentage points to 44.1%. GO annotation
results are summarised in Table 16.

Table 16 Number of genes without GO annotations by gene list.


Gene List Number of Genes Without GO Annotations
With HLA Without HLA
Confirmed 138 58.7% 116 58.6%
Expanded 243 55.9% 221 55.9%

A total of 94 GO terms were found to be over-represented by genes in the confirmed list
and 161 in the expanded list with a corrected p-value of ≤ 0.05. Among the GO terms most
significantly over-represented were antigen processing and presentation and regulation of
proliferation and activation of immune cells. Table 17 and Table 18 show the most
associated (corr p-value < 1 x 10^-3) GO terms for the confirmed and expanded gene lists
respectively (full results are shown in Supplementary Table 3 & Supplementary Table 5).
Network diagrams for all four gene lists can be found in Supplementary Figure 1 -
Supplementary Figure 4.
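
The uncorrected p-values reported for GO term over-representation can be illustrated with a
short calculation. The following is a minimal sketch, assuming that a one-sided hypergeometric
test was applied and that the columns x, n, X and N of Table 17 denote, respectively, the
annotated genes in the list carrying the term, all annotated genes carrying the term, all
annotated genes in the list and all annotated human genes. Both assumptions are made here for
illustration only, and the function name is hypothetical rather than part of BiNGO.

```python
from math import comb


def hypergeom_upper_tail(x, n, X, N):
    """Probability of seeing at least x term-carrying genes in a list of X genes
    drawn without replacement from N genes, of which n carry the term."""
    denominator = comb(N, X)
    upper = min(n, X)
    # Sum the exact hypergeometric probabilities for k = x .. upper using integers,
    # converting to a float only once at the end.
    numerator = sum(comb(n, k) * comb(N - n, X - k) for k in range(x, upper + 1))
    return numerator / denominator


# Values taken from the first row of Table 17 (GO-ID 42613, MHC class II protein complex).
p_mhc_class_ii = hypergeom_upper_tail(x=8, n=13, X=97, N=17784)
print(f"{p_mhc_class_ii:.2e}")
```

Under these assumptions the result should lie close to the uncorrected p-value tabulated for
that term; small differences would be expected if the tool uses a different test or background.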
Table 17 Most significant (corr p < 1 x 10^-3) BiNGO results for the confirmed gene list. Each row shows the
significant GO ID, p-value, p-value after correction for multiple testing, the number of genes annotated with
the term in the gene list (x) and among all human genes (n), the total number of annotated genes in the gene
list (X) and among all human genes (N), and the GO term description. Full results are shown in Supplementary
Table 3.
GO-ID p-value corr p-value x n X N Description
42613 7.34E-16 1.15E-12 8 13 97 17784 MHC class II protein complex
2504 1.36E-14 8.33E-12 8 17 97 17784 antigen processing and presentation of peptide or polysaccharide antigen via MHC class II
19882 1.60E-14 8.33E-12 11 59 97 17784 antigen processing and presentation
32395 1.87E-12 7.29E-10 6 9 97 17784 MHC class II receptor activity
42611 1.22E-11 3.80E-09 8 35 97 17784 MHC protein complex
2376 1.01E-09 2.63E-07 23 947 97 17784 immune system process
5765 2.87E-08 6.41E-06 8 89 97 17784 lysosomal membrane
6955 3.59E-08 7.01E-06 17 618 97 17784 immune response
5774 2.45E-07 4.25E-05 8 117 97 17784 vacuolar membrane
44437 3.83E-07 5.98E-05 8 124 97 17784 vacuolar part
45589 6.27E-07 8.32E-05 3 4 97 17784 regulation of regulatory T cell differentiation
45619 6.53E-07 8.32E-05 6 57 97 17784 regulation of lymphocyte differentiation
2683 6.92E-07 8.32E-05 7 92 97 17784 negative regulation of immune system process
50670 8.02E-07 8.95E-05 7 94 97 17784 regulation of lymphocyte proliferation
32944 8.62E-07 8.97E-05 7 95 97 17784 regulation of mononuclear cell proliferation
70663 9.25E-07 9.03E-05 7 96 97 17784 regulation of leukocyte proliferation
42129 1.72E-06 1.58E-04 6 67 97 17784 regulation of T cell proliferation
51249 2.40E-06 2.09E-04 8 158 97 17784 regulation of lymphocyte activation
50863 4.63E-06 3.81E-04 7 122 97 17784 regulation of T cell activation

323 5.49E-06 4.08E-04 9 235 97 17784 lytic vacuole
5764 5.49E-06 4.08E-04 9 235 97 17784 lysosome
2694 6.33E-06 4.49E-04 8 180 97 17784 regulation of leukocyte activation
45580 6.86E-06 4.66E-04 5 49 97 17784 regulation of T cell differentiation
50865 1.02E-05 6.61E-04 8 192 97 17784 regulation of cell activation
10008 1.52E-05 9.14E-04 8 203 97 17784 endosome membrane
44440 1.52E-05 9.14E-04 8 203 97 17784 endosomal part
Table 18 Most significant (corr p < 1 x 10^-3) BiNGO results for the expanded gene list. Each row shows the
significant GO ID, p-value, p-value after correction for multiple testing, the number of genes annotated with
the term in the gene list (x) and among all human genes (n), the total number of annotated genes in the gene
list (X) and among all human genes (N), and the GO term description. Full results are shown in Supplementary
Table 5.
GO-ID p-value corr p-value x n X N Description
2504 3.73E-14 5.92E-11 9 17 192 17779 antigen processing and presentation of peptide or polysaccharide antigen via MHC class II
19882 4.73E-14 5.92E-11 13 59 192 17779 antigen processing and presentation
42613 1.96E-13 1.64E-10 8 13 192 17779 MHC class II protein complex
2682 5.60E-12 3.50E-09 25 425 192 17779 regulation of immune system process
50670 2.53E-11 1.19E-08 13 94 192 17779 regulation of lymphocyte proliferation
32944 2.91E-11 1.19E-08 13 95 192 17779 regulation of mononuclear cell proliferation
70663 3.33E-11 1.19E-08 13 96 192 17779 regulation of leukocyte proliferation
32395 1.20E-10 3.40E-08 6 9 192 17779 MHC class II receptor activity
42129 1.33E-10 3.40E-08 11 67 192 17779 regulation of T cell proliferation
2376 1.36E-10 3.40E-08 35 947 192 17779 immune system process
51249 1.75E-10 3.97E-08 15 158 192 17779 regulation of lymphocyte activation
50863 7.10E-10 1.48E-07 13 122 192 17779 regulation of T cell activation
2694 1.09E-09 2.10E-07 15 180 192 17779 regulation of leukocyte activation
50865 2.67E-09 4.77E-07 15 192 192 17779 regulation of cell activation
42611 2.93E-09 4.89E-07 8 35 192 17779 MHC protein complex
2683 4.39E-09 6.86E-07 11 92 192 17779 negative regulation of immune system process
6955 1.37E-08 2.01E-06 25 618 192 17779 immune response
50671 2.73E-08 3.80E-06 9 64 192 17779 positive regulation of lymphocyte proliferation
32946 3.14E-08 4.14E-06 9 65 192 17779 positive regulation of mononuclear cell proliferation
70665 3.61E-08 4.51E-06 9 66 192 17779 positive regulation of leukocyte proliferation
50776 4.34E-08 5.17E-06 15 236 192 17779 regulation of immune response
45619 1.68E-07 1.91E-05 8 57 192 17779 regulation of lymphocyte differentiation
2684 1.98E-07 2.15E-05 15 265 192 17779 positive regulation of immune system process
42102 3.01E-07 3.14E-05 7 42 192 17779 positive regulation of T cell proliferation
5765 5.00E-07 4.97E-05 9 89 192 17779 lysosomal membrane
2696 5.17E-07 4.97E-05 10 116 192 17779 positive regulation of leukocyte activation
50867 7.65E-07 7.08E-05 10 121 192 17779 positive regulation of cell activation
45580 8.99E-07 8.03E-05 7 49 192 17779 regulation of T cell differentiation
48583 1.09E-06 9.41E-05 20 525 192 17779 regulation of response to stimulus
51251 2.04E-06 1.64E-04 9 105 192 17779 positive regulation of lymphocyte activation
2697 2.04E-06 1.64E-04 9 105 192 17779 regulation of immune effector process
9897 2.55E-06 1.96E-04 10 138 192 17779 external side of plasma membrane
8284 2.58E-06 1.96E-04 18 459 192 17779 positive regulation of cell proliferation
45589 4.92E-06 3.58E-04 3 4 192 17779 regulation of regulatory T cell differentiation
5774 5.01E-06 3.58E-04 9 117 192 17779 vacuolar membrane
44437 8.07E-06 5.61E-04 9 124 192 17779 vacuolar part
48585 9.81E-06 6.63E-04 9 127 192 17779 negative regulation of response to stimulus
50777 1.34E-05 8.83E-04 5 29 192 17779 negative regulation of immune response

Figure 26 shows a Venn diagram describing the distribution of GO terms among the four
gene lists. In total, 182 GO terms were associated across the four gene lists. A large
overlap was seen between all four lists (48 terms), representing over half (51.1%) of the
GO terms identified from the confirmed list. Twenty-nine GO terms were shared between the
confirmed and expanded lists but were not present when MHC region genes were
excluded. Supplementary Table 7 shows all 182 GO IDs by overlap category with their
corresponding description and Table 19 shows a summary of the GO terms significant across
all four gene lists.
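
The overlap categories of a four-way Venn diagram can be derived with simple set operations.
The sketch below is illustrative only: the function name, the list names and the placeholder
term labels are hypothetical and do not reproduce the GO terms or counts reported here. Each
region contains the terms found in exactly the named lists and in no others.

```python
from itertools import combinations


def venn_regions(named_sets):
    """Map each combination of list names to the items found in exactly those lists."""
    names = list(named_sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items present in every list of this combination ...
            inside = set.intersection(*(named_sets[name] for name in combo))
            # ... minus items that also occur in any list outside the combination.
            outside = set().union(*(named_sets[name] for name in names if name not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions


# Toy input with placeholder term labels (not real results).
toy_lists = {
    "confirmed": {"term_A", "term_B", "term_C"},
    "confirmed_no_hla": {"term_A", "term_C"},
    "expanded": {"term_A", "term_B", "term_D"},
    "expanded_no_hla": {"term_A", "term_D"},
}
toy_regions = venn_regions(toy_lists)
for lists, terms in toy_regions.items():
    print(lists, sorted(terms))
```

Because every term falls into exactly one region, the region sizes sum to the number of
distinct terms across all lists, which gives a simple consistency check on such a diagram.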

Figure 26 Distribution of GO terms in the BiNGO analyses. Venn diagram showing the distribution of GO
terms identified from each gene list.

Table 19 GO terms showing significance across all gene lists in the BiNGO analyses.
GO Term   GO Term Description
2682      regulation of immune system process
2683      negative regulation of immune system process
2684      positive regulation of immune system process
2694      regulation of leukocyte activation
2695      negative regulation of leukocyte activation
2696      positive regulation of leukocyte activation
2697      regulation of immune effector process
2700      regulation of production of molecular mediator of immune response
2706      regulation of lymphocyte mediated immunity
5134      interleukin-2 receptor binding
5515      protein binding
8284      positive regulation of cell proliferation
9897      external side of plasma membrane
23052     signalling
32880     regulation of protein localization
32944     regulation of mononuclear cell proliferation
32945     negative regulation of mononuclear cell proliferation
32946     positive regulation of mononuclear cell proliferation
42102     positive regulation of T cell proliferation
42108     positive regulation of cytokine biosynthetic process
42129     regulation of T cell proliferation
42130     negative regulation of T cell proliferation
45580     regulation of T cell differentiation
45589     regulation of regulatory T cell differentiation
45590     negative regulation of regulatory T cell differentiation
45619     regulation of lymphocyte differentiation
45830     positive regulation of isotype switching
46013     regulation of T cell homeostatic proliferation
48302     regulation of isotype switching to IgG isotypes
48304     positive regulation of isotype switching to IgG isotypes
48585     negative regulation of response to stimulus
50670     regulation of lymphocyte proliferation
50671     positive regulation of lymphocyte proliferation
50672     negative regulation of lymphocyte proliferation
50776     regulation of immune response
50863     regulation of T cell activation
50865     regulation of cell activation
50866     negative regulation of cell activation
50867     positive regulation of cell activation
50868     negative regulation of T cell activation
50870     positive regulation of T cell activation
51023     regulation of immunoglobulin secretion
51249     regulation of lymphocyte activation
51250     negative regulation of lymphocyte activation
51251     positive regulation of lymphocyte activation
70663     regulation of leukocyte proliferation
70664     negative regulation of leukocyte proliferation
70665     positive regulation of leukocyte proliferation

Of the 48 GO terms found to be significantly over-represented in all gene lists, around 80%
are directly involved in the immune system and 20% involve the regulation of T cell
activation, proliferation and differentiation. Among the more specific immune terms were
IL-2 receptor binding and regulation of IgG isotype switching, while terms not directly
involving the immune system included protein binding and localisation, cell proliferation
and signalling, which all have the potential to be involved in immune system processes.

4.2.1.2 DAVID
For all four gene lists, DAVID was able to identify approximately 5-10% more genes in its
database from the Ensembl ENSG IDs than the ‘Associated Gene Names’ but still failed to
identify between 37 and 43% of the implicated genes (Table 20).

Table 20 Summary of genes mapped by DAVID.
Gene List Associated Gene Names Ensembl Gene IDs
Mapped Unmapped Mapped Unmapped
Confirmed 120 51.06% 115 48.9% 134 57.0% 101 43.0%
Confirmed no HLA 99 50% 99 50% 118 59.6% 80 40.4%
Expanded 232 53.3% 203 46.7% 266 61.0% 170 39.0%
Expanded no HLA 211 53.0% 187 47.0% 250 62.7% 149 37.3%

The genes implicated by the Taverna workflow from the confirmed SNPs were grouped into
four functional classifications (Table 21 & Supplementary Table 8): an MHC class II cluster, a
T-cell co-stimulatory cluster, a cluster containing INHBE, OLFML3, FAM3D and RNASET2 and
a cluster composed largely of hypothetical and unnamed genes. However, 88 genes
(65.7% of the 134 mapped genes) did not belong to any of these groups. As expected, the
functional classification analysis of the expanded gene list resulted in more clusters but
with much lower enrichment scores (< 1). Six clusters were identified in total: the same
INHBE, OLFML3, FAM3D and RNASET2 cluster as for the confirmed list, a kinase-rich cluster, an
MHC class II/receptor cluster, a ubiquitin-BRCA1 cluster, a group composed largely of
transcription factors and a zinc finger-rich cluster (Table 22 & Supplementary Table 10).

Table 21 DAVID functional classification results of the confirmed gene list.


Gene Group   Enrichment Score   Gene Description
1 3.621 MHC, class II, DQ alpha 1
MHC, class II, DR beta 4; MHC, class II, DR beta 1
inducible T-cell co-stimulator
G protein-coupled receptor 31
MHC, class II, DR alpha
immunoglobulin superfamily, member 3
major histocompatibility complex, class II, DQ beta 1
butyrophilin-like 2 (MHC class II associated)
tetraspanin 33
MHC, class II, DR beta 5
2 2.495 inducible T-cell co-stimulator
CD28 molecule
cytotoxic T-lymphocyte-associated protein 4
CD2 molecule
3 1.562 inhibin, beta E
olfactomedin-like 3
family with sequence similarity 3, member D
ribonuclease T2
4 0.447 KIAA1109
hypothetical protein LOC259308; chromosome 9 open reading frame 144
chromosome 6 open reading frame 10
tetraspanin 33
abhydrolase domain containing 6

Table 22 DAVID functional classification results of the expanded gene list.
Gene Group   Enrichment Score   Gene Description
1 0.948 Same as confirmed cluster 3
2 0.930 mitogen-activated protein kinase-activated protein kinase 5
B lymphoid tyrosine kinase
histidine acid phosphatase domain containing 1
homeodomain interacting protein kinase 1
tyrosine kinase 2
SH3-binding domain kinase 1
6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
phosphatidylinositol-5-phosphate 4-kinase, type II, gamma
cyclin-dependent kinase 6
3 0.928 transmembrane protein 191B; transmembrane protein 191C
immunoglobulin superfamily, member 3
CD19 molecule
major histocompatibility complex, class II, DQ alpha 1
ORM1-like 3 (S. cerevisiae)
CD2 molecule
transmembrane protein 116
G protein-coupled receptor 31
intercellular adhesion molecule 4 (Landsteiner-Wiener blood group)
intercellular adhesion molecule 5, telencephalin
chromosome 6 open reading frame 10
hypothetical protein LOC259308; chromosome 9 open reading frame 144
sphingosine-1-phosphate receptor 5
abhydrolase domain containing 6
butyrophilin-like 2 (MHC class II associated)
sphingosine-1-phosphate receptor 2
tetraspanin 33
MHC, class II, DR alpha
cytotoxic T-lymphocyte-associated protein 4
KIAA1109
MHC, class II, DR beta 4; MHC, class II, DR beta 1
intercellular adhesion molecule 3
MHC, class II, DR beta 5
interleukin 2 receptor, beta
nuclear pore complex interacting protein-like 1; olfactory receptor, family 51, subfamily A, member 4
MHC, class II, DQ beta 1
inducible T-cell co-stimulator
4 0.826 chromosome 12 open reading frame 51
BRCA1 associated protein
ubiquitin-conjugating enzyme E2L 3
ubiquitin-conjugating enzyme E2Q family member 1
5 0.335 basic leucine zipper transcription factor, ATF-like
neurogenic differentiation 2
AF4/FMR2 family, member 3
interferon regulatory factor 5
putative homeodomain transcription factor 1
6 0.104 TRAF-type zinc finger domain containing 1
POU class 2 homeobox 1
recombination signal binding protein for immunoglobulin kappa J region
hypermethylated in cancer 2
GATA like protein-1
zinc finger protein 438
cut-like homeobox 2
KIAA1045
deltex homolog 3 (Drosophila)
PHD finger protein 19
IKAROS family zinc finger 3 (Aiolos)

A total of 191 functional annotation categories were significantly enriched (p < 0.01),
representing 96 of the mapped genes for the confirmed gene list (Supplementary File 1),
and 351 functional annotation categories, representing 207 genes, for the expanded gene
list (Supplementary File 3). These mostly comprise immune system related terms,
with the majority (approximately 58%) of annotations derived from GO terms.

Functional annotation clustering of the confirmed gene list resulted in 33 clusters with
enrichment scores ranging from 0.015 to 4.43 (Supplementary File 5). The most enriched
cluster comprised 40 annotation terms mostly involved with the MHC class II
immune response. Using the expanded gene list, 85 functional annotation clusters were
identified with enrichment scores ranging from 0.014 to 3.692 (Supplementary File 7). With
the exception of two terms (‘transmembrane protein’ and ‘topological
domain:Extracellular’), which were not identified when using the expanded list, the most
enriched cluster contained the same terms as the most enriched cluster from the
confirmed list (although individual term significance varied). The largest cluster identified
from both gene lists was involved in the regulation of immune system processes. However,
many annotation terms (47%) did not appear in any of the 30 clusters, possibly representing
spurious genes identified by the Taverna workflow.
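
To aid interpretation of the cluster enrichment scores quoted above, the following minimal
sketch assumes the definition given in the DAVID documentation, namely that a group
enrichment score is the geometric mean of the member p-values expressed on a -log10 scale.
The function name and the example p-values are hypothetical and are not taken from the
results reported here.

```python
from math import log10


def group_enrichment_score(member_pvalues):
    """Geometric mean of the member p-values on a -log10 scale,
    computed as the arithmetic mean of -log10(p)."""
    logs = [-log10(p) for p in member_pvalues]
    return sum(logs) / len(logs)


def score_to_pvalue(score):
    """Geometric-mean p-value corresponding to an enrichment score."""
    return 10 ** (-score)


# Two illustrative member p-values (not real results).
example_score = group_enrichment_score([1e-2, 1e-4])
print(example_score, score_to_pvalue(example_score))
```

Under this definition, a score of approximately 1.3 corresponds to a geometric-mean p-value
of 0.05, so clusters scoring below 1 reflect member terms that are, on average, not
individually significant.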

4.2.1.3 InnateDB
InnateDB identified 310 pathways (Supplementary File 9) from 6 databases for genes in the
confirmed list, representing 97 genes. However, only 72 were significantly associated (p <
0.05), reducing to 27 after Benjamini-Hochberg correction (p < 0.05) (Table 24,
Supplementary Table 15). InnateDB failed to identify 41 genes and contained no pathway
associations for 138 genes (Table 23, Supplementary Table 13 & Supplementary Table 14).
Among the most highly enriched pathways were disease-specific pathways (e.g. T1D,
Systemic Lupus Erythematosus (SLE) & Asthma), general T-cell signalling pathways, IL12
signalling mediated by STAT4, CD4 T cell receptor signalling, T cell co-stimulatory signalling
and cell adhesion molecules (CAMs), as well as the expected antigen processing and
presentation pathways, consistent with other analyses. Subsequent analysis of the
expanded gene list showed a total of 534 pathways (Supplementary File 11), of which 101
were significantly associated (p < 0.05) and 51 remained significant after
Benjamini-Hochberg correction (p < 0.05); these were mostly (53%) involved in signalling
events/pathways (Table 25, Supplementary Table 17). Additionally, there was a degree of
overlap with the pathways identified from the confirmed gene list. InnateDB failed to
identify 17% of genes and did not have annotations for approximately 60% of genes (Table 23).
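
The effect of the Benjamini-Hochberg correction can be sketched as follows. This is a
minimal illustration, not InnateDB's implementation: the function name and its parameters
are hypothetical. The optional `n_tests` argument assumes that the supplied p-values are the
smallest of a larger set of tests, and `monotone=False` returns the plain p x m / rank value
without the usual step-up adjustment. The corrected values for the three most significant
pathways of the confirmed gene list appear consistent with the latter form when m is taken
to be the 310 identified pathways.

```python
def bh_adjust(pvalues, n_tests=None, monotone=True):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    n_tests  -- total number of tests m (defaults to len(pvalues)); if larger,
                the supplied p-values are assumed to be the smallest of the m.
    monotone -- if True, enforce that adjusted values never decrease with rank.
    """
    m = n_tests if n_tests is not None else len(pvalues)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    # Scale each p-value by m / rank, where rank 1 is the smallest p-value.
    scaled = [pvalues[i] * m / rank for rank, i in enumerate(order, start=1)]
    if monotone:
        running = 1.0
        for k in range(len(scaled) - 1, -1, -1):
            running = min(running, scaled[k])
            scaled[k] = running
    adjusted = [0.0] * len(pvalues)
    for k, i in enumerate(order):
        adjusted[i] = min(scaled[k], 1.0)
    return adjusted


# Uncorrected p-values of the three most significant pathways for the confirmed gene list.
top_three = [2.01e-14, 4.26e-14, 5.64e-14]
print(bh_adjust(top_three, n_tests=310, monotone=False))
```

With `monotone=True` the second value would be lowered to match the third, which is the
behaviour of the standard step-up procedure; which variant a given tool reports should be
checked against its documentation.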

Table 23 InnateDB pathway over-representation analysis summary.
Gene List   Unrecognised Genes   Genes With No Pathway   Genes With Pathway   Identified Pathways (Unique)   Significantly Associated Pathways (p < 0.05)   Significantly Associated Pathways After Correction (pcorr < 0.05)
Confirmed 41 (17.45%) 138 (58.72%) 97 (41.28%) 765 (310) 72 27
Confirmed no HLA 32 (16.16%) 121 (61.11%) 77 (38.89%) 501 (233) 62 16
Expanded 70 (16.06%) 266 (61.01%) 170 (38.99%) 1390 (534) 101 51
Expanded no HLA 61 (15.29%) 249 (62.41%) 150 (37.59%) 1126 (462) 98 38
Table 24 Significant pathways identified by the InnateDB pathway analysis of the confirmed gene list. Pathways are shown which attain a p-value < 1 x 10^-2 after correction for multiple
testing. All significant (p < 0.05) pathways after correction for multiple testing are shown in Supplementary Table 15. Full results are available in Supplementary File 9.
Pathway Name   Pathway Id   Source Name   Pathway uploaded gene count   Genes in InnateDB for this entity   Genes Ratio   Pathway p-value   Pathway p-value (corrected)
Intestinal immune network for IgA production 8118 KEGG 11 45 24% 2.01E-14 6.23E-12
Allograft rejection 2793 KEGG 10 34 29% 4.26E-14 6.61E-12
Autoimmune thyroid disease 2799 KEGG 11 49 22% 5.64E-14 5.83E-12
Graft-versus-host disease 2807 KEGG 9 35 26% 3.44E-12 2.67E-10
Type I diabetes mellitus 525 KEGG 9 40 23% 1.29E-11 8.02E-10
Asthma 2818 KEGG 8 27 30% 1.68E-11 8.65E-10
Antigen processing and presentation 4144 PID BIOCARTA 6 12 50% 1.57E-10 6.94E-09
Cell adhesion molecules (CAMs) 440 KEGG 12 128 9% 1.86E-10 7.19E-09


Viral myocarditis 8123 KEGG 9 67 13% 1.72E-09 5.93E-08
Systemic lupus erythematosus 2805 KEGG 10 95 11% 2.29E-09 7.11E-08
Antigen processing and presentation 493 KEGG 8 74 11% 8.57E-08 2.42E-06
The co-stimulatory signal during t-cell activation 4018 PID BIOCARTA 5 18 28% 1.95E-07 5.05E-06
IL12 signaling mediated by STAT4 8013 PID NCI 5 29 17% 2.54E-06 6.05E-05
CD4 T cell receptor signaling (Vav, Rac and JNK cascade) 276 INOH 5 45 11% 2.37E-05 0.000524988
T cell receptor signaling (IKK-NF-kappaB cascade) 10 INOH 5 54 9% 5.82E-05 0.001202152
Primary immunodeficiency 2815 KEGG 4 35 11% 0.00014786 0.002864721
T cell receptor signaling (ERK cascade) 362 INOH 4 37 11% 0.00018434 0.003361562
T cell receptor signaling pathway 354 INOH 5 83 6% 0.00044944 0.007740429
Hematopoietic cell lineage 415 KEGG 5 85 6% 0.00050176 0.008186641
Table 25 Significant pathways identified by the InnateDB pathway analysis of the expanded gene list. Pathways are shown which attain a p-value < 1 x 10^-2 after correction for multiple
testing. All significant (p < 0.05) pathways after correction for multiple testing are shown in Supplementary Table 17. Full results are available in Supplementary File 11.
Pathway Name   Pathway Id   Source Name   Pathway uploaded gene count   Genes in InnateDB for this entity   Genes Ratio   Pathway p-value   Pathway p-value (corrected)
Intestinal immune network for IgA production 8118 KEGG 11 45 24% 1.58E-11 8.41E-09
Allograft rejection 2793 KEGG 10 34 29% 1.81E-11 4.83E-09
Autoimmune thyroid disease 2799 KEGG 11 49 22% 4.32E-11 7.69E-09
Cell adhesion molecules (CAMs) 440 KEGG 15 128 12% 1.92E-10 2.56E-08
Graft-versus-host disease 2807 KEGG 9 35 26% 7.47E-10 7.98E-08
Asthma 2818 KEGG 8 27 30% 1.98E-09 1.76E-07
The co-stimulatory signal during t-cell activation 4018 PID BIOCARTA 7 18 39% 2.39E-09 1.82E-07
Type I diabetes mellitus 525 KEGG 9 40 23% 2.73E-09 1.82E-07
Antigen processing and presentation 4144 PID BIOCARTA 6 12 50% 5.58E-09 3.31E-07
Primary immunodeficiency 2815 KEGG 8 35 23% 1.91E-08 1.02E-06
Viral myocarditis 8123 KEGG 10 67 15% 2.34E-08 1.14E-06
T cell receptor signaling (IKK-NF-kappaB cascade) 10 INOH 9 54 17% 4.52E-08 2.01E-06
CD4 T cell receptor signaling (Vav, Rac and JNK cascade) 276 INOH 8 45 18% 1.56E-07 6.40E-06
Systemic lupus erythematosus 2805 KEGG 10 95 11% 6.90E-07 2.63E-05
T cell receptor signaling pathway 354 INOH 9 83 11% 2.00E-06 7.11E-05
IL12 signaling mediated by STAT4 8013 PID NCI 6 29 21% 2.36E-06 7.89E-05
Signaling in Immune system 1959 REACTOME 15 275 5% 5.47E-06 0.0001718


Antigen processing and presentation 493 KEGG 8 74 11% 7.81E-06 0.0002317
T cell receptor signaling (ERK cascade) 362 INOH 6 37 16% 1.06E-05 0.000297
Lck and fyn tyrosine kinases in initiation of tcr activation 4116 PID BIOCARTA 4 11 36% 1.11E-05 0.0002951
IL12-mediated signaling events 8015 PID NCI 7 59 12% 1.60E-05 0.0004068
TCR signaling in naïve CD4+ T cells 8033 PID NCI 7 63 11% 2.48E-05 0.0006017
IL2 signaling events mediated by STAT5 8016 PID NCI 5 28 18% 3.69E-05 0.000822
Role of mef2d in t-cell apoptosis 4085 PID BIOCARTA 5 28 18% 3.69E-05 0.000822
TCR signaling in naïve CD8+ T cells 8004 PID NCI 6 52 12% 7.81E-05 0.0016689
T cell receptor signaling pathway 563 KEGG 8 108 7% 0.0001234 0.0025346
Hematopoietic cell lineage 415 KEGG 7 85 8% 0.0001721 0.0034029
Downstream signaling in naïve CD8+ T cells 8044 PID NCI 6 61 10% 0.0001925 0.0036719
Cd40l signaling pathway 4093 PID BIOCARTA 3 9 33% 0.0002101 0.0038694
Costimulation by the CD28 family 5974 REACTOME 5 41 12% 0.0002439 0.0043419
Jak-STAT signaling pathway 568 KEGG 9 153 6% 0.00027 0.0046507
Calcineurin-regulated NFAT-dependent transcription in lymphocytes 7969 PID NCI 5 44 11% 0.000342 0.0057064
IL27-mediated signaling events 7977 PID NCI 4 26 15% 0.0004261 0.0068943
Transcription factor binding site (TFBS) over-representation analysis showed no significant
results after Benjamini-Hochberg correction for any of the four gene lists. Table 26 shows
a summary of the analyses and complete TFBS over-representation results are shown in
Supplementary Files 5-8.

Table 26 InnateDB TFBS over-representation analysis summary


Gene List   Genes With TFBS   Identified TFBS (Unique)   Significantly Associated TFBS (p < 0.05)
Confirmed 93 (39.57%) 1596 (193) 17
Confirmed no HLA 81 (40.91%) 1424 (191) 13
Expanded 174 (39.91%) 3190 (196) 23
Expanded no HLA 162 (40.60%) 3018 (196) 28

4.2.1.4 PANTHER
The PANTHER database was only able to map between 40% and 46% of Ensembl Gene IDs
from the four gene lists (Table 27). The results of the cellular component analysis showed
four significant results (p < 0.05) across all four gene lists (excluding the category
‘unclassified’): two from the confirmed gene list and two from the expanded gene list (Table 28,
Figure 27 & Supplementary Table 19). Two cellular component categories (MHC protein
complex & protein complex) accounted for all of the associations seen, and neither remained
significant after exclusion of MHC region genes (Supplementary Table 19).

Table 27 Summary of IDs mapped by PANTHER.


H. sapiens Reference Confirmed Confirmed no HLA Expanded Expanded no HLA
Mapped IDs 19911 94 (40.00%) 83 (41.92%) 195 (44.72%) 184 (46.12%)
Unmapped IDs 0 141 (60.00%) 115 (58.08%) 241 (55.28%) 215 (53.88%)

Table 28 PANTHER cellular component analysis summary. Full results are shown in Supplementary Table 19.
Cellular Component   H. sapiens Reference   Confirmed O/E (direction)   Confirmed P-value   Expanded O/E (direction)   Expanded P-value
MHC protein complex 58 6/0.27 (+) 3.99E-07 6/0.57 (+) 2.70E-05
protein complex 197 6/0.93 (+) 3.63E-04 6/1.93 (+) 1.37E-02
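
The observed/expected (O/E) values and p-values in Table 28 can be illustrated with a short
calculation. The sketch below assumes that the expected count is the reference proportion
multiplied by the number of mapped list genes (Table 27) and that over-representation is
assessed with a one-sided binomial test; both are assumptions made for illustration, and the
function names are hypothetical rather than part of PANTHER.

```python
from math import comb


def expected_count(reference_hits, reference_total, list_size):
    """Number of list genes expected in a category given the reference proportion."""
    return list_size * reference_hits / reference_total


def binomial_upper_tail(observed, list_size, probability):
    """Probability of observing at least `observed` hits in `list_size` trials."""
    return sum(
        comb(list_size, k) * probability**k * (1 - probability) ** (list_size - k)
        for k in range(observed, list_size + 1)
    )


# MHC protein complex, confirmed gene list: 58 of 19911 reference genes,
# 6 observed among the 94 mapped IDs.
ref_hits, ref_total = 58, 19911
mapped_confirmed = 94
expected_mhc = expected_count(ref_hits, ref_total, mapped_confirmed)
p_over_mhc = binomial_upper_tail(6, mapped_confirmed, ref_hits / ref_total)
print(f"expected = {expected_mhc:.2f}, p = {p_over_mhc:.2e}")
```

Under these assumptions the output should agree closely with the corresponding row of
Table 28; for under-represented categories, marked (-), the lower tail of the same
distribution would be the analogous quantity.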

[Bar chart: percentage of genes (y-axis) in each cellular component (MHC protein complex, protein complex) for the H. sapiens reference and the Confirmed, Confirmed no HLA, Expanded and Expanded no HLA gene lists.]

Figure 27 PANTHER cellular component analysis graph. For clarity only cellular components significant (p ≤
0.05) in one or more gene lists are shown. A lighter colour indicates that the cellular component is significant
for that gene list. Full results after exclusion of MHC region genes are shown in Supplementary Figure 5.

The results of protein class analysis for the confirmed gene list showed ten significantly (p ≤
0.05) associated categories (Table 29, Figure 28 & Supplementary Table 20), of which, the
most highly associated class was the major histocompatibility complex antigen (p = 5.66 x
10^-9). Fourteen protein classes were identified using the expanded gene list. The two most
significant protein classes from the confirmed list (MHC antigen and defense/immunity
protein) remained the most significant; however, there was a reduction in the strength of
the associations. For both the expanded gene lists, the unclassified category was found to
be significantly under-represented (p = 0.0255 & 0.0133) and corresponds to approximately
25% of the successfully mapped genes in both cases (Supplementary Table 20).

Table 29 PANTHER protein class analysis summary. Full results are shown in Supplementary Table 20.
PANTHER Protein Class   H. sapiens Reference   Confirmed O/E (direction)   Confirmed P-value   Expanded O/E (direction)   Expanded P-value
ATP-binding cassette (ABC) transporter 74 2/0.35 (+) 4.82E-02 2/0.72 (+) 1.64E-01
carbohydrate phosphatase 9 1/0.04 (+) 4.16E-02 1/0.09 (+) 8.44E-02
cytokine 159 5/0.75 (+) 9.89E-04 5/1.56 (+) 2.09E-02
defense/immunity protein 107 6/0.51 (+) 1.31E-05 6/1.05 (+) 7.17E-04
endodeoxyribonuclease 20 1/0.09 (+) 9.01E-02 3/0.2 (+) 1.07E-03
kinase 679 6/3.21 (+) 1.02E-01 13/6.65 (+) 1.70E-02
kinase activator 45 0/0.21 (-) 8.08E-01 3/0.44 (+) 1.02E-02
KRAB box transcription factor 552 0/2.61 (-) 7.12E-02 1/5.41 (-) 2.73E-02
ligase 260 2/1.23 (+) 3.48E-01 6/2.55 (+) 4.41E-02
major histocompatibility complex antigen 28 6/0.13 (+) 5.66E-09 6/0.27 (+) 4.35E-07
microtubule binding motor protein 46 2/0.22 (+) 2.03E-02 2/0.45 (+) 7.54E-02
protein kinase 510 3/2.41 (+) 4.34E-01 9/4.99 (+) 6.52E-02
receptor 1076 8/5.08 (+) 1.36E-01 19/10.54 (+) 9.82E-03
signaling molecule 961 10/4.54 (+) 1.54E-02 18/9.41 (+) 6.73E-03
transferase 1512 8/7.14 (+) 4.23E-01 25/14.81 (+) 7.21E-03
transketolase 5 1/0.02 (+) 2.33E-02 1/0.05 (+) 4.78E-02
type I cytokine receptor 31 1/0.15 (+) 1.36E-01 2/0.3 (+) 3.76E-02
Unclassified 6763 27/31.93 (-) 1.68E-01 53/66.23 (-) 2.55E-02
viral coat protein 7 1/0.03 (+) 3.25E-02 1/0.07 (+) 6.63E-02
viral protein 7 1/0.03 (+) 3.25E-02 1/0.07 (+) 6.63E-02

[Bar chart: percentage of genes (y-axis) in each protein class for the H. sapiens reference and the Confirmed, Confirmed no HLA, Expanded and Expanded no HLA gene lists.]


Figure 28 PANTHER protein class analysis graph. For clarity only protein classes significant (p ≤ 0.05) in one or
more gene lists are shown. A lighter colour indicates that the protein class is significant for that gene list. Full
results after exclusion of MHC region genes are shown in Supplementary Figure 6.

Four pathways were found to be significantly over-represented using the confirmed gene
list (Table 30, Figure 29 & Supplementary Table 21). The most associated (p = 7.32 x 10^-3) of
these is involved in interleukin signalling and remains significant across all four gene lists,
along with the Toll receptor signalling pathway. Analysis of the expanded gene list resulted
in six significantly associated pathways. Three of these were shared with the confirmed
gene list analysis, with the most associated pathway, T cell activation, increasing in
significance from p = 0.0127 in the confirmed list to p = 7.64 x 10^-5. Additionally, one pathway
involved in inflammation mediated by chemokine and cytokine signalling attained
significance following exclusion of the MHC region genes (p = 0.0489) from the expanded
gene list (Supplementary Table 21). In addition to the pathways mentioned above, the
unclassified category was also significantly under-represented across all four gene lists,
representing approximately 80% of the successfully mapped genes.

Table 30 PANTHER pathway analysis summary. Full results are shown in Supplementary Table 21.
Pathway   H. sapiens Reference   Confirmed O/E (direction)   Confirmed P-value   Expanded O/E (direction)   Expanded P-value
Angiogenesis 191 2/0.9 (+) 2.28E-01 6/1.87 (+) 1.19E-02
B cell activation 82 1/0.39 (+) 3.22E-01 4/0.8 (+) 9.02E-03
Inflammation mediated by chemokine and cytokine signaling pathway 283 3/1.34 (+) 1.50E-01 6/2.77 (+) 6.13E-02
Interleukin signaling pathway 161 4/0.76 (+) 7.32E-03 7/1.58 (+) 1.15E-03
JAK/STAT signaling pathway 20 1/0.09 (+) 9.01E-02 2/0.2 (+) 1.68E-02
Notch signaling pathway 47 2/0.22 (+) 2.11E-02 2/0.46 (+) 7.82E-02
T cell activation 102 3/0.48 (+) 1.27E-02 7/1 (+) 7.64E-05
Toll receptor signaling pathway 62 2/0.29 (+) 3.51E-02 3/0.61 (+) 2.36E-02
Unclassified 17337 75/81.85 (-) 3.09E-02 156/169.79 (-) 3.60E-03
Vasopressin synthesis 12 1/0.06 (+) 5.51E-02 1/0.12 (+) 1.11E-01
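
The O/E and P-value columns of Table 30 can be illustrated with a short sketch. It assumes a one-sided binomial test, which is not stated in this section. The list size (94 mapped genes) and the reference total (19,911 genes) are also assumptions: they are back-calculated here from the O/E column rather than taken from the text. Under these assumptions the sketch gives values close to the tabulated row for the interleukin signalling pathway.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower


# Assumed values, back-calculated from the O/E column (not given in the text).
REFERENCE_TOTAL = 19911  # assumed number of genes in the H. sapiens reference
LIST_SIZE = 94           # assumed number of mapped genes in the confirmed list


def over_representation(observed: int, reference_count: int,
                        list_size: int = LIST_SIZE,
                        reference_total: int = REFERENCE_TOTAL):
    """Return (expected count, upper-tail p-value) for one pathway."""
    p = reference_count / reference_total  # chance a random gene is in the pathway
    expected = list_size * p
    return expected, binomial_upper_tail(observed, list_size, p)


# Interleukin signaling pathway, confirmed list: 4 observed, 161 in the reference.
expected, p_value = over_representation(observed=4, reference_count=161)
print(f"expected = {expected:.2f}, p = {p_value:.2e}")
```

For under-represented categories such as ‘Unclassified’, the lower tail P(X <= k) would be used instead.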

[Figure 29 chart: percentage of genes (y-axis, 0.00% to 4.50%) for each pathway: Angiogenesis, B cell
activation, Inflammation mediated by chemokine and cytokine signaling pathway, Interleukin signaling
pathway, JAK/STAT signaling pathway, Notch signaling pathway, T cell activation, Toll receptor signaling
pathway and Vasopressin synthesis. Series: H. sapiens Reference, Confirmed, Confirmed no HLA, Expanded
and Expanded no HLA, each with a separate shade for p ≤ 0.05.]

Figure 29 PANTHER pathway analysis graph. For clarity only pathways significant (p ≤ 0.05) in one or more
gene lists are shown. A lighter colour indicates that the pathway is significant for that gene list. Full results
after exclusion of MHC region genes are shown in Supplementary Figure 7.

4.2.1.5 Reactome
Of all the databases used, Reactome mapped the smallest proportion of genes, with a
maximum of 15.79% of IDs mapped (Table 31 & Supplementary Table 22). The Reactome
pathway analysis tool identified 62 pathways as significantly (p < 0.05) over-represented
in the confirmed gene list (Supplementary Table 23). The most significant pathway
(p = 1.11 × 10⁻⁴), co-stimulation by the CD28 family, was represented by 5 genes (CD28,
CTLA4, HLA-DRB1, HLA-DRB5 & ICOS) and remained significant across all four gene lists
despite the exclusion of the HLA-DRB1 and HLA-DRB5 genes (Table 32). The pathway
represented by the most genes was signalling in the immune system, which involved 12 genes
(C5, CD28, CD40, CTLA4, HLA-DRB1, HLA-DRB5, ICOS, IL2, IL2RA, IRF5, PSMB8 & TNFAIP3) from
the confirmed gene list. This pathway also remained significant across all four gene lists.

Table 31 Summary of IDs mapped by Reactome.
              Confirmed     Confirmed no HLA  Expanded      Expanded no HLA
Mapped IDs    34 (14.47%)   29 (14.65%)       68 (15.60%)   63 (15.79%)
Unmapped IDs  201 (85.53%)  169 (85.35%)      368 (84.40%)  336 (84.21%)

Table 32 Significant Reactome events. Values are the un-adjusted probability of seeing N or more genes in
this Event by chance. Full results are shown in Supplementary Table 23 – Supplementary Table 26.
Event ID     Name of this Event                                      Confirmed  Confirmed no HLA  Expanded  Expanded no HLA
REACT_19344  Costimulation by the CD28 family                        1.11E-04   8.12E-03          4.72E-05  2.02E-03
REACT_23782  SHC1 mediates cytokine-induced phosphorylation of GAB2  6.29E-03   4.60E-03          1.64E-03  1.31E-03
REACT_23828  Phosphorylated SHC recruits GRB2:SOS1                   6.29E-03   4.60E-03          1.64E-03  1.31E-03
REACT_23837  Interleukin-3, 5 and GM-CSF signaling                   4.21E-02   3.14E-02          3.95E-03  2.99E-03
REACT_23856  Phosphorylated SHC1 recruits GRB2:GAB2                  6.29E-03   4.60E-03          1.64E-03  1.31E-03
REACT_23874  Phosphorylated SHC1 recruits SHIP                       6.29E-03   4.60E-03          1.64E-03  1.31E-03
REACT_23891  Interleukin receptor SHC signaling                      1.78E-02   1.31E-02          7.80E-03  6.30E-03
REACT_23911  The SHC1:SHIP1 complex is stabilized by GRB2            6.29E-03   4.60E-03          1.64E-03  1.31E-03
REACT_23928  SOS1 activates H-Ras                                    7.04E-03   5.15E-03          1.95E-03  1.56E-03
REACT_24024  Gab2 binds the p85 subunit of Class 1A PI3 kinases      1.14E-02   8.36E-03          4.02E-03  3.23E-03
REACT_6900   Signaling in Immune system                              8.79E-04   1.00E-02          1.63E-07  2.19E-06

Analysis of the expanded gene list showed 51 significantly associated pathways
(Supplementary Table 25), of which 32 were also present in the analysis of the confirmed
gene list (Supplementary Table 23). The most significant pathway, signalling in the immune
system (p = 1.63 × 10⁻⁷), was also the pathway with the most genes involved (26) and
attained the highest significance over all gene lists.

Overall, 86 unique pathways were identified across the four gene lists (Supplementary
Table 27), 11 of which remained significant in all gene lists (Table 32).

4.2.2 Protein-protein Interaction


Analysis of the confirmed gene list resulted in 13 direct interactions involving 19 genes
(Figure 30), 11 of which were genes from the MHC region (Table 33). By including additional
genes from the expanded gene list, 36 direct interactions were identified involving 43
genes (Figure 31). This network contained all the interactions from the confirmed list
interaction network, along with an additional 5 genes from the confirmed gene list which
interact with genes only present in the expanded gene list. For example, the
CD2-CD247-CD28 interaction was not present in the confirmed gene list analysis as CD247
only appears on the expanded gene list.

Figure 30 Protein-protein interaction network produced from the confirmed gene list. Genes in the MHC
region are shown as diamonds.

Table 33 Protein-protein interaction summary. For each interaction network produced, the number of
interactions, genes and sub-interactions are shown. Additionally, the number of genes from each list is
shown. A dash marks a cell that is empty in the original table.
Network type            Gene List         Number of     Number    Number of         Number from          Number of MHC  Number from
                                          Interactions  of genes  sub-interactions  Confirmed Gene List  region genes   Expanded Gene List
No sub-interactions     Confirmed         13            19        -                 19                   11             -
No sub-interactions     Confirmed no HLA  4             7         -                 7                    -              -
No sub-interactions     Expanded          36            43        -                 24                   11             19
No sub-interactions     Expanded no HLA   27            31        -                 12                   -              19
<= 1 sub-interactions   Confirmed         109           80        37                40                   13             3
<= 1 sub-interactions   Confirmed no HLA  80            57        29                25                   -              3
<= 1 sub-interactions   Expanded          599           299       191               51                   13             57
<= 1 sub-interactions   Expanded no HLA   553           273       179               37                   -              57

Figure 31 Protein-protein interaction network produced from the expanded gene list. Genes from the
confirmed gene list are shown as squares, genes in the MHC region are shown as diamonds and hexagons
symbolise genes from the expanded gene list.

Analysis of the extended, indirect interaction networks, allowing a maximum of one joining
sub-interaction, resulted in much larger networks (Table 33). Restricting the analysis to the
confirmed RA genes and allowing one ‘non-list’ gene as a joining sub-interaction
(Supplementary Figure 10) showed 109 interactions involving 80 genes using 37
sub-interactions. This expanded the original confirmed gene list network to include 40
genes from the confirmed gene list, 13 of which were genes in the MHC region.
Interestingly, 3 genes from the expanded gene list were identified as sub-interactions:
CD247, PTPRC and TRAF6. Addition of the TRAF6 sub-interaction reveals a link between two
previously separate sets of interactions, the CD40-TRAF1-TNFAIP3 and IRF5-XPO1
interactions, and shows an alternative interaction route between CD40 and TNFAIP3. With
the exception of the KIF5A-NCOA2-PSMB9 interaction, the network produced was a single
network, with all genes interacting with other members of the network in some way.

The extended, indirect network produced using the expanded gene list was the largest
interaction network, containing 599 interactions involving 299 genes (Supplementary
Figure 12). The majority of genes (191) were sub-interactions, although a large number
still came from the confirmed and expanded gene lists (51 and 57 genes respectively).
Three networks were identified which did not interact with the main network:
S1PR2-GNAI1-S1PR5-GNAO1-DCTN2, CCL21-UBQLN4-ERP29 and KIF5A-NCOA2-PSMB9 (sub-interaction,
confirmed, MHC, expanded), one of which was also identified in the analysis of the
confirmed gene list.

Removal of the many immune-related MHC genes from the analyses substantially reduced the
number of interactions when the confirmed gene list was used as the input (Table 33): from
13 to 4 in the direct analysis (a decrease of approximately 70%) and from 109 to 80 in the
indirect analysis (approximately 27%). The reduction for the expanded gene list was
smaller: from 36 to 27 in the direct analysis (25%) and from 599 to 553 in the indirect
analysis (approximately 8%).
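
The size of these reductions can be checked directly from the interaction counts in Table 33. The following is a minimal sketch of that arithmetic; the dictionary layout and function name are illustrative only.

```python
# Interaction counts from Table 33: (with MHC genes, after MHC exclusion).
INTERACTIONS = {
    ("Confirmed", "direct"): (13, 4),
    ("Confirmed", "indirect"): (109, 80),
    ("Expanded", "direct"): (36, 27),
    ("Expanded", "indirect"): (599, 553),
}


def percent_decrease(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


decreases = {
    key: round(percent_decrease(before, after), 1)
    for key, (before, after) in INTERACTIONS.items()
}

for (gene_list, analysis), value in decreases.items():
    print(f"{gene_list:<9} {analysis:<8} {value:5.1f}% fewer interactions")
```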

Analysis of the extended networks was difficult due to their complexity; however, it was
possible to identify ‘hubs’ of interaction by ranking each gene by the number of genes
with which it interacts (Table 34 – Table 36 & Supplementary Table 28 – Supplementary
Table 31). The confirmed gene network showed 31 genes with 3 or more interactions but only
1 gene (CD28) with 10 or more (Table 34). After exclusion of the MHC region genes, 21 genes
remained with 3 or more interactions and CD28 remained the only gene with 10 or more
interactions (Table 35). The CD28 hub was extracted from the full gene network, along with
its direct and indirect (via sub-interactions) interacting genes (Figure 32). In addition to
the 2 directly interacting genes from the expanded gene list (CD247 & PTPRC), 9 genes from
the confirmed list were identified as interacting with CD28 through sub-interactions;
these were not identified in the initial confirmed gene list network (Figure 32). Applying
the same analysis to the extended network produced using the expanded gene list resulted
in 137 genes with 3 or more interactions, although approximately half (54.7%) were genes
not present in either gene list (Table 34). Only 2 (9.5%) of the 21 genes (Table 36) with 10
or more interactions were not from either gene list, and all 6 genes with 20 or more
interactions were from either the confirmed or expanded gene lists. The gene with the
highest number of interactions is PTPN11, interacting with a total of 37 genes: 17 genes
from the confirmed list and 20 genes from the expanded gene list. Two genes, ERBB2 &
CTLA4, directly interact with PTPN11 and the remainder interact through 48 sub-interaction
genes (Figure 33). The hub with the highest number of direct interactions was CD247 which,
out of a total of 22 interactions, interacted directly with 3 confirmed list genes and 2
genes from the expanded list (Supplementary Table 30).
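
The hub ranking described above amounts to counting, for each gene, its distinct interaction partners and applying a cut-off. The sketch below shows one way this could be done. The edge list is a toy example loosely based on the CD40-TRAF1-TNFAIP3 and IRF5-XPO1 interactions; the TRAF6 edges in particular are assumed for illustration and are not taken from the thesis networks.

```python
from collections import defaultdict


def rank_hubs(edges, min_interactions=3):
    """Rank genes by their number of distinct interaction partners.

    `edges` is an iterable of (gene_a, gene_b) pairs. Genes with fewer than
    `min_interactions` partners are dropped from the ranking.
    """
    neighbours = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)
    ranked = sorted(
        ((gene, len(partners)) for gene, partners in neighbours.items()),
        key=lambda item: (-item[1], item[0]),  # most partners first, then by name
    )
    return [(gene, count) for gene, count in ranked if count >= min_interactions]


# Toy edge list (illustrative assumption, not the thesis data).
TOY_EDGES = [
    ("CD40", "TRAF1"),
    ("TRAF1", "TNFAIP3"),
    ("IRF5", "XPO1"),
    ("TRAF6", "CD40"),
    ("TRAF6", "TNFAIP3"),
    ("TRAF6", "IRF5"),
]

hubs = rank_hubs(TOY_EDGES, min_interactions=3)
print(hubs)
```

Using sets for the partners means that an interaction reported more than once is counted only once.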

Table 34 Extended protein-protein interaction summary. The number of genes meeting each interaction
cut-off is shown for each gene list. The numbers in brackets are genes not present in the gene list. A lower
cut-off of ≥ 3 interactions was used because any gene not present in the gene list (a sub-interaction)
necessarily has at least two interactions.
Gene List         ≥ 3 Interactions  ≥ 10 Interactions  ≥ 20 Interactions
Confirmed         31 (8)            1 (0)              0 (0)
Confirmed no HLA  21 (7)            1 (0)              0 (0)
Expanded          137 (75)          21 (2)             6 (0)
Expanded no HLA   119 (67)          21 (2)             6 (0)

Table 35 Hub analysis of the extended protein-protein interaction network produced from the confirmed
gene list. Genes with ≥ 10 interactions are shown. Interactions are also split by the type of interaction (e.g.
direct interaction with confirmed gene).
Gene  Gene Type  Interactions  Direct Interaction  Direct Interaction  Direct Interaction  Interaction With
                               With Confirmed      With Expanded       With HLA Gene       Sub-Interaction
                               Gene                Gene                                    Gene
CD28  Confirmed  10            0                   2                   0                   8

Table 36 Hub analysis of the extended protein-protein interaction network produced from the expanded
gene list. Genes with ≥ 10 interactions are shown. Interactions are also split by the type of interaction (e.g.
direct interaction with confirmed gene).
Gene     Gene Type        Interactions  Direct Interaction  Direct Interaction  Direct Interaction  Interaction With
                                        With Confirmed      With Expanded       With HLA Gene       Sub-Interaction
                                        Gene                Gene                                    Gene
TRAF1    Confirmed        18            2                   0                   0                   16
CD28     Confirmed        13            0                   2                   0                   11
CD40     Confirmed        12            1                   1                   0                   10
REL      Confirmed        12            0                   0                   0                   12
CTLA4    Confirmed        10            0                   1                   0                   9
PTPN11   Expanded         50            1                   1                   0                   48
TRAF6    Expanded         33            3                   1                   0                   29
PTPRC    Expanded         28            2                   2                   0                   24
CD247    Expanded         22            3                   2                   0                   17
ERBB2    Expanded         22            0                   2                   0                   20
IL2RB    Expanded         21            2                   0                   0                   19
TYK2     Expanded         19            0                   1                   0                   18
LAT      Expanded         19            0                   0                   0                   19
CD19     Expanded         16            0                   0                   0                   16
PRKCQ    Expanded         14            0                   1                   0                   13
CDC37    Expanded         14            0                   1                   0                   13
PTPN2    Expanded         14            0                   0                   0                   14
GRB7     Expanded         13            0                   1                   0                   12
POU2F1   Expanded         13            0                   0                   0                   13
GRB2     Sub-interaction  12            2                   10                  0                   0
LCK      Sub-interaction  11            4                   7                   0                   0

Figure 32 CD28 network hub from the extended protein-protein interaction network produced from the
confirmed gene list. Genes from the confirmed gene list are shown as squares, genes in the MHC region are
shown as diamonds and hexagons symbolise genes from the expanded gene list.

Figure 33 PTPN11 network hub from the extended protein-protein interaction network produced from the
expanded gene list. Genes from the confirmed gene list are shown as squares, genes in the MHC region are
shown as diamonds and hexagons symbolise genes from the expanded gene list. Protein-protein interactions
between other members removed for clarity.

4.2.3 Comparison with Previous Publications


Results were compared to three other studies that used similar pathway analysis
techniques and the same RA WTCCC dataset. Overall, 10 genes identified as significant in
previous pathway analysis publications were also identified by the Taverna workflow, all
of which were present in the confirmed gene list (Table 37). A further seven were present
in the extended protein-protein interaction network, and the remaining 31 genes identified
by previous publications were not present in either gene list or the extended interaction
network. The majority of genes (63%) identified by the previous publications are found in
the MHC region and no genes from the expanded gene list were identified.

Table 37 Comparison of genes implicated by the Taverna workflow with previously published findings. The
study which implicated each gene is shown where B = Baranzini et al., P = Peng et al. and L = Luo et al.
Additionally, whether the gene is in the MHC region is shown. A dash marks a cell that is empty in the
original table.
Study Gene Name  Current Gene Name  Identified In Study  Identified In Gene List  MHC Region Gene
BTNL2            BTNL2              PL                   confirmed                Yes
C6orf10          C6orf10            P                    confirmed                Yes
HLA-DQA2         HLA-DQA2           BPL                  confirmed                Yes
HLA-DQB1         HLA-DQB1           L                    confirmed                Yes
HLA-DRA          HLA-DRA            BP                   confirmed                Yes
MAGI3            MAGI3              P                    confirmed                No
NOTCH4           NOTCH4             P                    confirmed                Yes
PTPN22           PTPN22             L                    confirmed                No
RSBN1            RSBN1              P                    confirmed                No
TAP2             TAP2               P                    confirmed                Yes
ABL1             ABL1               B                    interacting              No
CD4              CD4                B                    interacting              No
GHR              GHR                B                    interacting              No
GRB2             GRB2               B                    interacting              No
MAPK1            MAPK1              B                    interacting              No
PIK3R1           PIK3R1             B                    interacting              No
RET              RET                B                    interacting              No
AGPAT1           AGPAT1             L                    none                     Yes
AIF1             AIF1               L                    none                     Yes
APOM             APOM               P                    none                     Yes
BAT3             BAG6               PL                   none                     Yes
BAT4             GPANK1             PL                   none                     Yes
BBOX1            BBOX1              P                    none                     No
CBLB             CBLB               B                    none                     No
CFB              CFB                P                    none                     Yes
CREBL1           ATF6B              L                    none                     Yes
EHMT2            EHMT2              L                    none                     Yes
GPSM3            GPSM3              PL                   none                     Yes
HLA-DPA1         HLA-DPA1           BPL                  none                     Yes
HLA-DPB1         HLA-DPB1           PL                   none                     Yes
HSPA1L           HSPA1L             P                    none                     Yes
ITPR3            ITPR3              P                    none                     No
KDR              KDR                B                    none                     No
LOC642038*       -                  P                    none                     -
LOC731881*       -                  PL                   none                     -
MICA             MICA               L                    none                     Yes
MMEL1            MMEL1              P                    none                     No
MSH5             MSH5               P                    none                     Yes
PRNP             PRNP               B                    none                     No
PRRT1            PRRT1              P                    none                     Yes
PSORS1C1         PSORS1C1           P                    none                     Yes
RDBP             RDBP               PL                   none                     Yes
RPS18            RPS18              L                    none                     Yes
SELL             SELL               B                    none                     No
STXBP6           STXBP6             P                    none                     No
TNXB             TNXB               P                    none                     Yes
TRIM26           TRIM26             P                    none                     Yes
TRIM40           TRIM40             P                    none                     Yes
VARS2            VARS2              P                    none                     Yes
ZFP57            ZFP57              L                    none                     Yes
* = NCBI Gene record discontinued

5 Discussion
5.1 Taverna Workflow
Genome-wide association studies (GWAS) have been successful in implicating many loci
associated with various common complex diseases (De Jager et al. 2009; Duerr et al. 2006;
Libioulle et al. 2007; Wellcome Trust Case Control Consortium 2007). GWAS are designed
to capture the largest possible representation of the whole genome, and the variants they
report often map to intergenic, non-coding regions. Such variants frequently do not
represent a causal change but instead a variant in linkage disequilibrium (LD) with the
causal change. Additionally, structural organisation of the genome can cause two relatively
distant regions of the genome to be in close contact with each other, and therefore a
change in one of these regions could affect the other (Zhao et al. 2006).

As such, it is sometimes difficult to determine the functional effect of the observed
association and the gene to which it relates. Associated SNPs are often assigned to genes
based on proximity, for example the nearest gene, or by using the ‘most biologically
plausible’ method, where the gene which makes the most ‘biological sense’ in a region is
chosen (e.g. rs6920220 assigned to TNFAIP3, ~182KB away) (Figure 34). This introduces
positional or researcher bias and can result in mis-assignment of SNPs to genes, directing
research to the wrong conclusions.

Figure 34 TNFAIP3-OLIG3 region showing the position of rs6920220 (red line) relative to the genes. The
TNFAIP3 gene lies approximately 182KB from rs6920220.

SNPs are located within blocks of LD and it is currently thought that the causal gene will lie
within this LD region. The International HapMap Project has given researchers the ability to
assess the extent of LD in a region of interest. This can be used to define a genomic region
representing the potential extent of the observed association, which can be refined further
by including information on recombination hotspots. This method has the advantage of
avoiding researcher bias whilst still using a biological basis to define the region
represented by the observed association. Indeed, this method was used by Raychaudhuri et
al. (2008) to define regions for the GRAIL programme (Raychaudhuri et al. 2009a) and
proved very successful, implicating 22 new loci and successfully replicating 13 of them
(p < 0.05).
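
The region definition described above can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow's implementation: the LD measure (r²) and cut-off (0.8), the treatment of hotspots as single positions, and all coordinates and gene names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def ld_region(snp_pos, ld_partners, r2_cutoff=0.8):
    """Span of the index SNP and every SNP in LD with it at or above the cut-off.

    `ld_partners` is a list of (position, r2) pairs. If no partner passes the
    cut-off, the region collapses to the index SNP itself.
    """
    positions = [snp_pos] + [pos for pos, r2 in ld_partners if r2 >= r2_cutoff]
    return min(positions), max(positions)


def extend_to_hotspots(start, end, hotspots):
    """Extend a region outwards to the nearest flanking recombination hotspots.

    `hotspots` is a sorted list of hotspot positions (a simplification: real
    hotspots are intervals). A boundary without a flanking hotspot is unchanged.
    """
    i = bisect_right(hotspots, start)  # hotspots[:i] lie at or before `start`
    j = bisect_left(hotspots, end)     # hotspots[j:] lie at or after `end`
    new_start = hotspots[i - 1] if i > 0 else start
    new_end = hotspots[j] if j < len(hotspots) else end
    return new_start, new_end


def genes_in_region(start, end, genes):
    """Names of genes, given as (name, start, end), that overlap the region."""
    return [name for name, g_start, g_end in genes if g_start <= end and g_end >= start]


# Hypothetical example data.
partners = [(980_000, 0.90), (1_030_000, 0.85), (1_100_000, 0.30)]
hotspots = [900_000, 1_060_000, 1_200_000]
genes = [("GENE_A", 850_000, 880_000), ("GENE_B", 890_000, 950_000),
         ("GENE_C", 1_040_000, 1_070_000), ("GENE_D", 1_080_000, 1_120_000)]

region = ld_region(1_000_000, partners)
associated = extend_to_hotspots(*region, hotspots)
print(region, associated, genes_in_region(*associated, genes))
```

Every gene overlapping the extended region is returned, without regard to its function, which is what makes the assignment independent of researcher judgement.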

Although the Taverna workflow identified many of the same genes as the ‘most biologically
plausible’ method (Table 11), it achieved this in an unbiased way and in most cases also
identified additional genes which could be affected by the observed association.
Additionally, some of the genes identified by the workflow represent biologically plausible
candidates which were overlooked, possibly because of their distance from the associated
SNP. For example, rs3184504, from the expanded SNP list, was originally assigned to the
SH2B3 gene. This gene encodes the SH2B adaptor protein 3, which links the T-cell receptor
activation signal to phospholipase C-gamma-1, GRB2 and phosphatidylinositol 3-kinase
(Source: UniProtKB/Swiss-Prot Q9UQQ2), making it a good biological candidate. Figure 35
indicates the approximate SNP location in the context of the region defined by the Taverna
workflow, showing that the SNP lies within exon 3 of the SH2B3 gene. The SNP is also
non-synonymous, changing the amino acid tryptophan (W) to arginine (R) at position 262, and
is therefore a good candidate for the functional variant. However, the change is predicted
to be benign by PolyPhen-2, with a score of 0, and as such the observed association may
still represent a SNP in LD. Additional candidates identified by the workflow include
TRAFD1, which contains a TRAF-like domain, and PTPN11, which contains a tyrosine-protein
phosphatase domain shared with PTPN22, a gene shown to be functionally associated with RA
(Begovich et al. 2004).

Figure 35 The associated region for rs3184504 defined by the Taverna workflow. Its approximate location
(red line) is shown relative to genes in the region.

The Taverna workflow did not identify any of the candidate genes for three of the four
SNPs associated with the 6q23 region containing the genes TNFAIP3 and OLIG3. The one
SNP for which the workflow did select the candidate gene, rs5029937, maps to intron 5 of
the TNFAIP3 gene and was therefore easily identified. The remaining three (rs6920220,
rs13207033 & rs1167223) lie in a 54KB haplotype block 144KB telomeric of OLIG3 and
175KB centromeric of TNFAIP3. One of these (rs6920220) is in perfect LD with a SNP
(rs6927172) which has evidence of repressor activity on TNFAIP3 transcription and
differential transcription factor binding (Elsby et al. 2010). It could therefore be a
candidate for long range regulation, which the workflow is unable to detect as the effect
often extends beyond LD.

Other candidates for possible regulatory roles or structural organisation analyses are
rs548234 and rs2793108. Originally assigned to PRDM1, rs548234 maps to a region
approximately 1KB from PRDM1 and could act as a regulatory element (Pomerantz et al.
2009). The second candidate, rs2793108, lies in a region approximately 127KB telomeric of
the ZEB1 gene and further work would be required to test its effect.

The Taverna workflow did not identify the most biologically plausible gene for the most
highly associated SNP, rs6910071, which, according to the literature, is a tag for the HLA-
DRB1*0401 allele. This is most probably due to a limitation of the workflow which relies on
LD data as the HLA-DRB1 gene lies in the MHC region on chromosome 6. Due to the highly
polymorphic nature, close physical linkage and en bloc inheritance seen in the MHC region
(Rodey 2000), defining genes in this manner may lead to incorrect assignment due to
problems associated with the LD data and recombination hotspots.

A further limitation of the workflow was revealed by two SNPs: rs548234 and rs3890745.
Because multiple databases are used and these differ in genome assembly, the workflow
attempts to convert co-ordinates from one assembly to another using a simple two point
conversion (i.e. start and end); it does not attempt to map complete regions across genome
assemblies as this could often fail. Even using the two point conversion method, the
workflow was unable to map the ‘LDRegion’ for these SNPs to either the NCBI34 or GRCh37
genome assemblies and therefore could not continue to extend this region to recombination
hotspots or obtain genes positioned in that region. Again, this limitation is due to the
available data and cannot be solved manually.
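
The two point conversion can be illustrated with a toy model in which the mapping between two assemblies is a list of aligned blocks. This is a sketch under that assumption only: the block format, function names and coordinates are hypothetical, and real assembly converters also handle strand changes and rearrangements.

```python
from typing import Optional, Sequence, Tuple

# (source_start, source_end, target_start); source_end is exclusive.
Block = Tuple[int, int, int]


def convert_point(pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Map one coordinate to the target assembly, or None if it is unaligned."""
    for src_start, src_end, dst_start in blocks:
        if src_start <= pos < src_end:
            return dst_start + (pos - src_start)
    return None


def convert_region(start: int, end: int,
                   blocks: Sequence[Block]) -> Optional[Tuple[int, int]]:
    """Two point conversion: map only the start and end of a region.

    Returns None if either end point cannot be mapped, in which case the
    region cannot be carried over to the target assembly.
    """
    new_start = convert_point(start, blocks)
    new_end = convert_point(end, blocks)
    if new_start is None or new_end is None:
        return None
    return min(new_start, new_end), max(new_start, new_end)


# Hypothetical alignment: two blocks separated by an unaligned gap (2000-2999).
TOY_BLOCKS = [(1000, 2000, 1500), (3000, 4000, 3600)]

mapped = convert_region(1200, 3500, TOY_BLOCKS)    # both ends fall in aligned blocks
unmapped = convert_region(1200, 2500, TOY_BLOCKS)  # end point falls in the gap
print(mapped, unmapped)
```

Mapping only the two end points tolerates gaps inside the region, but the conversion still fails when an end point itself has no counterpart in the target assembly.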

5.1.1 Workflow Limitations


The workflow was designed to address many of the potential limitations, such as genome
assembly conversions (where possible), local dependencies and incorrect or replaced SNP
IDs. However, its main limitation is imposed by the availability of the online resources
used, as failure of any of these would cause the workflow to fail. The resources used are
large, fairly robust web services; breaks in service availability are rare, but they are
beyond the control of the workflow.

Other limitations of the workflow result from the limitations of the HapMap LD data.
Firstly, if the supplied SNP is not present in the HapMap data, the workflow is unable to
assign an LD region beyond the supplied SNP itself. In this scenario, the workflow still
attempts to define a region based on recombination hotspots, but this region is likely to
be inaccurate. A similar problem occurs if no SNP within the LD cutoff has been genotyped
upstream and/or downstream of the supplied SNP; again, the workflow continues to define a
region based on recombination hotspots. The final limitation is the potential inaccuracy of
the HapMap data, which depends on the number of SNPs genotyped in a region. As this number
increases, the measure of LD between any two loci in the region becomes more refined. If
the supplied SNP lies in a region of poor coverage, the measure of LD between this SNP and
others in the region may be inaccurate, and the region defined by the workflow will
therefore also be inaccurate. The effect is particularly apparent in the MHC region, which
shows a high level of linkage covering large regions.

The method which the workflow uses to assign genes (Figure 11) also has its own
limitations. As previously mentioned, this method assumes that the causal gene will lie in
the SNP’s LD block, as determined by the workflow. However, the causal gene could lie in an
adjacent or completely different region of the genome, with the observed effect due to long
range regulation. This is the case for the 6q23 region, where the workflow failed to
identify the candidate gene, TNFAIP3, because it lies outside the SNP’s LD block, even
though there is evidence to suggest that the associated region plays a role in TNFAIP3
regulation. This effect could be further confounded if a biologically plausible gene was
found in the region defined by the workflow, as subsequent efforts to validate this locus
would then be futile.

One of the main advantages of the workflow is that it assigns genes using an unbiased
approach, unlike the ‘biologically plausible’ method. This could, however, also limit the
success of the workflow, as it may identify multiple genes in the same region, all of which
could be implicated in the disease, when the association actually represents just one of
these loci. For example, the workflow analysis of rs231735, originally assigned to CTLA4,
implicated, among others, the CD28, ICOS and CTLA4 genes. All of these genes are compelling
biological candidates; however, it is unlikely that the observed association relates to all
three loci, and further work would therefore be required to find the causal one.

5.1.2 Workflow Summary
Overall the workflow performs well and is written to allow modification of certain
parameters by the user for added flexibility. The use of the XML format throughout the
workflow provides a defined structure for the results whilst still providing a format which
can be easily read by machines as well as humans. The creation of utility workflows gives
researchers the ability to use the workflow without having to manually produce error free
XML files.

5.2 Disease Analysis – RA


5.2.1 Pathway Analysis
Rheumatoid arthritis (RA) is a chronic autoimmune disease characterised by inflammation
and destruction of synovial joints leading to disability (Elsby et al. 2010). Inflammation is
caused by infiltration of inflammatory cytokines, B and T cells, primarily CD4+ T cells, into
the synovial membrane (Choy & Panayi 2001; Strand, Kimberly, & Isaacs 2007), and T cells
have been implicated through the association of HLA-DRB1 and PTPN22, a lymphoid-specific
phosphatase (McInnes & Schett 2007). HLA-DRB1 molecules present antigens to CD4+ T cells,
which in turn stimulate monocytes, macrophages and synovial fibroblasts to produce the
proinflammatory cytokines IL-1, IL-6 and TNF-α. These cytokines drive inflammation and can
be detected in patients’ serum and synovial fluid (Choy & Panayi 2001).

It is clear from the GWAS results (Table 9) that HLA-DRB1 plays a large role in RA and is the
most robust association with the strongest effect. It is therefore not surprising that the
MHC Class II protein complex and antigen processing and presentation associated terms
are among the most significantly associated across most of the pathway databases.
However, this effect is largely due to the representation of MHC Class II genes identified by
the Taverna workflow. The genes identified by the workflow for the HLA-DRB1 associated
SNPs include most of the MHC Class II genes including the DR, DQ, DM & DO molecules and
the antigen peptide transporters, TAP1 and TAP2 (Table 11). This is due to the limitation of
the LD data as discussed above and once MHC region genes are excluded the effect
disappears. This suggests that although the effect of the MHC region genes is important in
either the establishment and/or progression of disease, it is a distinct pathway unit and not
involved directly in other significant pathways.

T cells comprise two main classes, which can be distinguished by their cell surface
proteins; these proteins determine which other cells they are able to interact with. One
class, CD8 T cells, develops into cytotoxic T cells that act against cells infected by
viruses. The other class expresses the cell surface molecule CD4 and has the ability to
differentiate into multiple types of effector T cells (Travers, Walport, & Murphy 2008). As
mentioned previously, the majority of T cells seen in RA are CD4+ T cells. The NOTCH
signalling pathway is essential in early T cell differentiation and has been proposed to
direct CD4/CD8 lineage commitment in later stages (Laky & Fowlkes 2008). Both the BiNGO
and PANTHER analyses identified the NOTCH signalling pathway as being over-represented in
the confirmed gene list (Table 17 & Table 30), supporting a role for NOTCH in the
pathogenesis of RA.

To elicit an inflammatory response, CD4+ T cells must first be primed by antigen presenting
cells (APCs) and require a co-stimulatory signal, such as CD28, to ensure cell survival
(Travers, Walport, & Murphy 2008). Once primed, the T cell expresses a number of proteins
such as ICOS, CTLA-4 and IL-2 involved in regulation of proliferation (Figure 36). The DAVID
analysis of the confirmed gene list identified an enriched cluster (enrichment score 2.495) involved in T cell
co-stimulation (Table 21) and the T cell co-stimulatory pathway was significantly associated
in both the InnateDB (Supplementary Table 23, Table 25, Supplementary Table 16 &
Supplementary Table 18) and Reactome (Table 32) pathway analyses across all groups,
highlighting the importance of these genes in the pathogenesis of RA. Additionally a T cell
activation pathway was found to be significantly associated in three of the four gene lists in
the PANTHER analysis (Supplementary Table 21 & Supplementary Figure 7). However,
CD28, CTLA4 and ICOS are located in close proximity on chromosome 2 and although there
is evidence of three associations, these may not be independent and could represent the
same causal gene. Further validation would be required to ensure these associations do
represent three separate loci and can contribute to the same pathway.

Following 4-5 days of proliferation, T cells differentiate into effector T cells, which change
their expression of various cell adhesion molecules (CAMs), such as CD2, increasing the
T cell’s ability to interact with potential target cells (Travers, Walport, & Murphy 2008).
The CAM pathway was found to be significantly associated in the InnateDB analysis across
all four groups (Table 24, Table 25, Supplementary Table 16 & Supplementary Table 18),
possibly highlighting other CAMs which could be involved in RA, such as ITGB2, which
encodes the β2 subunit of the LFA-1 protein responsible for the adhesion of T cells to
their targets.

Figure 36 T cell priming pathway. Upon activation of the T cell by the MHC class II receptor-peptide complex
and CD28-B7 co-stimulatory signal, the expression of the IL2 gene and CTLA-4 is up regulated. IL-2 leads to T
cell proliferation followed by differentiation into effector T cells. CTLA-4 has much higher affinity for B7 and
regulates the proliferative response of IL-2.

Naïve CD4+ T cells have the ability to differentiate into a number of different effector T
cells in response to specific cytokines, each specialised to engage with different classes of
pathogens. The predominant effector T cells present in RA joints are Th1 T cells (Feldmann,
Brennan, & Maini 1996), whose differentiation is controlled early in the immune response
and stimulated by the cytokines IFN-γ and IL-12. The BiNGO analysis of the expanded gene
lists (Table 18 & Supplementary Table 5) identified an over-representation of genes
annotated with the terms ‘regulation of interleukin-12 production’, ‘positive regulation of
interleukin-12 biosynthetic process’ and ‘regulation of interleukin-12 biosynthetic
process’, suggesting that IL-12 production may be perturbed in individuals with RA, leading
to an increase in Th1 T cells. Additionally, the IL-12 signalling mediated by STAT4 pathway
was significantly associated across all four gene lists in the InnateDB analysis (Table 24,
Table 25, Supplementary Table 16 & Supplementary Table 18), implicating STAT4 as a
potentially influential gene in the regulation of IL-12 signalling in RA.

Naïve CD4+ T cells are also characterised by their production of specific cytokines. These
cytokines stimulate the JAK-STAT signalling pathway to activate specific genes. This
pathway was found to be significantly over-represented in the InnateDB analysis (Table 25,
Supplementary Table 16 & Supplementary Table 18) in all but the confirmed gene list
analysis. The JAK-STAT signalling pathway was also identified by the PANTHER analysis but
was significant only for the two expanded gene lists (Supplementary Table 21 &
Supplementary Figure 7). The JAK-STAT signalling pathway is also important in the
differentiation of T cells to Th1 T cells and establishes STAT genes as influential genes in
RA pathogenesis. Many immune system signalling events are identified by the various
pathway analysis methods and are clearly important in the immune response and RA
pathogenesis. Cell signalling events are often transmitted by protein kinases to activate
various cellular processes including transcription. The DAVID functional classification
analysis of both expanded gene lists shows an enriched kinase cluster, including a kinase
(MAPKAPK5) activated by proinflammatory cytokines via a mitogen-activated protein (MAP)
kinase and a B lymphoid tyrosine kinase (BLK) involved in B-cell receptor signalling and
B-cell development (Table 22 & Supplementary Table 11).

Th1 T cells are capable of activating macrophages and producing TNF-α leading to chronic
inflammation, the main characteristic of RA. Despite the obvious role of inflammation in
the pathogenesis of RA, only one pathway database, PANTHER (Supplementary Table 21 &
Supplementary Figure 7), reported inflammation as an enriched term, and only for the
expanded gene list after MHC region genes were excluded, where it reached only marginal
significance (p = 0.0489). This suggests that the proinflammatory cytokines, including TNF-α, and
associated gene products are absent from the gene lists. TNF-α has been shown to be an important
regulator of IL-1 production (Brennan et al. 1989) and anti-TNF therapies have been shown
to be successful in treating RA (Lipsky et al. 2000). TNF-α is encoded by the TNF gene
located on chromosome 6p21 in the extended MHC region. Due to the complexity of the
MHC region and multiple association signals observed (Figure 37), the exact causal genes in
this region are currently poorly understood. Other genes in the region, such as TNF, have
been reported to be independently associated with the disease, but it remains unclear whether
these associations are genuinely independent. However, there is evidence for macrophage
involvement in RA. Macrophage activation requires a co-stimulatory signal from Th1 cells
expressing the CD40 ligand, which activates the CD40 receptor (van & Banchereau 2000). This
pathway was significantly
associated across all gene groups in the InnateDB pathway analysis (Supplementary Table
15, Supplementary Table 16, Supplementary Table 17 & Supplementary Table 18)
supporting the role of macrophages in RA pathogenesis.

Figure 37 WTCCC GWAS results (Wellcome Trust Case Control Consortium 2007) for RA showing the multiple
association signals in the MHC region (chromosome 6 green).

Activated T cells can also stimulate B cells, stimulate osteoclastogenesis and angiogenesis
and promote recruitment of inflammatory cells (Choy & Panayi 2001). It is therefore
expected that the regulation of immune cells, specifically lymphocytes, will play an
important role in the pathogenesis of RA as shown by the BiNGO analysis (Table 17, Table
18 & Supplementary Table 3 – Supplementary Table 6). Interestingly the regulation of
activation, differentiation and proliferation of lymphocytes are all significantly associated,
highlighting their roles in the establishment, progression and maintenance of the immune
response which are confirmed further by the addition of genes assigned to nominally
associated SNPs. Addition of these genes also caused the angiogenesis and B cell activation
pathways to be identified by the PANTHER analysis (Table 30), providing further evidence
for their involvement in the pathogenesis of RA.

Antibodies are an important clinical feature of RA. In particular, anti-cyclic citrullinated
peptide (CCP) antibodies have been shown to predict the development of, and poor outcome
in, early RA (Agrawal, Misra, & Aggarwal 2006). There are five different immunoglobulin
classes, IgA, IgD, IgE, IgG and IgM, each capable of different effector functions. The first to
be produced in an immune response are IgM as these do not require class switching and
are mainly found in the blood. Immunoglobulin G (IgG) antibodies are involved in the
secondary immune response and are the most abundant (>75%) form of immunoglobulin
acting mainly in tissues (Travers, Walport, & Murphy 2008). They have the largest effect on
complement activation, opsonisation and neutralisation of all the immunoglobulin classes,
have much higher affinity than IgM antibodies and represent an established immune
response. Interestingly, the GO term ‘positive regulation of isotype switching to IgG
isotypes’ was significantly associated across all four gene lists (Table 17, Table 18,
Supplementary Table 3 – Supplementary Table 6), suggesting that IgG class switching may
contribute to the maintenance of the disease.

Arginine vasopressin (AVP) is a neurohypophyseal hormone which primarily affects water
reabsorption in the tubules of the kidneys (Petersson et al. 2006) although it has also been
shown to have proinflammatory effects (Chikanza & Grossman 1998), increase antibody
responses (Croiset, Heijnen, & de Wied 1990) and positively regulate essential immune
functions via IFN-γ (Torres & Johnson 1988). The vasopressin synthesis pathway was
significantly associated in the PANTHER analysis of the confirmed gene list excluding MHC
region genes (Supplementary Table 21 & Supplementary Figure 7), identifying one gene,
peptidylglycine alpha-amidating monooxygenase (PAM), from the list. The PAM gene
encodes an enzyme which catalyses the conversion of neuroendocrine peptides to their active alpha-amidated
form (NCBI Gene ID: 5066), in this case Pro2-Vasopressin to AVP. This finding potentially
identifies a pathogenic mechanism, implicating a new locus for the rs26232 association,
originally assigned to the GIN1 gene, in addition to other loci involved in the vasopressin
synthesis pathway. Further work would be required to validate the PAM locus as the causal
gene and establish its role in this pathway.

It is well established that several autoimmune diseases, such as type 1 diabetes (T1D) and
systemic lupus erythematosus (SLE), share genetic susceptibility loci with RA (Eyre et al.
2010; Lettre & Rioux 2008; Orozco et al. 2011). This overlap is highlighted by the InnateDB
analysis with several disease specific pathways, such as autoimmune thyroid disease, T1D
and SLE, producing significant results (Table 24 & Table 25). Many of the disease specific
pathways are no longer significant once genes in the MHC region have been removed
which probably highlights that antigen presentation is common to all autoimmune diseases
rather than suggesting any overlap is due to specific MHC region genes.

5.2.2 Protein-protein Interaction


MHC class II molecules consist of two chains, one α and one β, encoded by separate genes
located close to each other in gene clusters (Figure 38). The Taverna workflow identified a
region containing two class II gene clusters (DR & DQ), the class II peptide loading genes
(DM & DO) and the class I antigen peptide transporters (TAP1 & TAP2). It was therefore not
surprising that these genes were present in the protein-protein interaction networks and
interacted with other members of their individual gene cluster (Figure 30 & Figure 31). For
example, the DR α chain, encoded by HLA-DRA, interacts with the DR β chain, encoded by
HLA-DRB1, to produce a DR class II cell surface receptor. Interactions are also seen
between HLA-DRA and other members of the HLA-DRB genes as well as the peptide loading
genes HLA-DMB and HLA-DOB. This was true for both the confirmed and expanded gene
lists with the inclusion of additional genes not affecting any of the interactions observed.

Figure 38 The HLA genetic complex located on chromosome 6 (Rodey 2000). Dark bands indicate functional
genes, light bands show non-active or pseudogenes and ψ represents an additional four HLA-DRB
pseudogenes: HLA-DRB6, 7, 8 and 9.

However, it is unlikely that all the genes identified by the Taverna workflow for this region
are causal and, despite the obvious role of the MHC region genes in RA pathogenesis,
some of these interactions are likely to be spurious.

Once the MHC region genes are removed, none of these interactions remain,
showing that the MHC region genes form a distinct pathway and that currently there is
no association to explain the link between this pathway and the other RA associations. This
may be because the DR class II molecule interacts with the immune system through
presentation of an antigenic peptide. It is widely thought that RA requires an antigenic
trigger to initiate the immune process (Ollier, Harrison, & Symmons 2001; Silman &
Pearson 2002) and it is not known whether this peptide is an endogenous single specific
peptide or a variety of closely related peptides showing molecular mimicry. Attempts have
been made to predict which peptide(s) this trigger is likely to be and despite potential
leads, they currently remain unsuccessful (Krause, Kamradt, & Burmester 1996; Silman &
Pearson 2002).

One interaction which disappears after removal of the MHC region genes involves the MHC
region gene NOTCH4 and the non-MHC gene RBPJ. Although NOTCH4 is located within the
MHC region, on the border between the MHC class II and class III loci, it represents one of
the few genes in this region with an exclusively non-immune function. It does, however,
provide evidence of a link between the rs874040 (RBPJ) association and the NOTCH
signalling pathway discussed previously.

The addition of genes from the expanded gene list results in new interacting gene pairs,
expands existing interactions and introduces two new interacting gene sets (Figure 31).
Further evidence will be required to confirm the association of these interactions with RA
as many correspond to genes which have been identified by the same original SNP
association. For example, both the RAG1 and RAG2 genes were implicated by the same SNP
(rs540386), along with TRAF6. This association currently shows no evidence of multiple
independent effects and it will therefore be necessary to confirm which gene is causal
before examining these interactions further.

The addition of the IL2RB and ICAM1 genes expanded the IL2 gene network seen in the
confirmed gene list analysis and further strengthens the effect of IL2 in the pathogenesis of
RA to regulate T cell proliferation. It has also been shown that the IL2 T cell response is
dependent on the interaction between the ICAM-1 and LFA-1 membrane proteins and
inhibition of either ICAM-1 or LFA-1 inhibits IL-2 induced proliferation (Vyth-Dreese et al.
1993).

Interestingly, the addition of one gene from the expanded gene list, CD247, revealed a
network containing three genes from the confirmed gene list not present in the initial
analysis. Three other genes from the expanded list extended the network further. The
CD247 gene encodes the T cell receptor zeta protein which forms part of the T cell receptor
complex and is important in T cell activation assisted by the co-stimulatory signal, CD2
(Verhagen et al. 1996) and signal transduction via PTPN22 (Wu et al. 2006). Variation in
either the function or regulation of these genes could disrupt T cell activation and
ultimately the propagation of an immune response.

An additional network was revealed primarily involving the CTLA4 and PTPN11 genes.
CTLA4 is involved in negative regulation of the T cell co-stimulatory signal and therefore
provides a strong candidate. PTPN11 encodes a protein tyrosine phosphatase involved in
signalling and was identified from the rs3184504 association. The Taverna workflow also
identified the SH2B3 gene, a key negative regulator of cytokine signalling (NCBI Gene ID:
10019), for this association which is also a strong candidate and therefore further work will
be required to determine the causal gene. The remaining two interactions involving ERBB2
and GRB7 were identified from the same initial association (rs2872507) and therefore may
not represent a true interaction.

Protein-protein interactions involving the TRAF6 gene from the expanded gene list link two
of the networks seen from the confirmed gene list analysis to form a network largely
explaining the interactions which occur during macrophage activation, inflammation, T cell
development, T cell proliferation and B cell activation (Figure 31 & Figure 39). Macrophage
activation by Th1 T cells requires two signals: an activating signal and a sensitising signal.
The activating signal is the cytokine IFN-γ and the sensitising signal is the interaction
between CD40 on the macrophage cell surface and its ligand, CD40L, on the Th1 T cell
surface (Travers, Walport, & Murphy 2008). The CD40 receptor does not have any intrinsic
signalling capabilities and instead signal transduction is mediated by interactions with TNF
receptor associated factor (TRAF) proteins (Werneburg et al. 2001). The CD40 receptor has
been shown to have two distinct cytoplasmic domains, cytC and cytN, which interact with
different TRAF proteins (Hu et al. 1994; McWhirter et al. 1999; Tsukamoto et al. 1999). The
TRAF1 gene, from the confirmed gene list, has been shown to interact with the CD40 cytC
domain (Hu et al. 1994; McWhirter et al. 1999) and the TRAF6 gene, from the expanded
gene list, interacts with the cytN domain (Tsukamoto et al. 1999). This allows the CD40
sensitisation signal to propagate causing macrophage activation (Figure 39 i).

The addition of the TRAF6 gene also reveals an interaction between the TRAF6 and the
interferon regulatory factor 5 (IRF5) genes. IRF5 encodes a transcription factor which
interacts with and is activated by TRAF6 (Takaoka et al. 2005). This causes expression of
proinflammatory cytokines, such as IL-6, IL-12 and TNF-α leading to inflammation (Figure
39 ii) (Takaoka et al. 2005). Any disturbance in the regulation of the IRF5 gene could lead to
increased proinflammatory cytokine production causing increased inflammation
characteristic of RA. Studies have shown that a five base pair insertion/deletion
polymorphism, lying in the gene's promoter region, can lead to increased IRF5 mRNA
expression and has also been associated with other inflammatory diseases (Richez et al.
2010). IRF5 is also shown to interact with the XPO1 gene encoding the protein Exportin 1
which has been shown to regulate the sub-cellular localisation of IRF5 (Lin et al. 2005). The
XPO1 gene was identified by the rs13017599 association (Table 11); the associated region also contains the
REL gene, which encodes the transcription factor c-Rel, a member of the REL/NFκB family.
The REL gene is a more likely candidate for this region and therefore further work would
be needed to assess whether multiple independent effects exist or which gene is the
functional one.

Figure 39 Mechanisms which affect macrophage activation, inflammation, T cell development, T cell
proliferation and B cell activation. Macrophage activation (i) – release of IFN-γ and recognition of the
CD40 ligand (CD40L) by CD40 on macrophages leads to activation mediated by TRAF1/TRAF6
signalling. Inflammation (ii) – the IRF5 transcription factor interacts with TRAF6 leading to the
expression of pro-inflammatory cytokines causing inflammation. T cell development, differentiation
and proliferation (iii) – release of TNF-α and IL-1, by macrophages, leads to NFκB activation in Th1 T
cells causing T cell development, differentiation and proliferation. TNFAIP3 interacts with TRAF1 to
inhibit TNF-α/CD40L mediated NFκB activation and also interacts with TRAF6 to inhibit IL-1 mediated
NFκB activation. B cell activation (iv) – recognition of CD40L causes NFκB activation, mediated by
TRAF1 and TRAF6 leading to isotype switching, cytotoxic T cell priming and B cell survival.

Release of TNF-α by the macrophage also stimulates Th1 T cell NFκB activation (Figure 39
iii) through CD40L mediated TNF-α signalling (Song, Rothe, & Goeddel 1996). The cytokine
interleukin 1 (IL-1) is also produced by the activated macrophage causing IL-1 induced NFκB
activation in Th1 T cells (Figure 39 iii) (Heyninck & Beyaert 1999). NFκB is a transcription
factor which, upon activation in Th1 T cells, can lead to T cell differentiation, T cell
development, T cell proliferation mediated by IL-2 (Heyninck & Beyaert 1999) and
osteoclastogenesis (Clohisy et al. 2004). An important NFκB activation regulatory
mechanism in Th1 T cells involves TRAF1, TRAF6 and TNFAIP3 all shown to interact in the
expanded gene list interaction analysis. TRAF1 interacts with TNFAIP3 to cause inhibition of
TNF-α induced NFκB activation (Song, Rothe, & Goeddel 1996). Similarly, TRAF6 interacts
with TNFAIP3 to inhibit IL-1 induced NFκB activation (Song, Rothe, & Goeddel 1996). It is
therefore feasible that disruption in any of these genes, either their regulation or function,
could cause increased NFκB activation leading to a heightened immune response such as in
RA.

Th1 T cells can activate B cells which respond to the same antigen but require an additional
signal produced by the interaction between CD40 and its ligand (Figure 39 iv). As for
macrophage activation, the CD40 receptor requires TRAF proteins to signal this interaction.
TRAF1 and TRAF6 have both been shown to interact with CD40 to relay this signal (Hu et al.
1994; McWhirter et al. 1999; Tsukamoto et al. 1999) to the B cell to cause NFκB activation
(Guo et al. 2009; Konno et al. 2009; Tsukamoto et al. 1999). CD40 signalling rescues B cells
from apoptosis, induces IgG isotype switching and causes cytotoxic T cell priming through
antigen presenting cells (APCs) (Tsukamoto et al. 1999).

To further support the role of genes important in NFκB activation, the canonical NFκB
pathway was significantly associated in the InnateDB analysis of the expanded gene list
(Table 25). This shows the importance of NFκB and genes involving NFκB activation in the
pathogenesis of RA.

Analysis of the extended networks was complex due to their size and it was therefore
decided to study ‘hubs’ of interaction. One gene, CD28, was identified from the confirmed
gene list analysis with more than ten interactions (Figure 32) and revealed two interacting
genes from the expanded gene list already identified in the previous analysis. This not only
provides further evidence for these interactions but shows that potentially new, as yet
unimplicated, genes could be discovered which may play a role in the pathogenesis of RA.

These include genes encoding the cell surface proteins CD4 found on CD4+ T cells and B7-1
and B7-2 found on APCs which provide the secondary signal required for T cell priming. The
remaining interactions reveal genes involved in signal transduction and kinase activity.

Analysis of the extended network produced from the expanded gene list revealed 21 genes
with more than ten interactions (Table 36). The largest of these involved the PTPN11 gene
as the central ‘hub’ and strengthens the case for this being the causal gene instead of the
implicated SH2B3 gene. The novel genes identified as interacting with PTPN11 and
other members of the gene lists include genes involved in the JAK-STAT signalling
pathway, the B cell receptor complex, several kinases and TNFRSF1A, which encodes the major TNF-α
receptor.
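
The hub criterion used here (more than ten interactions) amounts to a degree threshold on the interaction network. The following is a minimal sketch of that idea, assuming the interactions are available as unordered gene pairs. The function name `find_hubs`, the placeholder gene names and the lowered threshold in the toy example are illustrative assumptions and are not taken from the analysis itself.

```python
from collections import defaultdict


def find_hubs(interactions, min_interactions=10):
    """Return {gene: partner count} for genes with MORE than
    `min_interactions` distinct interaction partners."""
    partners = defaultdict(set)
    for gene_a, gene_b in interactions:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        # Sets make the count insensitive to duplicated or reversed pairs.
        partners[gene_a].add(gene_b)
        partners[gene_b].add(gene_a)
    return {
        gene: len(found)
        for gene, found in partners.items()
        if len(found) > min_interactions
    }


# Hypothetical toy network (placeholder names, not real results).
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_A"),  # reversed duplicate, counted once
]
hubs = find_hubs(toy_edges, min_interactions=2)
print(hubs)
```

Counting distinct partners rather than rows guards against the same interaction being reported more than once by different source databases.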

Additionally, PTPN11 is shown to interact with three genes, TYK2, IL6R and ERBB2, through
the IL6ST gene (Figure 33), implicated by Stahl et al. (2010) who identified an association
with the SNP rs6859219. Stahl et al. (2010) originally assigned the SNP to the
ANKRD55 gene as the IL6ST gene lies outside the region of LD. However, ANKRD55 encodes
an ankyrin repeat domain-containing protein of unknown function (Stahl et al. 2010) and
therefore does not offer a compelling candidate. In contrast, the IL6ST gene encodes an
IL-6 signal transducer which functions as part of the IL-6 receptor complex and represents a
much stronger biological candidate. The identification of the IL6ST gene in the extended
interaction analysis, coupled with the absence of the ANKRD55 gene, strengthens the case
for the involvement of the IL6ST gene in the pathogenesis of RA. Due to the distance of the
SNP from the gene (~150 kb), this association may also represent a good candidate for
long-range regulation of the IL6ST gene.

Interestingly, TRAF1, TRAF6 and CD40 all had more than ten interactions highlighting their
importance in several areas of the immune response as previously discussed. Additionally,
TRAF6 had the second largest number of interactions showing this is probably a key gene in
the pathogenesis of RA and potentially a strong therapeutic target.

Other genes showing a large number of interactions include PTPRC, an essential regulator
of T and B cell receptor signalling (NCBI Gene ID: 5788), CD247, encoding a subunit of the T
cell receptor complex, ERBB2, strengthening its potential role in RA pathogenesis and
IL2RB, highlighting the role of IL-2 in the immune response. Additionally, two genes not
found on either list, GRB2 and LCK, showed evidence of acting as hubs of interaction. GRB2
and LCK were both identified from the extended analysis of the confirmed gene list and this
further evidence supports a possible role for both genes in the pathogenesis of RA.

5.2.3 Comparison with Previous Publications
As previously mentioned, pathway analyses of GWAS results have highlighted differences
between the analysis methods used, each producing some overlapping results but also
many differences. To validate the Taverna workflow and subsequent analyses, results were
compared to those of previously published data where possible. However, the analyses
performed in these publications all used results from an early GWAS in 2007 (Wellcome
Trust Case Control Consortium 2007) and since then over twenty new associations have
been discovered. Interestingly, none of these new loci were identified by any of the
previous publications and this is most probably due to the studies' lack of power to detect
them. Many of these new associations are the result of large meta-analyses of GWAS and
although regarded as the largest and most comprehensive study at the time, the WTCCC
GWAS had limited power to detect associations with modest effect sizes (OR < 1.2)
(Wellcome Trust Case Control Consortium 2007). In addition, these studies also included
spurious singleton associations present in the original GWAS data set and not validated by
further analyses.

Genes overlapping all pathway analyses and present in the current confirmed gene list
(Table 37) represent three SNPs which lie in the two most significantly associated regions,
where the causal gene has been well established (PTPN22 and HLA-DRB1). Additionally, many
of the genes identified map to the MHC region and most likely reflect the strong
association seen for the HLA-DRB1 gene rather than each being associated with the disease.
Although the previous publications failed to detect the new associations, conversely some
genes from the previous publications were identified by the extended protein-protein
interaction analysis and may represent potential candidates for future work.

A substantial proportion of the pathways identified by the previous publications (Table 4)
were involved in the immune response or immune related pathways (e.g. cell adhesion)
and the same is true for the analyses performed here. Immunological pathways enriched in
this analysis and previous publications include antigen processing and presentation, cell
adhesion molecules (CAMs), the JAK-STAT signalling pathway and T cell activation. This
highlights the importance of multiple immune system pathways in the pathogenesis of RA
and validates these findings.

There were, however, many non-immunological pathways identified by the previous
publications not identified by the analyses performed here. This may be due to the source
of the initial SNP data and therefore represent spurious results or could represent a
drawback with the way the workflow assigned genes and the pathway analyses carried out.
At present no conclusive evidence has shown these non-immune pathways are associated
with RA and the pathways identified by the work presented here are the result of using
known, robust associations. Additionally, no single non-immunological pathway was
common to more than one study suggesting that they are indeed spurious results.

5.3 Limitations
The main limitation of the pathway enrichment analyses involves the databases
themselves. It is clear from the results showing the proportion of genes annotated in each
database that a limiting factor in pathway analysis is the coverage of genes
within them. InnateDB identified the most genes but had no pathways annotated for
the majority (77%) and therefore represented the database with the least annotation. The
Reactome database also performed badly, identifying a maximum of 15.79% of supplied
genes. DAVID successfully identified the highest proportion of genes but relies heavily on
other pathway databases and as such this proportion could drop as more resources are
considered.

Additional problems may be encountered due to the annotation of the pathways. If a
pathway is not represented, is incomplete or is not specific enough, this will affect the
database's ability to correctly assess the contribution of the genes to the pathway and
may produce incorrect results. For example, if the pathway is incomplete, the number of genes
required to over-represent that pathway would be reduced and therefore fewer genes from
that pathway would lead to a significant association. Also, if the pathway is not specific
enough, many more genes would be required to make it significant and therefore a true
association involving a smaller, more specific pathway would be lost.
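
The dependence of significance on pathway size can be illustrated with a one-sided hypergeometric test, which is a common basis for over-representation analysis. This is a sketch only: the individual databases may use different statistics and multiple-testing corrections, and the background size, pathway sizes and overlap below are invented for illustration.

```python
from math import comb


def enrichment_p(background, pathway_size, list_size, overlap):
    """One-sided hypergeometric P(X >= overlap).

    background   : annotated genes in the database (assumed universe)
    pathway_size : number of those genes annotated to the pathway
    list_size    : number of genes in the submitted gene list
    overlap      : genes shared by the list and the pathway
    """
    upper = min(pathway_size, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented numbers: the same three overlapping genes tested against a
# specific (50-gene) and a broad (500-gene) definition of a pathway.
p_specific = enrichment_p(20000, 50, 100, 3)
p_broad = enrichment_p(20000, 500, 100, 3)
print(f"specific: {p_specific:.4f}  broad: {p_broad:.4f}")
```

With these numbers the same three genes are enough to reach nominal significance for the smaller pathway but not for the broader one, which is the behaviour described above.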

The majority of pathways identified by the various pathway database analyses are immune
system or immune related pathways which could be due to a number of reasons. Firstly, RA
is an autoimmune disease and as such it would be expected that several immune related
pathways would be involved in the susceptibility and development of the disease.
Secondly, due to the importance of immunological pathways in disease, a large proportion
of the annotated pathways present in the databases represent ones involved in the
immune system causing an over-representation of immunological pathways. Finally,
immune genes, especially in the MHC region, are often found in clusters of similar function
or similar roles. For example, the MHC class II genes are found in close proximity and are all
involved in the processing and presentation of extracellular peptides. Additionally, CD28,
CTLA4 and ICOS map to the same region on chromosome 2q33.2 and are all involved in the
T cell co-stimulatory signal as discussed previously. However, one gene could account for
multiple association signals but would be represented by two or more functionally related
genes, skewing results towards that pathway instead of the true causal locus. It will therefore
be necessary to pinpoint the causal gene for each observed association and re-run the
analyses to fully determine the contribution of each locus to each pathway.

Further limitations of the analyses presented here involve independent effects and the
effect size of each association. No attempts have been made to account for any
independent effects or to correct enrichment analyses for the effect size observed for the
associations and as such, this could alter the pathways observed as significant.

5.4 Further Work


The Taverna workflow could be extended to overcome the problem of missing genes,
caused either by genome assembly conversion errors or where no genes are found in the
region identified by the Taverna workflow. This could be achieved by allowing an additional
attribute to the ‘AssociatedSNP’ node for the input which could store a default gene name.
When the workflow fails to identify any genes for a supplied SNP, the workflow could use
this default gene name to populate the ‘AssociatedGenes’ node in the output.
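
The proposed fallback can be sketched as follows. The node names ‘AssociatedSNP’ and ‘AssociatedGenes’ are those used by the workflow, but everything else here is an assumption made for illustration: the attribute name `defaultGene`, the `Gene` child element, the `source` marker, the nesting of the two nodes in one document and the placeholder identifiers.

```python
import xml.etree.ElementTree as ET


def apply_default_gene(snp_node, attr="defaultGene"):
    """Populate an empty AssociatedGenes node from a default-gene attribute.

    The attribute name and the <Gene> child layout are hypothetical.
    """
    genes = snp_node.find("AssociatedGenes")
    if genes is None:
        genes = ET.SubElement(snp_node, "AssociatedGenes")
    default = snp_node.get(attr)
    # Only fall back when the workflow found no genes for this SNP.
    if len(genes) == 0 and default:
        gene = ET.SubElement(genes, "Gene")
        gene.text = default
        gene.set("source", "default")  # flag that this was not workflow-derived
    return snp_node


# Placeholder SNP and gene names, not real identifiers.
doc = ET.fromstring(
    '<AssociatedSNP id="rs0000000" defaultGene="GENE_X">'
    "<AssociatedGenes/>"
    "</AssociatedSNP>"
)
apply_default_gene(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Marking the fallback gene explicitly would let downstream analyses distinguish researcher-supplied genes from those assigned by the workflow.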

The effect of long range regulatory components could be investigated by utilising data from
the ENCyclopaedia Of DNA Elements (ENCODE) project (ENCODE Project Consortium 2004)
which has generated data from multiple experiments to identify functional elements in the
human genome. This data is publicly available and can be easily evaluated by using the
ASSIMILATOR programme (Martin, Barton, & Eyre 2011).

To evaluate the effect of multiple genes mapping to the same association signal, the
analyses could be conducted by selecting one gene per associated SNP. This would
eliminate the problem but would re-introduce the bias caused by allowing researchers to
implicate genes themselves. Analysis of gene lists produced by permutations of all genes
implicated by the workflow would overcome this but would be computationally intensive
and difficult to analyse further.
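
The scale of this problem can be seen by enumerating the gene lists as a Cartesian product with one gene chosen per SNP. The sketch below is illustrative: the function names are hypothetical, and the SNP-to-gene mapping uses only the candidate genes named in the discussion above, so the full workflow output for these SNPs may contain other genes.

```python
from itertools import product
from math import prod


def count_gene_lists(snp_to_genes):
    """Number of distinct gene lists with exactly one gene per SNP."""
    return prod(len(genes) for genes in snp_to_genes.values())


def iter_gene_lists(snp_to_genes):
    """Lazily yield each one-gene-per-SNP assignment as a dict."""
    snps = sorted(snp_to_genes)
    for combo in product(*(snp_to_genes[snp] for snp in snps)):
        yield dict(zip(snps, combo))


# Candidates named in the text; possibly incomplete for each SNP.
candidates = {
    "rs540386": ["RAG1", "RAG2", "TRAF6"],
    "rs3184504": ["PTPN11", "SH2B3"],
    "rs2872507": ["ERBB2", "GRB7"],
}
n_lists = count_gene_lists(candidates)
first_list = next(iter_gene_lists(candidates))
print(n_lists, first_list)
```

The count grows multiplicatively: thirty SNPs with three candidate genes each would already give 3^30 (about 2 × 10^14) gene lists, which is why an exhaustive analysis would be computationally intensive and random sampling of assignments may be the more practical option.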

The pathway analyses have implicated several pathways involving many novel candidates
for RA pathogenesis, for example, the PAM and ICOS genes. These novel loci would require
further validation to ensure their association with RA. Furthermore, genes identified by the
protein-protein interaction analysis would also require validation but present many new
loci and point to ‘better’ candidate regions, for example, PTPN11, which could be involved
in RA pathogenesis. If confirmed, these new loci could assist in predicting disease outcome,
provide new therapeutic targets or aid treatment selection and response prediction for RA.
Additionally, this technique could be applied to other diseases, leading to a better
understanding of disease aetiology and/or treatment options.

5.5 Summary
The Taverna workflow developed allows researchers to utilise a simple, unbiased, robust
method to assign genes to SNP association signals. The workflow has been designed to
allow flexibility for multiple parameters such as modifying LD cutoff and distance to search
for LD, allowing researchers to customise their experimental aims. It has successfully
integrated data from multiple genomic resources, allowing a complex workflow to run
seamlessly and the concept of building an XML document to record all the information
obtained has been fundamental in accomplishing this. The Taverna Workbench allows
researchers to run workflows regardless of platform (Windows, Linux etc) and this
customised SNP workflow will maintain this ability as no dependence is placed on local
resources.

The Taverna workflow performed well and although the genes identified are similar to
those assigned by researchers, it has identified potentially interesting candidates, such as
PTPN11, in an unbiased manner. Using these assignments together with the pathway analysis results and
protein-protein interactions, researchers will be able to ascertain the probable causal gene and
conduct focused experiments to confirm it. For RA, the analyses performed here have
moved the focus of several SNPs from the ‘biologically plausible’ gene to other genes. One example is
rs6859219, for which there is now strong evidence implicating the IL6ST gene as well as the
originally assigned ANKRD55 gene. Additionally, multiple SNPs initially thought to map to
the same locus may well implicate multiple loci which could act independently towards the
pathogenesis of the disease. For example, the CD28 gene was implicated by rs1980422 and
the CTLA4 gene was implicated by both rs231735 and rs3087243 but there is now evidence
to suggest the association signals could correspond to the CD28, ICOS and CTLA4 genes.

While the pathway analyses have not identified anything particularly unusual they did
highlight multiple pathways which explain many aspects of RA aetiology and confirm the
association with several known loci such as CD40, IRF5, IL2 and IL2RA, all involved at
different points throughout the immune response. Pathway and interaction analyses
highlight the importance of co-stimulatory molecules, such as CD28, CTLA4 and CAMs, and
lead to candidate genes that are central to these pathways and may well therefore be
involved in disease.

However, the results show that no one pathway database is the ideal source and results
must be combined from several databases to build an accurate picture of the pathways and
mechanisms involved in the disease. Additionally, the use of a well characterised SNP set
appears to have removed some spurious results, refining the pathways implicated. Studying
these pathways and the protein-protein interactions, and relating these to the immune
response, has uncovered genes involved in several potential regulatory points which could
also be good candidates as therapeutic targets. For example, two mechanisms, IL1 and TNF-α
mediated NFκB activation, exist to drive T cell proliferation, differentiation and
development, and two related genes, TRAF1 and TRAF6, have each been shown to regulate a
distinct mechanism in conjunction with another associated locus (Figure 39).

Further work is required to validate these results and to extend the analyses to include
effect sizes, as discussed, but this work shows that pathway analysis has the potential to
uncover pathways involved in disease aetiology. This ability will improve as more
associations are identified and regions are narrowed down to implicate a single locus,
refining the pathways to only the genes categorically associated with the disease.

In conclusion, we have developed a robust workflow using the Taverna Workbench which
successfully maps genes to GWAS association signals in an unbiased manner. Using these
genes, pathway and interaction analysis of RA loci has confirmed the ability to identify
pathways involved in disease. Additionally, while further refinement and validation are
necessary, it has also identified novel pathways and implicated additional genes which may
contribute to RA susceptibility or provide therapeutic targets.

6 References
1000 Genomes. (2008). International Consortium Announces the 1000 Genomes Project,
http://www.1000genomes.org/docs/1000Genomes-NewsRelease.pdf, Date accessed 2010

1000 Genomes. (2010). The 1000 Genomes Project, http://www.1000genomes.org, Date
accessed 2010

Affymetrix, Inc. (2003). Affymetrix Launches Mapping 10K Array and CustomSeq(TM)
Resequencing Array,
http://phx.corporate-ir.net/preview/phoenix.zhtml?c=116408&p=irol-newsArticle&ID=434182&highlight=,
Date accessed 2010

Affymetrix, Inc. (2004). Press Release: New Affymetrix Arrays with 100,000 SNPs Available
for Early Technology Access Customers,
http://phx.corporate-ir.net/preview/phoenix.zhtml?c=116408&p=irol-newsArticle&ID=511452&highlight=,
Date accessed 2010

Affymetrix, Inc. (2005). Press Release: New Affymetrix Microarray Set Genotypes More
Than 500,000 SNPs in a Single Experiment; GeneChip(R) Human Mapping 500K Array Set
Enables Researchers to Perform Highly Detailed Whole-Genome Association Studies for the
First Time,
http://phx.corporate-ir.net/preview/phoenix.zhtml?c=116408&p=irol-newsArticle&ID=761893&highlight=,
Date accessed 2010

Affymetrix, Inc. (2007a). Press Release: Affymetrix Redefines Association Studies with
Introduction of SNP 5.0 Array,
http://phx.corporate-ir.net/preview/phoenix.zhtml?c=116408&p=irol-newsArticle&ID=948207&highlight=,
Date accessed 2010

Affymetrix, Inc. (2007b). Press Release: Affymetrix Sets New Standard in Genotyping With
Genome-Wide Human SNP Array 6.0,
http://phx.corporate-ir.net/preview/phoenix.zhtml?c=116408&p=irol-newsArticle&ID=1004567&highlight=,
Date accessed 2010

Affymetrix, Inc. (2009). Genome-Wide Human SNP Array 6.0 Data Sheet,
http://media.affymetrix.com/support/technical/datasheets/genomewide_snp6_datasheet.pdf,
Date accessed 2010

Agrawal, S., Misra, R., & Aggarwal, A. (2006) Autoantibodies in rheumatoid arthritis:
association with severity of disease in established RA. Clinical Rheumatology, 26(2), p. 201.

Altmüller, J. et al. (2001) Genomewide Scans of Complex Human Diseases: True Linkage Is
Hard to Find. The American Journal of Human Genetics, 69(5), pp. 936-950.

Ardlie, K. G., Kruglyak, L., & Seielstad, M. (2002) Patterns of linkage disequilibrium in the
human genome. Nat.Rev.Genet., 3(4), pp. 299-309.

Ashburner, M. et al. (2000) Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nat.Genet., 25(1), pp. 25-29.

Baranzini, S. E. et al. (2009) Pathway and network-based analysis of genome-wide
association studies in multiple sclerosis. Hum.Mol.Genet., 18(11), pp. 2078-2090.

Barrett, J. C. & Cardon, L. R. (2006) Evaluating coverage of genome-wide association
studies. Nat.Genet., 38(6), pp. 659-662.

Barrett, J. C. et al. (2009) Genome-wide association study and meta-analysis find that over
40 loci affect risk of type 1 diabetes. Nat.Genet., Epub ahead of print.

Barsky, A. et al. (2007) Cerebral: a Cytoscape plugin for layout of and interaction with
biological networks using subcellular localization annotation. Bioinformatics., 23(8), pp.
1040-1042.

Baxevanis, A. D. (2008) Searching NCBI databases using Entrez. Curr.Protoc.Bioinformatics.,
Chapter 1(Unit 1.3).

Becker, K. G. (2004) The common variants/multiple disease hypothesis of common complex
genetic disorders. Med.Hypotheses, 62(2), pp. 309-317.

Begovich, A. B. et al. (2004) A missense single-nucleotide polymorphism in a gene encoding
a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis.
Am.J.Hum.Genet., 75(2), pp. 330-337.

Benson, D. A. et al. (2010) GenBank. Nucleic Acids Res., 38(Database issue), p. D46-D51.

Birney, E. et al. (2007) Identification and analysis of functional elements in 1% of the
human genome by the ENCODE pilot project. Nature, 447(7146), pp. 799-816.

Bodmer, W. & Bonilla, C. (2008) Common and rare variants in multifactorial susceptibility to
common diseases. Nat.Genet., 40(6), pp. 695-701.

Brennan, F. M. et al. (1989) Inhibitory effect of TNF alpha antibodies on synovial cell
interleukin-1 production in rheumatoid arthritis. Lancet, 2(8657), pp. 244-247.

Buchwald, M. et al. (1986) Linkage of cystic fibrosis to the pro alpha 2(I) collagen gene,
COL1A2, on chromosome 7. Cytogenet.Cell Genet., 41(4), pp. 234-239.

Cantor, R. M., Lange, K., & Sinsheimer, J. S. (2010) Prioritizing GWAS results: A review of
statistical methods and recommendations for their application. Am.J.Hum.Genet., 86(1),
pp. 6-22.

Carbon, S. et al. (2009) AmiGO: online access to ontology and annotation data.
Bioinformatics., 25(2), pp. 288-289.

Carlson, C. S. et al. (2003) Additional SNPs and linkage-disequilibrium analyses are
necessary for whole-genome association studies in humans. Nat.Genet., 33(4), pp. 518-521.

Carrasquillo, M. M. et al. (2002) Genome-wide association study and mouse model identify
interaction between RET and EDNRB pathways in Hirschsprung disease. Nat.Genet., 32(2),
pp. 237-244.

Chikanza, I. C. & Grossman, A. S. (1998) Hypothalamic-pituitary-mediated
immunomodulation: arginine vasopressin is a neuroendocrine immune mediator.
Br.J.Rheumatol., 37(2), pp. 131-136.

Choy, E. H. & Panayi, G. S. (2001) Cytokine pathways and joint inflammation in rheumatoid
arthritis. N.Engl.J.Med., 344(12), pp. 907-916.

Clohisy, J. C. et al. (2004) NF-kB signaling blockade abolishes implant particle-induced
osteoclastogenesis. J.Orthop.Res., 22(1), pp. 13-20.

Cordell, H. J. (2002) Epistasis: what it means, what it doesn't mean, and statistical methods
to detect it in humans. Hum.Mol.Genet., 11(20), pp. 2463-2468.

Cox, N. J. et al. (2001) Seven regions of the genome show evidence of linkage to type 1
diabetes in a consensus analysis of 767 multiplex families. Am.J.Hum.Genet., 69(4), pp. 820-
830.

Croiset, G., Heijnen, C., & de Wied, D. (1990) Passive Avoidance Behavior, Vasopressin and
the Immune System. Neuroendocrinology, 51(2), p. 156.

De Jager, P. L. et al. (2009) Meta-analysis of genome scans and replication identify CD6,
IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat.Genet., 41(7), pp. 776-
782.

Dennis, G., Jr. et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated
Discovery. Genome Biol., 4(5), p. 3.

Dickson, S. P. et al. (2010) Rare variants create synthetic genome-wide associations.
PLoS.Biol., 8(1), p. e1000294.

Duerr, R. H. et al. (2006) A genome-wide association study identifies IL23R as an
inflammatory bowel disease gene. Science, 314(5804), pp. 1461-1463.

Elbers, C. C. et al. (2009) Using genome-wide pathway analysis to unravel the etiology of
complex diseases. Genet.Epidemiol., 33(5), pp. 419-431.

Eleftherohorinou, H. et al. (2009) Pathway analysis of GWAS provides new insights into
genetic susceptibility to 3 inflammatory diseases. PLoS.One., 4(11), p. e8068.

Elsby, L. M. et al. (2010) Functional evaluation of TNFAIP3 (A20) in rheumatoid arthritis.
Clin.Exp.Rheumatol., 28(5), pp. 708-714.

ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project.
Science, 306(5696), pp. 636-640.

Eyre, S. et al. (2010) Overlapping genetic susceptibility variants between three autoimmune
disorders: rheumatoid arthritis, type 1 diabetes and coeliac disease. Arthritis Res.Ther.,
12(5), p. R175.

Fan, J. B. et al. (2006) Illumina universal bead arrays. Methods Enzymol., 410 pp. 57-73.

Feldman, R. & Sanger, J. (2007) The text mining handbook: advanced approaches in
analyzing unstructured data, Cambridge: Cambridge University Press.

Feldmann, M., Brennan, F. M., & Maini, R. N. (1996) Rheumatoid arthritis. Cell, 85(3), pp.
307-310.

Fellay, J. et al. (2010) ITPA gene variants protect against anaemia in patients treated for
chronic hepatitis C. Nature, 464(7287), pp. 405-408.

Frazer, K. A. et al. (2007) A second generation human haplotype map of over 3.1 million
SNPs. Nature, 449(7164), pp. 851-861.

Gardy, J. L. et al. (2009) Enabling a systems biology approach to immunology: focus on
innate immunity. Trends Immunol., 30(6), pp. 249-262.

Gitschier, J. et al. (1985) Genetic mapping and diagnosis of haemophilia A achieved through
a BclI polymorphism in the factor VIII gene. Nature, 314(6013), pp. 738-740.

Gregersen, P. K. et al. (2009) REL, encoding a member of the NF-kappaB family of
transcription factors, is a newly defined risk locus for rheumatoid arthritis. Nat.Genet.,
41(7), pp. 820-823.

Gunderson, K. L. et al. (2005) A genome-wide scalable SNP genotyping assay using
microarray technology. Nat.Genet., 37(5), pp. 549-554.

Guo, F. et al. (2009) TRAF1 is involved in the classical NF-kappaB activation and CD30-
induced alternative activity in Hodgkin's lymphoma cells. Mol.Immunol., 46(13), pp. 2441-
2448.

Heyninck, K. & Beyaert, R. (1999) The cytokine-inducible zinc finger protein A20 inhibits IL-
1-induced NF-kappaB activation at the level of TRAF6. FEBS Lett., 442(2-3), pp. 147-150.

Hill, W. G. & Robertson, A. (1968) The effects of inbreeding at loci with heterozygote
advantage. Genetics, 60(3), pp. 615-628.

Hindorff, L. A., Junkins, H. A., Hall, P. N., Mehta, J. P., and Manolio, T. A. (2010). A Catalog of
Published Genome-Wide Association Studies, http://www.genome.gov/gwastudies/, Date
accessed 2010

Hirschhorn, J. N. & Daly, M. J. (2005) Genome-wide association studies for common
diseases and complex traits. Nat.Rev.Genet., 6(2), pp. 95-108.

Hoggart, C. J. et al. (2008) Simultaneous analysis of all SNPs in genome-wide and re-
sequencing association studies. PLoS.Genet., 4(7), p. e1000130.

Howard, T. D. et al. (2002) Gene-gene interaction in asthma: IL4RA and IL13 in a Dutch
population with asthma. Am.J.Hum.Genet., 70(1), pp. 230-236.

Hu, H. M. et al. (1994) A novel RING finger protein interacts with the cytoplasmic domain of
CD40. J.Biol.Chem., 269(48), pp. 30069-30072.

Huang, D. W., Sherman, B. T., & Lempicki, R. A. (2009) Systematic and integrative analysis of
large gene lists using DAVID bioinformatics resources. Nat.Protoc., 4(1), pp. 44-57.

Hubbard, T. J. et al. (2009) Ensembl 2009. Nucleic Acids Res., 37(Database issue), p. D690-
D697.

Hugot, J. P. et al. (2001) Association of NOD2 leucine-rich repeat variants with susceptibility
to Crohn's disease. Nature, 411(6837), pp. 599-603.

Hull, D. et al. (2006) Taverna: a tool for building and running workflows of services. Nucleic
Acids Res., 34(Web Server issue), p. W729-W732.

Illumina, Inc. (2005). Press Release: Illumina Initiates Shipment Of Whole-Genome
Genotyping Beadchips,
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=857568&highlight=infinium,
Date accessed 2010

Illumina, Inc. (2006a). Press Release: Illumina Introduces HumanHap550 Genotyping
BeadChip; New HumanHap BeadChips Expand Genomic Coverage with Industry-Leading
Performance,
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=835968&highlight=,
Date accessed 2010

Illumina, Inc. (2006b). Press Release: Illumina Now Shipping HumanHap300 BeadChips for
Genome-Wide Disease Association Studies; Infinium(TM) Assay Sets New Standard for
Performance and Data Quality,
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=802368&highlight=,
Date accessed 2010

Illumina, Inc. (2007). Press Release: Illumina Commences Shipment of the Human1M
BeadChip,
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=1021707&highlight=,
Date accessed 2010

Illumina, Inc. (2008a). Infinium HD Assay,
http://www.illumina.com/technology/infinium_hd_assay.ilmn, Date accessed 2010

Illumina, Inc. (2008b). Press Release: Illumina Introduces the Infinium(R) High-Density (HD)
Human1M-Duo and Human610-Quad BeadChips,
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=1092856&highlight=,
Date accessed 2010

Illumina, Inc. (2009a). Illumina Product Guide 2009,
http://www.illumina.com/documents/products/guides/2009_product_guide.pdf, Date
accessed 2010

Illumina, Inc. (2009b). Press Release: Illumina Introduces the Infinium(R) HD HumanOmni1-
Quad BeadChip,
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle_print&ID=1289046&highlight=,
Date accessed 2010

Illumina, Inc. (2010a). Illumina multi-sample array formats,
http://www.illumina.com/technology/beadarray_technology.ilmn, Date accessed 2010

Illumina, Inc. (2010b). Whole-Genome Genotyping Product Roadmap,
http://www.illumina.com/applications/gwas.ilmn, Date accessed 6/2010

InnateDB. (2011). InnateDB: Systems Biology of the Innate Immune Response - Statistics,
http://www.innatedb.ca/statistics.jsp, Date accessed 6/5/2011

Jorde, L. B. (2000) Linkage disequilibrium and the search for complex disease genes.
Genome Res., 10(10), pp. 1435-1444.

Jorgenson, E. & Witte, J. S. (2006) Coverage and power in genomewide association studies.
Am.J.Hum.Genet., 78(5), pp. 884-888.

Kanehisa Laboratories. (2010). KEGG - Current Statistics,
http://www.genome.jp/kegg/docs/statistics.html, Date accessed 2/6/2010

Kanehisa, M. & Goto, S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic
Acids Res., 28(1), pp. 27-30.

Kanehisa, M. et al. (2010) KEGG for representation and analysis of molecular networks
involving diseases and drugs. Nucleic Acids Res., 38(Database issue), p. D355-D360.

Kanehisa, M. et al. (2006) From genomics to chemical genomics: new developments in
KEGG. Nucleic Acids Res., 34(Database issue), p. D354-D357.

Kent, W. J. et al. (2002) The human genome browser at UCSC. Genome Res., 12(6), pp. 996-
1006.

Konno, H. et al. (2009) TRAF6 establishes innate immune responses by activating NF-
kappaB and IRF7 upon sensing cytosolic viral RNA and DNA. PLoS.One., 4(5), p. e5674.

Krause, A., Kamradt, T., & Burmester, G. R. (1996) Potential infectious agents in the
induction of arthritides. Curr.Opin.Rheumatol., 8(3), pp. 203-209.

Laky, K. & Fowlkes, B. J. (2008) Notch signaling in CD4 and CD8 T cell development.
Curr.Opin.Immunol., 20(2), pp. 197-202.

Lander, E. S. et al. (2001) Initial sequencing and analysis of the human genome. Nature,
409(6822), pp. 860-921.

Lettre, G. & Rioux, J. D. (2008) Autoimmune diseases: insights from genome-wide
association studies. Hum.Mol.Genet., 17(R2), p. R116-R121.

Lewontin, R. C. (1964) The Interaction of Selection and Linkage. I. General Considerations;
Heterotic Models. Genetics, 49(1), pp. 49-67.

Li, N. & Stephens, M. (2003) Modeling linkage disequilibrium and identifying recombination
hotspots using single-nucleotide polymorphism data. Genetics, 165(4), pp. 2213-2233.

Libioulle, C. et al. (2007) Novel Crohn disease locus identified by genome-wide association
maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS.Genet., 3(4),
p. e58.

Lin, R. et al. (2005) A CRM1-dependent nuclear export pathway is involved in the regulation
of IRF-5 subcellular localization. J.Biol.Chem., 280(4), pp. 3088-3095.

Lipshutz, R. J. et al. (1999) High density synthetic oligonucleotide arrays. Nat.Genet., 21(1
Suppl), pp. 20-24.

Lipsky, P. E. et al. (2000) Infliximab and methotrexate in the treatment of rheumatoid
arthritis. Anti-Tumor Necrosis Factor Trial in Rheumatoid Arthritis with Concomitant
Therapy Study Group. N.Engl.J.Med., 343(22), pp. 1594-1602.

Luo, L. et al. (2010) Genome-wide gene and pathway analysis. Eur.J.Hum.Genet., Epub
ahead of print.

Marchini, J. et al. (2007) A new multipoint method for genome-wide association studies by
imputation of genotypes. Nat.Genet., 39(7), pp. 906-913.

Martin, P., Barton, A., & Eyre, S. (2011) ASSIMILATOR: a new tool to inform selection of
associated genetic variants for functional studies. Bioinformatics, 27(1), p. 144.

Mathivanan, S. et al. (2006) An evaluation of human protein-protein interaction data in the
public domain. BMC.Bioinformatics., 7 Suppl 5 p. S19.

Matthews, L. et al. (2009) Reactome knowledgebase of human biological pathways and
processes. Nucleic Acids Res., 37(Database issue), p. D619-D622.

McCarthy, M. I. et al. (2008) Genome-wide association studies for complex traits:
consensus, uncertainty and challenges. Nat.Rev.Genet., 9(5), pp. 356-369.

McInnes, I. B. & Schett, G. (2007) Cytokines in the pathogenesis of rheumatoid arthritis.
Nat.Rev.Immunol., 7(6), pp. 429-442.

McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University Baltimore MD
and National Center for Biotechnology Information, National Library of Medicine Bethesda
MD. (2010). Online Mendelian Inheritance in Man, OMIM (TM),
http://www.ncbi.nlm.nih.gov/omim/, Date accessed 3/6/2010

McVean, G. A. et al. (2004) The fine-scale structure of recombination rate variation in the
human genome. Science, 304(5670), pp. 581-584.

McWhirter, S. M. et al. (1999) Crystallographic analysis of CD40 recognition and signaling
by human TRAF2. Proc.Natl.Acad.Sci.U.S.A, 96(15), pp. 8408-8413.

Mootha, V. K. et al. (2003) PGC-1alpha-responsive genes involved in oxidative
phosphorylation are coordinately downregulated in human diabetes. Nat.Genet., 34(3), pp.
267-273.

Morris, A. P. (2006) A flexible Bayesian framework for modeling haplotype association with
disease, allowing for dominance effects of the underlying causative variants.
Am.J.Hum.Genet., 79(4), pp. 679-694.

Nistico, L. et al. (1996) The CTLA-4 gene region of chromosome 2q33 is linked to, and
associated with, type 1 diabetes. Belgian Diabetes Registry. Hum.Mol.Genet., 5(7), pp.
1075-1080.

Ogura, Y. et al. (2001) A frameshift mutation in NOD2 associated with susceptibility to
Crohn's disease. Nature, 411(6837), pp. 603-606.

Oliphant, A. et al. (2002) BeadArray technology: enabling an accurate, cost-effective
approach to high-throughput genotyping. Biotechniques, Suppl pp. 56-1.

Ollier, W. E., Harrison, B., & Symmons, D. (2001) What is the natural history of rheumatoid
arthritis? Best.Pract.Res.Clin.Rheumatol., 15(1), pp. 27-48.

Orozco, G. et al. (2011) Study of the common genetic background for rheumatoid arthritis
and systemic lupus erythematosus. Ann.Rheum.Dis., 70(3), pp. 463-468.

Orozco, G. et al. (2009) Combined effects of three independent SNPs greatly increase the
risk estimate for RA at 6q23. Hum.Mol.Genet., 18(14), pp. 2693-2699.

Peng, G. et al. (2010) Gene and pathway-based second-wave analysis of genome-wide
association studies. Eur.J.Hum.Genet., 18(1), pp. 111-117.

Peri, S. et al. (2003) Development of human protein reference database as an initial
platform for approaching systems biology in humans. Genome Res., 13(10), pp. 2363-2371.

Petersson, M. et al. (2006) Effects of arginine-vasopressin and parathyroid
hormone-related protein (1-34) on cell proliferation and production of YKL-40 in cultured
chondrocytes from patients with rheumatoid arthritis and osteoarthritis. Osteoarthritis and
Cartilage, 14(7), p. 652.

Plant, D. et al. (2010) Investigation of potential non-HLA rheumatoid arthritis susceptibility
loci in a European cohort increases the evidence for nine markers. Annals of the Rheumatic
Diseases, 69(8), p. 1548.

Pomerantz, M. M. et al. (2009) The 8q24 cancer risk variant rs6983267 shows long-range
interaction with MYC in colorectal cancer. Nat.Genet., 41(8), pp. 882-884.

Price, A. L. et al. (2006) Principal components analysis corrects for stratification in genome-
wide association studies. Nat.Genet., 38(8), pp. 904-909.

Pritchard, J. K. (2001) Are rare variants responsible for susceptibility to complex diseases?
Am.J.Hum.Genet., 69(1), pp. 124-137.

Raychaudhuri, S. et al. (2009a) Identifying relationships among genomic disease regions:
predicting genes at pathogenic SNP associations and rare deletions. PLoS.Genet., 5(6), p.
e1000534.

Raychaudhuri, S. et al. (2008) Common variants at CD40 and other loci confer risk of
rheumatoid arthritis. Nat.Genet., 40(10), pp. 1216-1223.

Raychaudhuri, S. et al. (2009b) Genetic variants at CD28, PRDM1 and CD2/CD58 are
associated with rheumatoid arthritis risk. Nat.Genet., 41(12), pp. 1313-1318.

Reactome. (2011). Reactome - Stats, http://www.reactome.org/stats.html, Date accessed
5/2011

Reich, D. E. & Lander, E. S. (2001) On the allelic spectrum of human disease. Trends Genet.,
17(9), pp. 502-510.

Remmers, E. F. et al. (2007) STAT4 and the risk of rheumatoid arthritis and systemic lupus
erythematosus. N.Engl.J.Med., 357(10), pp. 977-986.

Richez, C. et al. (2010) Role for interferon regulatory factors in autoimmunity. Joint Bone
Spine, 77(6), pp. 525-531.

Rioux, J. D. et al. (2001) Genetic variation in the 5q31 cytokine gene cluster confers
susceptibility to Crohn disease. Nat.Genet., 29(2), pp. 223-228.

Rodey, G. (2000) HLA beyond tears: introduction to human histocompatibility, Durango,
CO, De Novo: Distributed by Pel-Freez.

Royer, L. et al. (2008) Unraveling protein networks with power graph analysis.
PLoS.Comput.Biol., 4(7), p. e1000108.

Sayers, E. W. et al. (2010) Database resources of the National Center for Biotechnology
Information. Nucleic Acids Res., 38(Database issue), pp. D5-16.

Schork, N. J. et al. (2009) Common vs. rare allele hypotheses for complex diseases.
Curr.Opin.Genet.Dev., 19(3), pp. 212-219.

Shannon, P. et al. (2003) Cytoscape: a software environment for integrated models of
biomolecular interaction networks. Genome Res., 13(11), pp. 2498-2504.

Sherry, S. T. et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res.,
29(1), pp. 308-311.

Silman, A. J. & Pearson, J. E. (2002) Epidemiology and genetics of rheumatoid arthritis.
Arthritis Res., 4 Suppl 3 p. S265-S272.

Siontis, K. C., Patsopoulos, N. A., & Ioannidis, J. P. (2010) Replication of past candidate loci
for common diseases and phenotypes in 100 genome-wide association studies.
Eur.J.Hum.Genet., 18(7), pp. 832-837.

Small, K. M. et al. (2002) Synergistic polymorphisms of beta1- and alpha2C-adrenergic
receptors and the risk of congestive heart failure. N.Engl.J.Med., 347(15), pp. 1135-1142.

Smedley, D. et al. (2009) BioMart--biological queries made easy. BMC.Genomics, 10 p. 22.

Song, H. Y., Rothe, M., & Goeddel, D. V. (1996) The tumor necrosis factor-inducible zinc
finger protein A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation.
Proc.Natl.Acad.Sci.U.S.A, 93(13), pp. 6721-6725.

Stahl, E. A. et al. (2010) Genome-wide association study meta-analysis identifies seven new
rheumatoid arthritis risk loci. Nat.Genet., 42(6), pp. 508-514.

Steemers, F. J. et al. (2006) Whole-genome genotyping with the single-base extension
assay. Nat.Methods, 3(1), pp. 31-33.

Stefansson, H. et al. (2002) Neuregulin 1 and susceptibility to schizophrenia.
Am.J.Hum.Genet., 71(4), pp. 877-892.

Stoll, M. et al. (2004) Genetic variation in DLG5 is associated with inflammatory bowel
disease. Nat.Genet., 36(5), pp. 476-480.

Strand, V., Kimberly, R., & Isaacs, J. D. (2007) Biologic therapies in rheumatology: lessons
learned, future directions. Nat.Rev.Drug Discov., 6(1), pp. 75-92.

Subramanian, A. et al. (2005) Gene set enrichment analysis: a knowledge-based approach
for interpreting genome-wide expression profiles. Proc.Natl.Acad.Sci.U.S.A, 102(43), pp.
15545-15550.

Takaoka, A. et al. (2005) Integral role of IRF-5 in the gene induction programme activated
by Toll-like receptors. Nature, 434(7030), pp. 243-249.

Taylor, J. et al. (2007) Using galaxy to perform large-scale interactive data analyses.
Curr.Protoc.Bioinformatics., Chapter 10(Unit 10.5).

Teo, Y. Y., Small, K. S., & Kwiatkowski, D. P. (2010) Methodological challenges of genome-
wide association analysis in Africa. Nat.Rev.Genet., 11(2), pp. 149-160.

The International HapMap Consortium (2003) The International HapMap Project. Nature,
426(6968), pp. 789-796.

The International HapMap Consortium (2005) A haplotype map of the human genome.
Nature, 437(7063), pp. 1299-1320.

Thomas, P. D. et al. (2003) PANTHER: a library of protein families and subfamilies indexed
by function. Genome Res., 13(9), pp. 2129-2141.

Torkamani, A., Topol, E. J., & Schork, N. J. (2008) Pathway analysis of seven common
diseases assessed by genome-wide association. Genomics, 92(5), pp. 265-272.

Torres, B. A. & Johnson, H. M. (1988) Arginine vasopressin (AVP) replacement of helper cell
requirement in IFN-gamma production. Evidence for a novel AVP receptor on mouse
lymphocytes. The Journal of Immunology, 140(7), pp. 2179-2183.

Travers, P., Walport, M., & Murphy, K. P. (2008) Janeway's immunobiology, 7th edn.
New York: Garland Science.

Tsukamoto, N. et al. (1999) Two differently regulated nuclear factor kappaB activation
pathways triggered by the cytoplasmic tail of CD40. Proc.Natl.Acad.Sci.U.S.A, 96(4), pp.
1234-1239.

van Kooten, C. & Banchereau, J. (2000) CD40-CD40 ligand. J.Leukoc.Biol., 67(1), pp. 2-17.

Vastrik, I. et al. (2007) Reactome: a knowledge base of biologic pathways and processes.
Genome Biol., 8(3), p. R39.

Verhagen, A. M. et al. (1996) Differential interaction of the CD2 extracellular and
intracellular domains with the tyrosine phosphatase CD45 and the zeta chain of the
TCR/CD3/zeta complex. Eur.J.Immunol., 26(12), pp. 2841-2849.

Vyth-Dreese, F. A. et al. (1993) Role of LFA-1/ICAM-1 in interleukin-2-stimulated
lymphocyte proliferation. Eur.J.Immunol., 23(12), pp. 3292-3299.

Wang, K., Li, M., & Bucan, M. (2007) Pathway-Based Approaches for Analysis of
Genomewide Association Studies. Am.J.Hum.Genet., 81(6).

Wang, W. Y. et al. (2005) Genome-wide association studies: theoretical and practical
concerns. Nat.Rev.Genet., 6(2), pp. 109-118.

Wang, Y. et al. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure
database. Nucleic Acids Res., 35(Database issue), p. D298-D300.

Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000
cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), pp. 661-
678.

Werneburg, B. et al. (2001) Molecular Characterization of CD40 Signaling Intermediates.
Journal of Biological Chemistry, 276(46).

Wu, J. et al. (2006) Identification of substrates of human protein-tyrosine phosphatase
PTPN22. J.Biol.Chem., 281(16), pp. 11002-11010.

Zhao, Z. et al. (2006) Circular chromosome conformation capture (4C) uncovers extensive
networks of epigenetically regulated intra- and interchromosomal interactions. Nat.Genet.,
38(11), pp. 1341-1347.

Zollner, S. & Pritchard, J. K. (2005) Coalescent-based association mapping and fine mapping
of complex trait loci. Genetics, 169(2), pp. 1071-1092.

7 Supplementary Results
7.1 Disease Analysis – RA
7.1.1 Pathway Analysis

7.1.1.1 Gene Ontology – BiNGO


The number of genes from the confirmed gene list with no BiNGO annotations fell slightly,
from 138 to 116, when genes in the MHC region were excluded (Supplementary Table 1);
however, because the overall number of genes also decreased (to 198), this had little
impact on the proportion of genes missing GO annotations (58.6%). For the expanded list
without MHC genes, the number of genes with no GO annotations (221) was nearly double that
of the confirmed list. However, as the total number of genes followed the same pattern, the
proportion of genes missing GO annotations fell by approximately 3 percentage points, to
55.5%. GO annotation results are summarised in Supplementary Table 2.
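
The proportions quoted above follow directly from the counts. The short sketch below
reproduces them, assuming that 198 is the total number of confirmed-list genes once the MHC
region is excluded (the reading that reproduces the reported 58.6%). The total for the
expanded list is not stated in the text, so the value computed here is back-calculated from
the reported figures and should be treated only as an estimate; the function name is
illustrative.

```python
def percent_missing(unannotated, total):
    """Percentage of genes in a list that lack any GO annotation."""
    return round(100.0 * unannotated / total, 1)


# Confirmed list without MHC genes: 116 genes have no annotation.
# Assumption: 198 is the total gene count for this list.
confirmed_no_mhc = percent_missing(116, 198)

# Expanded list without MHC genes: 221 unannotated genes, reported as 55.5%.
# The total is not given in the text, so estimate it from those two figures.
implied_expanded_total = round(221 / 0.555)

# Change between the two lists, expressed in percentage points.
change = round(confirmed_no_mhc - 55.5, 1)

print(confirmed_no_mhc, implied_expanded_total, change)
```

Under these assumptions the confirmed list gives 58.6%, the expanded list implies roughly
398 genes in total, and the difference between the lists is about 3.1 percentage points.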

Supplementary Table 1 Genes unrecognised by BiNGO.


Confirmed Confirmed no HLA Expanded Expanded no HLA
5S_RRNA.269 5S_RRNA.269 5S_RRNA.269 5S_RRNA.269
5S_RRNA.315 5S_RRNA.315 5S_RRNA.315 5S_RRNA.315
7SK.63 7SK.63 5S_RRNA.508 5S_RRNA.508
AC010138.3 AC010138.3 7SK.117 7SK.117
AC010733.1 AC010733.1 7SK.19 7SK.19
AC010733.2 AC010733.2 7SK.47 7SK.47
AC010733.4 AC010733.4 7SK.58 7SK.58
AC010733.5 AC010733.5 7SK.63 7SK.63
AC010733.7 AC010733.7 AC002454.1 AC002454.1
AC010733.8 AC010733.8 AC002979.1 AC002979.1
AC011005.1 AC011005.1 AC003029.4 AC003029.4
AC011005.2 AC011005.2 AC004128.1 AC004128.1
AC011245.1 AC011245.1 AC010138.3 AC010138.3
AC012370.2 AC012370.2 AC010733.1 AC010733.1
AC012370.3 AC012370.3 AC010733.2 AC010733.2
AC016727.3 AC016727.3 AC010733.4 AC010733.4
AC016747.3 AC016747.3 AC010733.5 AC010733.5
AC016894.1 AC016894.1 AC010733.7 AC010733.7
AC022506.1 AC022506.1 AC010733.8 AC010733.8
AC025165.1 AC025165.1 AC011005.1 AC011005.1
AC025165.8 AC025165.8 AC011005.2 AC011005.2
AC025594.1 AC025594.1 AC011245.1 AC011245.1
AC025594.2 AC025594.2 AC012370.2 AC012370.2
AC053545.3 AC053545.3 AC012370.3 AC012370.3
AC067945.4 AC067945.4 AC016727.3 AC016727.3
AC074391.1 AC074391.1 AC016747.3 AC016747.3
AC092667.2 AC092667.2 AC016894.1 AC016894.1
AC097714.1 AC097714.1 AC022506.1 AC022506.1
AC098479.1 AC098479.1 AC025165.1 AC025165.1
AC104782.3 AC104782.3 AC025165.8 AC025165.8
AC125238.1 AC125238.1 AC025594.1 AC025594.1
AC125238.2 AC125238.2 AC025594.2 AC025594.2
AC125238.3 AC125238.3 AC053545.3 AC053545.3
AC125238.4 AC125238.4 AC061999.1 AC061999.1
AL121935.1 AL121935.1 AC067945.4 AC067945.4
AL133458.1 AL133458.1 AC074391.1 AC074391.1
AL137068.1 AL137068.1 AC079199.1 AC079199.1
AL355794.1 AL355794.1 AC079199.2 AC079199.2
AL589645.1 AL589645.1 AC087491.2 AC087491.2
AL645941.1 ANKRD55 AC090644.1 AC090644.1
AL662789.1 ARHGEF25 AC090844.1 AC090844.1
AL662796.1 C1ORF137 AC091491.2 AC091491.2
AL662796.2 C5ORF30 AC092667.2 AC092667.2
AL669918.1 C6ORF99 AC097533.1 AC097533.1
AL669918.2 C8ORF12 AC097714.1 AC097714.1
AL713966.1 C9ORF144 AC098479.1 AC098479.1
ANKRD55 CTC-425H14.1 AC104782.3 AC104782.3
ARHGEF25 CTC-503K11.2 AC109460.1 AC109460.1
C1ORF137 FAM167A AC109460.2 AC109460.2
C5ORF30 GAPDHP64 AC125238.1 AC125238.1
C6ORF99 GLULP4 AC125238.2 AC125238.2
C8ORF12 KIAA1841 AC125238.3 AC125238.3
C9ORF144 MIR320B1 AC125238.4 AC125238.4
CTC-425H14.1 MIR616 AC137055.1 AC137055.1
CTC-503K11.2 NCRNA00208 AC137055.2 AC137055.2
FAM167A NEFHP1 AC138894.1 AC138894.1
GAPDHP64 RP11-10J5.1 AC138894.2 AC138894.2
GLULP4 RP1-111C20.3 AC138894.3 AC138894.3
HLA-DQB2 RP11-128A6.2 AC145285.1 AC145285.1
HLA-DQB3 RP11-128A6.3 AC145285.2 AC145285.2
HLA-DRB6 RP11-13P5.1 AC145285.3 AC145285.3
HLA-DRB9 RP11-148O21.2 ADAM1 ADAM1
HLA-Z RP11-148O21.3 AL022314.1 AL022314.1
HNRNPA1P2 RP11-148O21.4 AL121935.1 AL121935.1
KIAA1841 RP11-195F19.10 AL133458.1 AL133458.1
MIR320B1 RP11-240M16.1 AL137068.1 AL137068.1
MIR616 RP11-240M16.2 AL137145.1 AL137145.1
NCRNA00208 RP11-286H14.1 AL355794.1 AL355794.1
NEFHP1 RP11-286H14.2 AL589645.1 AL589645.1
RP11-10J5.1 RP11-286H14.3 AL645941.1 ANKRD55
RP1-111C20.3 RP11-286H14.4 AL662789.1 AP000550.1
RP11-128A6.2 RP11-309L24.6 AL662796.1 AP000552.1
RP11-128A6.3 RP11-324H7.1 AL662796.2 AP000552.2
RP11-13P5.1 RP11-356I2.2 AL669918.1 AP000552.3
RP11-148O21.2 RP11-356I2.4 AL669918.2 AP000553.1
RP11-148O21.3 RP11-359I18.1 AL713966.1 AP000557.1
RP11-148O21.4 RP11-392A14.1 ANKRD55 ARHGEF25
RP11-195F19.10 RP11-392A14.7 AP000550.1 BCRP6
RP11-240M16.1 RP11-414H17.2 AP000552.1 C11ORF74
RP11-240M16.2 RP11-414H17.6 AP000552.2 C12ORF47
RP11-286H14.1 RP11-414H17.7 AP000552.3 C1ORF137
RP11-286H14.2 RP11-456N14.2 AP000553.1 C22ORF33
RP11-286H14.3 RP11-473L1.1 AP000557.1 C5ORF30
RP11-286H14.4 RP11-475O23.2 ARHGEF25 C6ORF99
RP11-309L24.6 RP11-475O23.3 BCRP6 C8ORF12
RP11-324H7.1 RP11-477J21.5 C11ORF74 C9ORF144
RP11-356I2.2 RP11-514O12.4 C12ORF47 CCDC116
RP11-356I2.4 RP11-525G7.2 C1ORF137 CTC-425H14.1
RP11-359I18.1 RP11-536K7.5 C22ORF33 CTC-503K11.2
RP11-392A14.1 RP11-571M6.1 C5ORF30 CTD-2119L1.1
RP11-392A14.7 RP1-167A14.2 C6ORF99 EIF3C
RP11-414H17.2 RP11-80H18.3 C8ORF12 FAM109A
RP11-414H17.6 RP11-95M15.1 C9ORF144 FAM133B
RP11-414H17.7 RP4-590F24.2 CCDC116 FAM167A
RP11-456N14.2 RP4-730K3.3 CTC-425H14.1 GAPDHP64
RP11-473L1.1 RP5-1073O3.2 CTC-503K11.2 GLULP4
RP11-475O23.2 RP5-1073O3.5 CTD-2119L1.1 KB-1183D5.11
RP11-475O23.3 RP5-1073O3.7 EIF3C KB-1183D5.14
RP11-477J21.5 RPL13P2 FAM109A KB-1183D5.15
RP11-514O12.4 RPL17P22 FAM133B KB-1183D5.16
RP11-525G7.2 RSPH3 FAM167A KB-1183D5.9
RP11-536K7.5 SNORA14.1 GAPDHP64 KB-1592A4.13
RP11-571M6.1 SNORA70B GLULP4 KB-1592A4.14
RP1-167A14.2 SNOU13.218 HLA-DQB2 KB-1592A4.15
RP11-80H18.3 SNOU13.70 HLA-DQB3 KIAA1841
RP11-95M15.1 TCP10L2 HLA-DRB6 LL22NC01-81G9.3
RP4-590F24.2 TPI1P2 HLA-DRB9 MIR1181
RP4-730K3.3 U4.19 HLA-Z MIR130B
RP5-1073O3.2 U6.1269 HNRNPA1P2 MIR181A1
RP5-1073O3.5 U6.1320 KB-1183D5.11 MIR181B1
RP5-1073O3.7 U6.532 KB-1183D5.14 MIR301B
RPL13P2 U6.534 KB-1183D5.15 MIR320B1
RPL17P22 U6.690 KB-1183D5.16 MIR616
RSPH3 U6.922 KB-1183D5.9 NCRNA00208
SNORA14.1 YR211F11.2 KB-1592A4.13 NCRNA00281
SNORA70B YWHAZP6 KB-1592A4.14 NEFHP1
SNOU13.218 KB-1592A4.15 POM121L7
SNOU13.70 KIAA1841 POM121L8P
TCP10L2 LL22NC01-81G9.3 RIMBP3B
TPI1P2 MIR1181 RIMBP3C
U1.110 MIR130B RP11-104L21.2
U4.19 MIR181A1 RP11-10J5.1
U6.1269 MIR181B1 RP1-111C20.3
U6.1320 MIR301B RP11-128A6.2
U6.532 MIR320B1 RP11-128A6.3
U6.534 MIR616 RP11-1348G14.1
U6.690 NCRNA00208 RP11-1348G14.2
U6.922 NCRNA00281 RP11-13P5.1
U6.939 NEFHP1 RP11-148O21.2
XXBAC-BPG154L12.4 POM121L7 RP11-148O21.3
XXBAC-BPG181M17.5 POM121L8P RP11-148O21.4
XXBAC-BPG246D15.8 RIMBP3B RP11-16L9.1
XXBAC-BPG246D15.9 RIMBP3C RP11-16L9.2
XXBAC-BPG254F23.5 RP11-104L21.2 RP11-16L9.3
XXBAC-BPG254F23.6 RP11-10J5.1 RP11-16L9.4
XXBAC-BPG254F23.7 RP1-111C20.3 RP11-195F19.10
YR211F11.2 RP11-128A6.2 RP11-240M16.1
YWHAZP6 RP11-128A6.3 RP11-240M16.2
RP11-1348G14.1 RP11-286H14.1
RP11-1348G14.2 RP11-286H14.2
RP11-13P5.1 RP11-286H14.3
RP11-148O21.2 RP11-286H14.4
RP11-148O21.3 RP11-309L24.6
RP11-148O21.4 RP11-31E23.1
RP11-16L9.1 RP11-324H7.1
RP11-16L9.2 RP11-330O11.2
RP11-16L9.3 RP11-350G8.5
RP11-16L9.4 RP11-350G8.7
RP11-195F19.10 RP11-356I2.2
RP11-240M16.1 RP11-356I2.4
RP11-240M16.2 RP11-359I18.1
RP11-286H14.1 RP11-382E9.1
RP11-286H14.2 RP11-392A14.1
RP11-286H14.3 RP11-392A14.7
RP11-286H14.4 RP11-414H17.2
RP11-309L24.6 RP11-414H17.6
RP11-31E23.1 RP11-414H17.7
RP11-324H7.1 RP11-456N14.2
RP11-330O11.2 RP11-473L1.1
RP11-350G8.5 RP11-475O23.2
RP11-350G8.7 RP11-475O23.3
RP11-356I2.2 RP11-477J21.5
RP11-356I2.4 RP11-514O12.4
RP11-359I18.1 RP1-151B14.6
RP11-382E9.1 RP11-525G7.2
RP11-392A14.1 RP11-536K7.5
RP11-392A14.7 RP11-553K8.2
RP11-414H17.2 RP11-553K8.3
RP11-414H17.6 RP11-553K8.5
RP11-414H17.7 RP11-571M6.1
RP11-456N14.2 RP11-57A19.2
RP11-473L1.1 RP11-57A19.3
RP11-475O23.2 RP11-61L14.2
RP11-475O23.3 RP11-61L14.6
RP11-477J21.5 RP1-167A14.2
RP11-514O12.4 RP11-80H18.3
RP1-151B14.6 RP11-95M15.1
RP11-525G7.2 RP3-462E2.1
RP11-536K7.5 RP3-521E19.1
RP11-553K8.2 RP4-590F24.2
RP11-553K8.3 RP4-730K3.3
RP11-553K8.5 RP5-1073O3.2
RP11-571M6.1 RP5-1073O3.5
RP11-57A19.2 RP5-1073O3.7
RP11-57A19.3 RP5-1170K4.7
RP11-61L14.2 RPL13P2
RP11-61L14.6 RPL17P22
RP1-167A14.2 RSPH3
RP11-80H18.3 SCARNA17.1
RP11-95M15.1 SCARNA17.2
RP3-462E2.1 SCARNA18.3
RP3-521E19.1 SCARNA18.4
RP4-590F24.2 SNORA14.1
RP4-730K3.3 SNORA43.4
RP5-1073O3.2 SNORA70B
RP5-1073O3.5 SNOU13.193
RP5-1073O3.7 SNOU13.207
RP5-1170K4.7 SNOU13.218
RPL13P2 SNOU13.36
RPL17P22 SNOU13.456
RSPH3 SNOU13.466
SCARNA17.1 SNOU13.70
SCARNA17.2 TCP10L2
SCARNA18.3 TPI1P2
SCARNA18.4 U4.19
SNORA14.1 U6.1202
SNORA43.4 U6.1269
SNORA70B U6.1320
SNOU13.193 U6.453
SNOU13.207 U6.532
SNOU13.218 U6.534
SNOU13.36 U6.690
SNOU13.456 U6.922
SNOU13.466 U7.53
SNOU13.70 Y_RNA.127
TCP10L2 Y_RNA.202
TPI1P2 Y_RNA.294
U1.110 Y_RNA.817
U4.19 YDJC
U6.1202 YR211F11.2
U6.1269 YWHAZP6
U6.1320
U6.453
U6.532
U6.534
U6.690
U6.922
U6.939
U7.53
XXBAC-BPG154L12.4
XXBAC-BPG181M17.5
XXBAC-BPG246D15.8
XXBAC-BPG246D15.9
XXBAC-BPG254F23.5
XXBAC-BPG254F23.6
XXBAC-BPG254F23.7
Y_RNA.127
Y_RNA.202
Y_RNA.294
Y_RNA.817
YDJC
YR211F11.2
YWHAZP6

Supplementary Table 2 Number of genes without GO annotations by gene list.

Gene List    Number of Genes Without GO Annotations
             With HLA        Without HLA
Confirmed    138 (58.7%)     116 (58.6%)
Expanded     243 (55.9%)     221 (55.9%)
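
The percentages in Supplementary Table 2 appear to be taken over all genes in each list, annotated or not. The sketch below recomputes them as a rough check. It assumes that the X column of Supplementary Tables 3 to 5 (97, 82 and 192) is the number of genes in each list that do carry a GO annotation; that reading of X is an inference from the BiNGO output rather than something the table states. The expanded list without HLA is left out because Supplementary Tables 3 to 5 do not give its X value. All variable and function names are illustrative.

```python
# Counts of genes WITHOUT GO annotations, from Supplementary Table 2.
without_go = {
    "Confirmed, with HLA": 138,
    "Confirmed, without HLA": 116,
    "Expanded, with HLA": 243,
}

# ASSUMPTION: the X column of Supplementary Tables 3, 4 and 5 counts the
# genes in each list that DO have a GO annotation.
with_go = {
    "Confirmed, with HLA": 97,
    "Confirmed, without HLA": 82,
    "Expanded, with HLA": 192,
}


def percent_unannotated(n_without, n_with):
    """Percentage of a gene list that lacks any GO annotation."""
    return 100.0 * n_without / (n_without + n_with)


percentages = {
    name: round(percent_unannotated(without_go[name], with_go[name]), 1)
    for name in without_go
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under that assumption the recomputed values agree with the table to one decimal place, which suggests that more than half of the genes in each list never enter the enrichment test.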

Full results for the confirmed and expanded gene lists are shown in Supplementary Table
3 and Supplementary Table 5, respectively. For the confirmed gene list, the number of
over-represented GO terms fell from 94 to 58 when genes in the MHC region were excluded,
and the remaining terms were almost exclusively associated with the regulation of
proliferation and activation of immune cells (Supplementary Table 4). A similar effect was
seen when genes from the MHC region were excluded from the expanded gene list, leaving
129 over-represented GO terms, most of them involved in the regulation of proliferation
and activation of immune cells (Supplementary Table 6). Network diagrams for all four gene
lists can be found in Supplementary Figure 1 to Supplementary Figure 4.
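
To make the notion of over-representation concrete, the following sketch computes a one-sided hypergeometric p-value from the four counts reported for each GO term in the supplementary tables: x (genes in the list annotated to the term), n (all human genes annotated to the term), X (annotated genes in the list) and N (all annotated human genes). It assumes that the hypergeometric test was the one used in BiNGO; the function name and the column interpretation are illustrative rather than taken from the thesis.

```python
from math import comb


def hypergeom_tail(x, n, X, N):
    """P(at least x of the X listed genes carry a term held by n of N genes).

    Uses exact integer arithmetic for the upper tail of the
    hypergeometric distribution.
    """
    upper = min(n, X)
    favourable = sum(
        comb(n, k) * comb(N - n, X - k) for k in range(x, upper + 1)
    )
    return favourable / comb(N, X)


# Counts taken from two rows of Supplementary Table 3.
p_mhc_class_ii = hypergeom_tail(x=8, n=13, X=97, N=17784)  # MHC class II protein complex
p_tap_complex = hypergeom_tail(x=2, n=2, X=97, N=17784)    # TAP complex

print(f"MHC class II protein complex: {p_mhc_class_ii:.2e}")
print(f"TAP complex: {p_tap_complex:.2e}")
```

For the TAP complex row the tail reduces to (97 × 96) / (17784 × 17783), which agrees with the uncorrected p-value of 2.94E-05 listed in Supplementary Table 3.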

Supplementary Table 3 Significant BiNGO results for the confirmed gene list. Each row shows the significant GO ID, p-value, p-value after correction for multiple testing, the number of
genes observed and in total for the gene list and all human genes, GO term description and genes identified.
GO-ID p-value corr p-value x n X N Description Genes in test set
42613 7.34E-16 1.15E-12 8 13 97 17784 MHC class II protein complex HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2504 1.36E-14 8.33E-12 8 17 97 17784 antigen processing and presentation of peptide or polysaccharide antigen via MHC class II HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
19882 1.60E-14 8.33E-12 11 59 97 17784 antigen processing and presentation HLA-DQB1, HLA-DRB1, TAP2, TAP1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1,
PSMB9, HLA-DRA
32395 1.87E-12 7.29E-10 6 9 97 17784 MHC class II receptor activity HLA-DQB1, HLA-DRB1, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
42611 1.22E-11 3.80E-09 8 35 97 17784 MHC protein complex HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2376 1.01E-09 2.63E-07 23 947 97 17784 immune system process HLA-DQB1, IL2RA, HLA-DRB1, C5, CTLA4, CD40, HLA-DMB, IL21, HLA-DQA2, HLA-DQA1, PSMB9,
CCR6, CCL21, TAP2, ICOS, TAP1, NOTCH4, CD2, HLA-DRB5, HLA-DOB, IL2, HLA-DRA, CD28
5765 2.87E-08 6.41E-06 8 89 97 17784 lysosomal membrane HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
6955 3.59E-08 7.01E-06 17 618 97 17784 immune response HLA-DQB1, IL2RA, HLA-DRB1, C5, CTLA4, HLA-DMB, IL21, HLA-DQA2, HLA-DQA1, CCR6, CCL21,
ICOS, HLA-DRB5, HLA-DOB, HLA-DRA, CD28, IL2
5774 2.45E-07 4.25E-05 8 117 97 17784 vacuolar membrane HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
44437 3.83E-07 5.98E-05 8 124 97 17784 vacuolar part HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
45589 6.27E-07 8.32E-05 3 4 97 17784 regulation of regulatory T cell differentiation ICOS, CTLA4, IL2
45619 6.53E-07 8.32E-05 6 57 97 17784 regulation of lymphocyte differentiation IL2RA, ICOS, CD2, CTLA4, IL21, IL2
2683 6.92E-07 8.32E-05 7 92 97 17784 negative regulation of immune system process IL2RA, ICOS, TAP1, C5, CTLA4, PTPN22, IL2
50670 8.02E-07 8.95E-05 7 94 97 17784 regulation of lymphocyte proliferation IL2RA, ICOS, CTLA4, CD40, IL21, IL2, CD28
32944 8.62E-07 8.97E-05 7 95 97 17784 regulation of mononuclear cell proliferation IL2RA, ICOS, CTLA4, CD40, IL21, IL2, CD28
70663 9.25E-07 9.03E-05 7 96 97 17784 regulation of leukocyte proliferation IL2RA, ICOS, CTLA4, CD40, IL21, IL2, CD28
42129 1.72E-06 1.58E-04 6 67 97 17784 regulation of T cell proliferation IL2RA, ICOS, CTLA4, IL21, IL2, CD28
51249 2.40E-06 2.09E-04 8 158 97 17784 regulation of lymphocyte activation IL2RA, ICOS, CD2, CTLA4, CD40, IL21, IL2, CD28
50863 4.63E-06 3.81E-04 7 122 97 17784 regulation of T cell activation IL2RA, ICOS, CD2, CTLA4, IL21, IL2, CD28
323 5.49E-06 4.08E-04 9 235 97 17784 lytic vacuole HLA-DQB1, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
5764 5.49E-06 4.08E-04 9 235 97 17784 lysosome HLA-DQB1, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2694 6.33E-06 4.49E-04 8 180 97 17784 regulation of leukocyte activation IL2RA, ICOS, CD2, CTLA4, CD40, IL21, IL2, CD28
45580 6.86E-06 4.66E-04 5 49 97 17784 regulation of T cell differentiation IL2RA, ICOS, CD2, CTLA4, IL2
50865 1.02E-05 6.61E-04 8 192 97 17784 regulation of cell activation IL2RA, ICOS, CD2, CTLA4, CD40, IL21, IL2, CD28
10008 1.52E-05 9.14E-04 8 203 97 17784 endosome membrane HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
44440 1.52E-05 9.14E-04 8 203 97 17784 endosomal part HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
5773 1.77E-05 1.03E-03 9 272 97 17784 vacuole HLA-DQB1, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2682 2.01E-05 1.12E-03 11 425 97 17784 regulation of immune system process IL2RA, ICOS, TAP1, C5, CD2, CTLA4, PTPN22, CD40, IL21, IL2, CD28
50671 2.57E-05 1.23E-03 5 64 97 17784 positive regulation of lymphocyte proliferation IL2RA, CD40, IL21, IL2, CD28
70664 2.66E-05 1.23E-03 4 32 97 17784 negative regulation of leukocyte proliferation IL2RA, ICOS, CTLA4, IL2
50672 2.66E-05 1.23E-03 4 32 97 17784 negative regulation of lymphocyte proliferation IL2RA, ICOS, CTLA4, IL2
32945 2.66E-05 1.23E-03 4 32 97 17784 negative regulation of mononuclear cell proliferation IL2RA, ICOS, CTLA4, IL2
32946 2.77E-05 1.23E-03 5 65 97 17784 positive regulation of mononuclear cell proliferation IL2RA, CD40, IL21, IL2, CD28
45590 2.94E-05 1.23E-03 2 2 97 17784 negative regulation of regulatory T cell differentiation ICOS, CTLA4
46013 2.94E-05 1.23E-03 2 2 97 17784 regulation of T cell homeostatic proliferation IL2RA, IL2
42825 2.94E-05 1.23E-03 2 2 97 17784 TAP complex TAP2, TAP1
46967 2.94E-05 1.23E-03 2 2 97 17784 cytosol to ER transport TAP2, TAP1
70665 2.99E-05 1.23E-03 5 66 97 17784 positive regulation of leukocyte proliferation IL2RA, CD40, IL21, IL2, CD28
2696 4.12E-05 1.65E-03 6 116 97 17784 positive regulation of leukocyte activation IL2RA, CD2, CD40, IL21, IL2, CD28
50776 4.47E-05 1.75E-03 8 236 97 17784 regulation of immune response ICOS, TAP1, C5, CTLA4, CD40, IL21, IL2, CD28
50867 5.22E-05 1.99E-03 6 121 97 17784 positive regulation of cell activation IL2RA, CD2, CD40, IL21, IL2, CD28
48585 6.85E-05 2.55E-03 6 127 97 17784 negative regulation of response to stimulus IL2RA, ICOS, TAP1, C5, CTLA4, IL2
42102 7.94E-05 2.88E-03 4 42 97 17784 positive regulation of T cell proliferation IL2RA, IL21, IL2, CD28
5134 8.80E-05 2.99E-03 2 3 97 17784 interleukin-2 receptor binding IL21, IL2
46977 8.80E-05 2.99E-03 2 3 97 17784 TAP binding TAP2, TAP1
46978 8.80E-05 2.99E-03 2 3 97 17784 TAP1 binding TAP2, TAP1
9897 1.08E-04 3.60E-03 6 138 97 17784 external side of plasma membrane IL2RA, ICOS, CD2, CTLA4, CD40, CD28
2706 1.58E-04 5.14E-03 4 50 97 17784 regulation of lymphocyte mediated immunity TAP1, CD40, IL21, IL2
30581 1.75E-04 5.17E-03 2 4 97 17784 symbiont intracellular protein transport in host TAP2, TAP1
51708 1.75E-04 5.17E-03 2 4 97 17784 intracellular protein transport in other organism involved in symbiotic interaction TAP2, TAP1
42824 1.75E-04 5.17E-03 2 4 97 17784 MHC class I peptide loading complex TAP2, TAP1
46719 1.75E-04 5.17E-03 2 4 97 17784 regulation of viral protein levels in host cell TAP2, TAP1
19060 1.75E-04 5.17E-03 2 4 97 17784 intracellular transport of viral proteins in host cell TAP2, TAP1
51250 1.84E-04 5.33E-03 4 52 97 17784 negative regulation of lymphocyte activation IL2RA, ICOS, CTLA4, IL2
5768 2.35E-04 6.67E-03 9 381 97 17784 endosome HLA-DQB1, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2695 2.63E-04 7.34E-03 4 57 97 17784 negative regulation of leukocyte activation IL2RA, ICOS, CTLA4, IL2
51251 2.73E-04 7.36E-03 5 105 97 17784 positive regulation of lymphocyte activation IL2RA, CD40, IL21, IL2, CD28
2697 2.73E-04 7.36E-03 5 105 97 17784 regulation of immune effector process TAP1, CD40, IL21, IL2, CD28
48304 2.91E-04 7.71E-03 2 5 97 17784 positive regulation of isotype switching to IgG isotypes CD40, IL2
2703 3.01E-04 7.83E-03 4 59 97 17784 regulation of leukocyte mediated immunity TAP1, CD40, IL21, IL2
50866 3.87E-04 9.91E-03 4 63 97 17784 negative regulation of cell activation IL2RA, ICOS, CTLA4, IL2
23052 4.17E-04 1.04E-02 31 3130 97 17784 signaling TRAF1, HLA-DRB1, BLK, C5, PTPN22, GLI1, STAT4, TAGAP, CCL21, GSN, CD2, SPRED2, AGAP2,
ARHGAP9, CD28, MAGI3, IL2RA, KIF5A, DTX3, SLC12A5, CD40, IL21, FLNB, DDIT3, CCR6, HIPK1,
NOTCH4, RAB14, RBPJ, GPR31, IL2
42130 4.18E-04 1.04E-02 3 27 97 17784 negative regulation of T cell proliferation IL2RA, ICOS, CTLA4
2483 4.35E-04 1.05E-02 2 6 97 17784 antigen processing and presentation of endogenous peptide antigen TAP2, TAP1
19885 4.35E-04 1.05E-02 2 6 97 17784 antigen processing and presentation of endogenous peptide antigen via MHC class I TAP2, TAP1
50777 5.19E-04 1.23E-02 3 29 97 17784 negative regulation of immune response ICOS, TAP1, CTLA4
42175 5.86E-04 1.37E-02 11 624 97 17784 nuclear membrane-endoplasmic reticulum network HLA-DQB1, XPO1, HLA-DRB1, TAP2, TAP1, RAB14, HLA-DRB5, HLA-DQA2, HLA-DQA1, OS9, HLA-DRA
48302 6.07E-04 1.40E-02 2 7 97 17784 regulation of isotype switching to IgG isotypes CD40, IL2
2684 6.19E-04 1.40E-02 7 265 97 17784 positive regulation of immune system process IL2RA, C5, CD2, CD40, IL21, IL2, CD28
45621 6.33E-04 1.41E-02 3 31 97 17784 positive regulation of lymphocyte differentiation IL2RA, IL21, IL2
45830 8.07E-04 1.75E-02 2 8 97 17784 positive regulation of isotype switching CD40, IL2
19883 8.07E-04 1.75E-02 2 8 97 17784 antigen processing and presentation of endogenous antigen TAP2, TAP1
50870 8.30E-04 1.78E-02 4 77 97 17784 positive regulation of T cell activation IL2RA, IL21, IL2, CD28
8284 9.02E-04 1.90E-02 9 459 97 17784 positive regulation of cell proliferation IL2RA, HIPK1, FGFR1OP, CD40, IL21, RBPJ, IL2, CD28, GLI1
44459 9.50E-04 1.98E-02 22 1999 97 17784 plasma membrane part HLA-DQB1, MAGI3, IL2RA, HLA-DRB1, SLC12A5, C5, CTLA4, CD40, HLA-DMB, HLA-DQA2,
HLA-DQA1, KCTD6, CCR6, ICOS, NOTCH4, CD2, RAB14, HLA-DRB5, HLA-DOB, GPR31, CD28, HLA-DRA
45911 1.03E-03 2.07E-02 2 9 97 17784 positive regulation of DNA recombination CD40, IL2
51023 1.03E-03 2.07E-02 2 9 97 17784 regulation of immunoglobulin secretion CD40, IL2
45334 1.03E-03 2.07E-02 2 9 97 17784 clathrin-coated endocytic vesicle ICOS, CTLA4
15197 1.29E-03 2.51E-02 2 10 97 17784 peptide transporter activity TAP2, TAP1
42104 1.29E-03 2.51E-02 2 10 97 17784 positive regulation of activated T cell proliferation IL2RA, IL2
2700 1.34E-03 2.59E-02 3 40 97 17784 regulation of production of molecular mediator of immune response CD40, IL21, IL2
31348 1.55E-03 2.95E-02 3 42 97 17784 negative regulation of defense response IL2RA, TAP1, IL2
50868 1.66E-03 3.12E-02 3 43 97 17784 negative regulation of T cell activation IL2RA, ICOS, CTLA4
5813 1.74E-03 3.23E-02 5 158 97 17784 centrosome HIPK1, FGFR1OP, CEP110, TNFAIP3, DCTN2
5515 1.86E-03 3.38E-02 59 8121 97 17784 protein binding XPO1, FAM3D, PTPN22, GLI1, OS9, GSN, SPRED2, AGAP2, LONRF2, ARHGAP9, MAGI3, KIF5A,
CEP110, CD40, IL21, FLNB, DDIT3, DCTN2, CCR6, HIPK1, INHBE, FGFR1OP, RAB14, TNFAIP3,
TRAF1, PAM, PFKFB3, BLK, C5, PXK, RPP14, STAT4, FAM107A, REL, CCL21, TAP2, ICOS, TAP1,
CD2, PEX13, TNPO3, MARS, CD28, IL2RA, DTX3, SLC12A5, CTLA4, RSBN1, PSMB8, KCTD6, PHF19,
NOTCH4, AHSA2, RBPJ, AP4B1, KIAA1045, PIP4K2C, IL2, RBM17
45191 1.88E-03 3.38E-02 2 12 97 17784 regulation of isotype switching CD40, IL2
31347 1.88E-03 3.38E-02 5 161 97 17784 regulation of defense response IL2RA, TAP1, IL21, IL2, CD28
45581 2.21E-03 3.88E-02 2 13 97 17784 negative regulation of T cell differentiation ICOS, CTLA4
46006 2.21E-03 3.88E-02 2 13 97 17784 regulation of activated T cell proliferation IL2RA, IL2
48583 2.27E-03 3.95E-02 9 525 97 17784 regulation of response to stimulus IL2RA, ICOS, TAP1, C5, CTLA4, CD40, IL21, IL2, CD28
32880 2.51E-03 4.31E-02 5 172 97 17784 regulation of protein localization PAM, C5, RAB14, CD40, IL2
7219 2.56E-03 4.35E-02 3 50 97 17784 Notch signaling pathway DTX3, NOTCH4, RBPJ
9986 2.63E-03 4.42E-02 7 341 97 17784 cell surface IL2RA, ICOS, NOTCH4, CD2, CTLA4, CD40, CD28
42108 2.71E-03 4.50E-02 3 51 97 17784 positive regulation of cytokine biosynthetic process REL, IL21, CD28
Supplementary Table 4 Significant BiNGO results for the confirmed gene list after exclusion of MHC region genes. Each row shows the significant GO ID, p-value, p-value after correction
for multiple testing, the number of genes observed and in total for the gene list and all human genes, GO term description and genes identified.
GO-ID p-value corr p-value x n X N Description Genes in test set
45619 2.40E-07 1.06E-04 6 57 82 17787 regulation of lymphocyte differentiation IL2RA, ICOS, CD2, CTLA4, IL21, IL2
50670 2.53E-07 1.06E-04 7 94 82 17787 regulation of lymphocyte proliferation IL2RA, ICOS, CTLA4, CD40, IL21, IL2, CD28
32944 2.72E-07 1.06E-04 7 95 82 17787 regulation of mononuclear cell proliferation IL2RA, ICOS, CTLA4, CD40, IL21, IL2, CD28
70663 2.92E-07 1.06E-04 7 96 82 17787 regulation of leukocyte proliferation IL2RA, ICOS, CTLA4, CD40, IL21, IL2, CD28
45589 3.77E-07 1.09E-04 3 4 82 17787 regulation of regulatory T cell differentiation ICOS, CTLA4, IL2
42129 6.36E-07 1.37E-04 6 67 82 17787 regulation of T cell proliferation IL2RA, ICOS, CTLA4, IL21, IL2, CD28
51249 6.63E-07 1.37E-04 8 158 82 17787 regulation of lymphocyte activation IL2RA, ICOS, CD2, CTLA4, CD40, IL21, IL2, CD28
50863 1.49E-06 2.70E-04 7 122 82 17787 regulation of T cell activation IL2RA, ICOS, CD2, CTLA4, IL21, IL2, CD28
2694 1.77E-06 2.85E-04 8 180 82 17787 regulation of leukocyte activation IL2RA, ICOS, CD2, CTLA4, CD40, IL21, IL2, CD28
50865 2.87E-06 3.94E-04 8 192 82 17787 regulation of cell activation IL2RA, ICOS, CD2, CTLA4, CD40, IL21, IL2, CD28
45580 2.99E-06 3.94E-04 5 49 82 17787 regulation of T cell differentiation IL2RA, ICOS, CD2, CTLA4, IL2
2683 4.15E-06 5.00E-04 6 92 82 17787 negative regulation of immune system process IL2RA, ICOS, C5, CTLA4, PTPN22, IL2
50671 1.13E-05 1.10E-03 5 64 82 17787 positive regulation of lymphocyte proliferation IL2RA, CD40, IL21, IL2, CD28
32946 1.22E-05 1.10E-03 5 65 82 17787 positive regulation of mononuclear cell proliferation IL2RA, CD40, IL21, IL2, CD28
70665 1.32E-05 1.10E-03 5 66 82 17787 positive regulation of leukocyte proliferation IL2RA, CD40, IL21, IL2, CD28
50672 1.37E-05 1.10E-03 4 32 82 17787 negative regulation of lymphocyte proliferation IL2RA, ICOS, CTLA4, IL2
32945 1.37E-05 1.10E-03 4 32 82 17787 negative regulation of mononuclear cell proliferation IL2RA, ICOS, CTLA4, IL2
70664 1.37E-05 1.10E-03 4 32 82 17787 negative regulation of leukocyte proliferation IL2RA, ICOS, CTLA4, IL2
2696 1.58E-05 1.20E-03 6 116 82 17787 positive regulation of leukocyte activation IL2RA, CD2, CD40, IL21, IL2, CD28
50867 2.01E-05 1.38E-03 6 121 82 17787 positive regulation of cell activation IL2RA, CD2, CD40, IL21, IL2, CD28
45590 2.10E-05 1.38E-03 2 2 82 17787 negative regulation of regulatory T cell differentiation ICOS, CTLA4
46013 2.10E-05 1.38E-03 2 2 82 17787 regulation of T cell homeostatic proliferation IL2RA, IL2
2682 2.52E-05 1.58E-03 10 425 82 17787 regulation of immune system process IL2RA, ICOS, C5, CD2, CTLA4, PTPN22, CD40, IL21, IL2, CD28
42102 4.11E-05 2.44E-03 4 42 82 17787 positive regulation of T cell proliferation IL2RA, IL21, IL2, CD28
9897 4.22E-05 2.44E-03 6 138 82 17787 external side of plasma membrane IL2RA, ICOS, CD2, CTLA4, CD40, CD28
5134 6.28E-05 3.49E-03 2 3 82 17787 interleukin-2 receptor binding IL21, IL2
5515 7.23E-05 3.87E-03 55 8121 82 17787 protein binding XPO1, FAM3D, PTPN22, GLI1, OS9, GSN, SPRED2, AGAP2, LONRF2, ARHGAP9, MAGI3, KIF5A,
CEP110, CD40, IL21, FLNB, DDIT3, DCTN2, CCR6, HIPK1, INHBE, FGFR1OP, RAB14, TNFAIP3,
TRAF1, PAM, PFKFB3, BLK, C5, PXK, RPP14, STAT4, FAM107A, REL, CCL21, ICOS, CD2, PEX13,
TNPO3, MARS, CD28, IL2RA, DTX3, SLC12A5, CTLA4, RSBN1, KCTD6, PHF19, RBPJ, AHSA2, AP4B1,
KIAA1045, PIP4K2C, IL2, RBM17
23052 8.78E-05 4.54E-03 29 3130 82 17787 signaling TRAF1, BLK, C5, PTPN22, GLI1, STAT4, TAGAP, CCL21, GSN, CD2, SPRED2, AGAP2, CD28,
ARHGAP9, MAGI3, IL2RA, KIF5A, DTX3, SLC12A5, CD40, IL21, FLNB, DDIT3, CCR6, HIPK1, RAB14,
RBPJ, GPR31, IL2
51250 9.60E-05 4.79E-03 4 52 82 17787 negative regulation of lymphocyte activation IL2RA, ICOS, CTLA4, IL2
50776 1.08E-04 5.22E-03 7 236 82 17787 regulation of immune response ICOS, C5, CTLA4, CD40, IL21, IL2, CD28
51251 1.24E-04 5.78E-03 5 105 82 17787 positive regulation of lymphocyte activation IL2RA, CD40, IL21, IL2, CD28
2695 1.38E-04 6.22E-03 4 57 82 17787 negative regulation of leukocyte activation IL2RA, ICOS, CTLA4, IL2
50866 2.03E-04 8.86E-03 4 63 82 17787 negative regulation of cell activation IL2RA, ICOS, CTLA4, IL2
48304 2.08E-04 8.86E-03 2 5 82 17787 positive regulation of isotype switching to IgG isotypes CD40, IL2
2684 2.21E-04 9.13E-03 7 265 82 17787 positive regulation of immune system process IL2RA, C5, CD2, CD40, IL21, IL2, CD28
42130 2.55E-04 1.01E-02 3 27 82 17787 negative regulation of T cell proliferation IL2RA, ICOS, CTLA4
8284 2.59E-04 1.01E-02 9 459 82 17787 positive regulation of cell proliferation IL2RA, HIPK1, FGFR1OP, CD40, IL21, RBPJ, IL2, CD28, GLI1
48585 3.01E-04 1.15E-02 5 127 82 17787 negative regulation of response to stimulus IL2RA, ICOS, C5, CTLA4, IL2
45621 3.87E-04 1.43E-02 3 31 82 17787 positive regulation of lymphocyte differentiation IL2RA, IL21, IL2
48302 4.34E-04 1.55E-02 2 7 82 17787 regulation of isotype switching to IgG isotypes CD40, IL2
50870 4.40E-04 1.55E-02 4 77 82 17787 positive regulation of T cell activation IL2RA, IL21, IL2, CD28
45830 5.77E-04 1.99E-02 2 8 82 17787 positive regulation of isotype switching CD40, IL2
45911 7.40E-04 2.38E-02 2 9 82 17787 positive regulation of DNA recombination CD40, IL2
51023 7.40E-04 2.38E-02 2 9 82 17787 regulation of immunoglobulin secretion CD40, IL2
45334 7.40E-04 2.38E-02 2 9 82 17787 clathrin-coated endocytic vesicle ICOS, CTLA4
5813 8.17E-04 2.54E-02 5 158 82 17787 centrosome HIPK1, FGFR1OP, CEP110, TNFAIP3, DCTN2
2700 8.25E-04 2.54E-02 3 40 82 17787 regulation of production of molecular mediator of immune response CD40, IL21, IL2
42104 9.22E-04 2.78E-02 2 10 82 17787 positive regulation of activated T cell proliferation IL2RA, IL2
50868 1.02E-03 3.01E-02 3 43 82 17787 negative regulation of T cell activation IL2RA, ICOS, CTLA4
32880 1.19E-03 3.46E-02 5 172 82 17787 regulation of protein localization PAM, C5, RAB14, CD40, IL2
45191 1.34E-03 3.82E-02 2 12 82 17787 regulation of isotype switching CD40, IL2
2697 1.41E-03 3.92E-02 4 105 82 17787 regulation of immune effector process CD40, IL21, IL2, CD28
2706 1.58E-03 4.17E-02 3 50 82 17787 regulation of lymphocyte mediated immunity CD40, IL21, IL2
45581 1.58E-03 4.17E-02 2 13 82 17787 negative regulation of T cell differentiation ICOS, CTLA4
46006 1.58E-03 4.17E-02 2 13 82 17787 regulation of activated T cell proliferation IL2RA, IL2
5126 1.65E-03 4.26E-02 5 185 82 17787 cytokine receptor binding CCL21, C5, SPRED2, IL21, IL2
42108 1.68E-03 4.26E-02 3 51 82 17787 positive regulation of cytokine biosynthetic process REL, IL21, CD28
42127 1.74E-03 4.33E-02 11 849 82 17787 regulation of cell proliferation IL2RA, HIPK1, ICOS, FGFR1OP, CTLA4, CD40, IL21, RBPJ, IL2, CD28, GLI1
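
The corrected p-values in Supplementary Table 4 show runs of identical values (for example, the first four rows all have 1.06E-04), a pattern typical of the Benjamini-Hochberg step-up procedure. The sketch below illustrates that procedure on the first seven uncorrected p-values of the table. It rests on two assumptions that are not stated in the table itself: that Benjamini-Hochberg was the correction applied, and that roughly 1450 GO terms were tested in total, a figure back-calculated from the ratio of corrected to uncorrected p-values. The function name and the `m` parameter are illustrative.

```python
def bh_adjust(pvalues, m=None):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    m is the total number of tests and defaults to len(pvalues). Passing
    a larger m treats pvalues as the smallest members of a bigger family.
    """
    if m is None:
        m = len(pvalues)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is
    # capped by the adjusted values of all larger p-values.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# First seven uncorrected p-values from Supplementary Table 4.
raw = [2.40e-07, 2.53e-07, 2.72e-07, 2.92e-07, 3.77e-07, 6.36e-07, 6.63e-07]

# ASSUMPTION: about 1450 GO terms were tested; this is an estimate,
# not a number reported in the thesis.
ASSUMED_TESTS = 1450

corrected = bh_adjust(raw, m=ASSUMED_TESTS)
for p, q in zip(raw, corrected):
    print(f"{p:.2e} -> {q:.2e}")
```

With these assumptions the first four adjusted values collapse to a single shared value, as they do in the table, because each is capped by the adjusted value of the fourth-ranked term.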
Supplementary Table 5 Significant BiNGO results for the expanded gene list. Each row shows the significant GO ID, p-value, p-value after correction for multiple testing, the number of
genes observed and in total for the gene list and all human genes, GO term description and genes identified.
GO-ID p-value corr p-value x n X N Description Genes in test set
2504 3.73E-14 5.92E-11 9 17 192 17779 antigen processing and presentation of peptide or polysaccharide antigen via MHC class II HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, TRAF6, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
19882 4.73E-14 5.92E-11 13 59 192 17779 antigen processing and presentation HLA-DQB1, ICAM1, HLA-DRB1, HLA-DMB, HLA-DQA2, HLA-DQA1, PSMB9, TAP2, TAP1,
HLA-DRB5, TRAF6, HLA-DOB, HLA-DRA
42613 1.96E-13 1.64E-10 8 13 192 17779 MHC class II protein complex HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2682 5.60E-12 3.50E-09 25 425 192 17779 regulation of immune system process ERBB2, C5, CD247, PTPN22, ICOS, TAP1, CD2, UBASH3A, TRAF6, CD28, ICAM1, PTPRC, IL2RA,
IL27, CTLA4, CDK6, IL6R, CD40, IL21, LAT, PRKCQ, CD19, CHRNB2, TRAFD1, IL2
50670 2.53E-11 1.19E-08 13 94 192 17779 regulation of lymphocyte proliferation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, PRKCQ, ICOS, CHRNB2, TRAF6, IL2, CD28
32944 2.91E-11 1.19E-08 13 95 192 17779 regulation of mononuclear cell proliferation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, PRKCQ, ICOS, CHRNB2, TRAF6, IL2, CD28
70663 3.33E-11 1.19E-08 13 96 192 17779 regulation of leukocyte proliferation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, PRKCQ, ICOS, CHRNB2, TRAF6, IL2, CD28
32395 1.20E-10 3.40E-08 6 9 192 17779 MHC class II receptor activity HLA-DQB1, HLA-DRB1, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
42129 1.33E-10 3.40E-08 11 67 192 17779 regulation of T cell proliferation PTPRC, PRKCQ, IL2RA, ICOS, IL27, ERBB2, CTLA4, TRAF6, IL21, CD28, IL2
2376 1.36E-10 3.40E-08 35 947 192 17779 immune system process HLA-DQB1, HLA-DRB1, CD247, C5, RAG1, RAG2, HLA-DMB, CCL21, TAP2, ICOS, TAP1, CD2,
HLA-DRB5, SH2B3, TRAF6, HLA-DOB, CD28, PTPRC, ICAM1, IL2RA, IL27, CTLA4, IL6R, CD40, IL21,
HLA-DQA2, HLA-DQA1, PSMB9, LAT, CD19, CCR6, NOTCH4, CHRNB2, IL2, HLA-DRA
51249 1.75E-10 3.97E-08 15 158 192 17779 regulation of lymphocyte activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, LAT, PRKCQ, ICOS, CD2, CHRNB2, TRAF6, IL2,
CD28
50863 7.10E-10 1.48E-07 13 122 192 17779 regulation of T cell activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, IL21, LAT, PRKCQ, ICOS, CD2, TRAF6, IL2, CD28
2694 1.09E-09 2.10E-07 15 180 192 17779 regulation of leukocyte activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, LAT, PRKCQ, ICOS, CD2, CHRNB2, TRAF6, IL2,
CD28
50865 2.67E-09 4.77E-07 15 192 192 17779 regulation of cell activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, LAT, PRKCQ, ICOS, CD2, CHRNB2, TRAF6, IL2,
CD28
42611 2.93E-09 4.89E-07 8 35 192 17779 MHC protein complex HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2683 4.39E-09 6.86E-07 11 92 192 17779 negative regulation of immune system process PTPRC, IL2RA, ICOS, ERBB2, TAP1, C5, CTLA4, PTPN22, UBASH3A, TRAFD1, IL2
6955 1.37E-08 2.01E-06 25 618 192 17779 immune response HLA-DQB1, HLA-DRB1, C5, RAG1, RAG2, HLA-DMB, CCL21, ICOS, HLA-DRB5, TRAF6, HLA-DOB,
CD28, ICAM1, PTPRC, IL2RA, IL27, CTLA4, IL6R, IL21, HLA-DQA2, HLA-DQA1, LAT, CCR6, IL2, HLA-DRA
50671 2.73E-08 3.80E-06 9 64 192 17779 positive regulation of lymphocyte proliferation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
32946 3.14E-08 4.14E-06 9 65 192 17779 positive regulation of mononuclear cell proliferation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
70665 3.61E-08 4.51E-06 9 66 192 17779 positive regulation of leukocyte proliferation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
50776 4.34E-08 5.17E-06 15 236 192 17779 regulation of immune response PTPRC, ICAM1, IL27, C5, CD247, CTLA4, CD40, IL21, CD19, ICOS, TAP1, TRAFD1, TRAF6, IL2, CD28
45619 1.68E-07 1.91E-05 8 57 192 17779 regulation of lymphocyte differentiation IL2RA, ICOS, IL27, ERBB2, CD2, CTLA4, IL21, IL2
2684 1.98E-07 2.15E-05 15 265 192 17779 positive regulation of immune system process PTPRC, ICAM1, IL2RA, C5, CD247, IL6R, CD40, IL21, PRKCQ, CD19, CD2, CHRNB2, TRAF6, IL2, CD28
42102 3.01E-07 3.14E-05 7 42 192 17779 positive regulation of T cell proliferation PTPRC, PRKCQ, IL2RA, TRAF6, IL21, CD28, IL2
5765 5.00E-07 4.97E-05 9 89 192 17779 lysosomal membrane HLA-DQB1, CLN3, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
2696 5.17E-07 4.97E-05 10 116 192 17779 positive regulation of leukocyte activation PTPRC, PRKCQ, IL2RA, CD2, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
50867 7.65E-07 7.08E-05 10 121 192 17779 positive regulation of cell activation PTPRC, PRKCQ, IL2RA, CD2, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
45580 8.99E-07 8.03E-05 7 49 192 17779 regulation of T cell differentiation IL2RA, ICOS, IL27, ERBB2, CD2, CTLA4, IL2
48583 1.09E-06 9.41E-05 20 525 192 17779 regulation of response to stimulus PTPRC, ICAM1, CLN3, IL2RA, IL27, CD247, C5, CTLA4, IL6R, CD40, IL21, TMPRSS6, CD19, ICOS,
TAP1, CHRNB2, TRAFD1, TRAF6, CD28, IL2
51251 2.04E-06 1.64E-04 9 105 192 17779 positive regulation of lymphocyte activation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
2697 2.04E-06 1.64E-04 9 105 192 17779 regulation of immune effector process PTPRC, ICAM1, IL27, TAP1, CD40, TRAF6, IL21, CD28, IL2
9897 2.55E-06 1.96E-04 10 138 192 17779 external side of plasma membrane ICAM1, IL2RB, IL2RA, CD19, ICOS, CD2, CTLA4, CHRNB2, CD40, CD28
8284 2.58E-06 1.96E-04 18 459 192 17779 positive regulation of cell proliferation PTPRC, IL2RA, ERBB2, IL27, CDK6, IL6R, CD40, IL21, GLI1, S1PR2, PRKCQ, HIPK1, FGFR1OP,
CHRNB2, TRAF6, RBPJ, IL2, CD28
45589 4.92E-06 3.58E-04 3 4 192 17779 regulation of regulatory T cell differentiation ICOS, CTLA4, IL2
5774 5.01E-06 3.58E-04 9 117 192 17779 vacuolar membrane HLA-DQB1, CLN3, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
44437 8.07E-06 5.61E-04 9 124 192 17779 vacuolar part HLA-DQB1, CLN3, HLA-DRB1, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1, HLA-DRA
48585 9.81E-06 6.63E-04 9 127 192 17779 negative regulation of response to stimulus PTPRC, CLN3, IL2RA, ICOS, TAP1, C5, CTLA4, TRAFD1, IL2
50777 1.34E-05 8.83E-04 5 29 192 17779 negative regulation of immune response PTPRC, ICOS, TAP1, CTLA4, TRAFD1
2706 1.57E-05 1.01E-03 6 50 192 17779 regulation of lymphocyte mediated immunity PTPRC, TAP1, CD40, TRAF6, IL21, IL2
42108 1.76E-05 1.10E-03 6 51 192 17779 positive regulation of cytokine biosynthetic process PRKCQ, REL, IL27, TRAF6, IL21, CD28
50870 1.95E-05 1.19E-03 7 77 192 17779 positive regulation of T cell activation PTPRC, PRKCQ, IL2RA, TRAF6, IL21, CD28, IL2
70664 2.22E-05 1.26E-03 5 32 192 17779 negative regulation of leukocyte proliferation IL2RA, ICOS, ERBB2, CTLA4, IL2
50672 2.22E-05 1.26E-03 5 32 192 17779 negative regulation of lymphocyte proliferation IL2RA, ICOS, ERBB2, CTLA4, IL2
32945 2.22E-05 1.26E-03 5 32 192 17779 negative regulation of mononuclear cell proliferation IL2RA, ICOS, ERBB2, CTLA4, IL2
51239 3.25E-05 1.81E-03 27 1068 192 17779 regulation of multicellular organismal process ERBB2, C5, KEAP1, PXK, GLI1, REL, ICOS, S1PR5, CD2, UBASH3A, TRAF6, CD28, IL2RA, IL27,
CTLA4, CDK6, IL6R, CD40, IL21, TMPRSS6, PTPN11, ATXN2, PRKCQ, NOTCH4, ATP2A1, CHRNB2,
IL2
2703 4.11E-05 2.23E-03 6 59 192 17779 regulation of leukocyte mediated immunity PTPRC, TAP1, CD40, TRAF6, IL21, IL2
45321 4.73E-05 2.52E-03 11 233 192 17779 leukocyte activation PTPRC, ICAM1, LAT, CD2, RAG1, CHRNB2, CD40, RAG2, TRAF6, CD28, IL2
50789 7.01E-05 3.61E-03 97 6552 192 17779 regulation of biological process PTPN22, PDHB, S1PR2, BATF, PPP1R1B, PDE4A, S1PR5, PHTF1, SPRED2, IL27, CD40, IL21, DDIT3,
BRAP, CCR6, FGFR1OP, ZGLP1, RAB14, TNFAIP3, ZNF438, C17ORF37, TRAF1, PAM, HLA-DRB1,
GSDMA, MAPKAPK5, ERBB2, BLK, KEAP1, PXK, CDC37, TAGAP, RPL6, ICOS, UBASH3A, TRAF6,
IKZF3, UBE2L3, PSMB8, PTPN11, PSMB9, LAT, NOTCH4, ATP2A1, CHRNB2, TRAFD1, RBPJ, GRB7,
ADAR, PNMT, FAM3D, GLI1, NFATC2IP, GSN, CCDC101, AGAP2, ARHGAP9, CLN3, ICAM1,
CEP110, CDK6, IL6R, FLNB, TYK2, PRKCQ, HIPK1, NCOA5, CUX2, C5, CD247, RAG1, PI4KAP2,
RAG2, HIC2, STAT4, FAM107A, REL, CCL21, TAP1, CD2, POU2F1, CD28, PTPRC, IL2RB, IL2RA,
CTLA4, AFF3, UBE2Q1, TMPRSS6, ATXN2, PHF19, CD19, NUPR1, IRF5, NEUROD2, KIAA1045, IL2
1817 7.07E-05 3.61E-03 10 202 192 17779 regulation of cytokine production PRKCQ, REL, IL27, C5, UBASH3A, CD40, IL6R, TRAF6, IL21, CD28
10008 7.37E-05 3.61E-03 10 203 192 17779 endosome membrane STARD3, HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, TRAF6, HLA-DOB, HLA-DQA2, HLA-DQA1,
HLA-DRA
44440 7.37E-05 3.61E-03 10 203 192 17779 endosomal part STARD3, HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DMB, TRAF6, HLA-DOB, HLA-DQA2, HLA-DQA1,
HLA-DRA
65007 8.81E-05 4.19E-03 101 6941 192 17779 biological regulation PTPN22, PDHB, BATF, S1PR2, PPP1R1B, PDE4A, S1PR5, PHTF1, SPRED2, IL27, CD40, IL21, DDIT3,
BRAP, CCR6, FGFR1OP, ZGLP1, RAB14, TNFAIP3, ZNF438, C17ORF37, TRAF1, PAM, HLA-DRB1,
GSDMA, MAPKAPK5, ERBB2, BLK, KEAP1, PXK, CDC37, TAGAP, RPL6, ICOS, UBASH3A, TRAF6,
IKZF3, UBE2L3, PSMB8, PTPN11, PSMB9, LAT, NOTCH4, ATP2A1, CHRNB2, TRAFD1, RBPJ, GRB7,
ADAR, PNMT, FAM3D, GLI1, OS9, NFATC2IP, GSN, CCDC101, AGAP2, ARHGAP9, CLN3, ICAM1,
CEP110, CDK6, IL6R, FLNB, TYK2, PRKCQ, HIPK1, NCOA5, CUX2, C5, CD247, RAG1, PI4KAP2,
RAG2, HIC2, STAT4, FAM107A, REL, CCL21, TAP2, TAP1, CD2, POU2F1, CD28, PTPRC, IL2RB,
IL2RA, SLC12A5, CTLA4, AFF3, UBE2Q1, TMPRSS6, TMPRSS3, ATXN2, PHF19, CD19, IRF5, NUPR1,
NEUROD2, KIAA1045, IL2
9986 8.88E-05 4.19E-03 13 341 192 17779 cell surface ICAM1, PTPRC, IL2RB, IL2RA, CTLA4, IL6R, CD40, CD19, ICOS, NOTCH4, CD2, CHRNB2, CD28
51023 9.93E-05 4.60E-03 3 9 192 17779 regulation of immunoglobulin secretion CD40, TRAF6, IL2
23052 1.01E-04 4.61E-03 55 3130 192 17779 signaling PTPN22, GLI1, S1PR2, PPP1R1B, GSN, PDE4A, SPRED2, AGAP2, ARHGAP9, CLN3, MAGI3, KIF5A,
IL6R, CD40, IL21, FLNB, DDIT3, BRAP, TYK2, PRKCQ, CCR6, HIPK1, RAB14, TRAF1, HLA-DRB1,
MAPKAPK5, ERBB2, BLK, CD247, C5, PI4KAP2, STAT4, TAGAP, CCL21, POU2F1, CD2, SH2B3,
TRAF6, CD28, PTPRC, IL2RB, IL2RA, PTPN2, DTX3, SLC12A5, TMPRSS6, PTPN11, LAT, CD19,
NOTCH4, CHRNB2, RBPJ, GRB7, GPR31, IL2
4911 1.16E-04 4.68E-03 2 2 192 17779 interleukin-2 receptor activity IL2RB, IL2RA
9439 1.16E-04 4.68E-03 2 2 192 17779 cyanate metabolic process TST, MPST
9440 1.16E-04 4.68E-03 2 2 192 17779 cyanate catabolic process TST, MPST
45590 1.16E-04 4.68E-03 2 2 192 17779 negative regulation of regulatory T cell differentiation ICOS, CTLA4
46013 1.16E-04 4.68E-03 2 2 192 17779 regulation of T cell homeostatic proliferation IL2RA, IL2
42825 1.16E-04 4.68E-03 2 2 192 17779 TAP complex TAP2, TAP1
46967 1.16E-04 4.68E-03 2 2 192 17779 cytosol to ER transport TAP2, TAP1
1910 1.18E-04 4.69E-03 4 24 192 17779 regulation of leukocyte mediated cytotoxicity PTPRC, ICAM1, TAP1, IL21
2768 1.20E-04 4.69E-03 5 45 192 17779 immune response-regulating cell surface receptor signaling pathway PTPRC, CD19, CD247, CD40, TRAF6
30890 1.40E-04 5.37E-03 4 25 192 17779 positive regulation of B cell proliferation PTPRC, CHRNB2, CD40, IL2
44459 1.73E-04 6.56E-03 39 1999 192 17779 plasma membrane part HLA-DQB1, HLA-DRB1, PNMT, ERBB2, CD247, C5, KEAP1, HLA-DMB, PDE4A, ICOS, CD2,
HLA-DRB5, HLA-DOB, CD28, PTPRC, ICAM1, CLN3, IL2RB, MAGI3, IL2RA, ICAM5, ICAM3, SLC12A5,
CTLA4, IL6R, CD40, HLA-DQA2, HLA-DQA1, KCTD6, PRKCQ, LAT, CD19, CCR6, NOTCH4, KCTD17,
RAB14, CHRNB2, GPR31, HLA-DRA
42035 1.84E-04 6.87E-03 6 77 192 17779 regulation of cytokine biosynthetic process PRKCQ, REL, IL27, TRAF6, IL21, CD28
42130 1.90E-04 6.92E-03 4 27 192 17779 negative regulation of T cell proliferation IL2RA, ICOS, ERBB2, CTLA4
1772 1.92E-04 6.92E-03 3 11 192 17779 immunological synapse PRKCQ, ICAM1, LAT
46649 1.94E-04 6.92E-03 9 186 192 17779 lymphocyte activation PTPRC, ICAM1, CD2, RAG1, CHRNB2, CD40, RAG2, CD28, IL2
2822 1.99E-04 7.02E-03 5 50 192 17779 regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains PTPRC, IL27, CD40, TRAF6, IL2
1775 2.06E-04 7.14E-03 11 275 192 17779 cell activation PTPRC, ICAM1, LAT, CD2, RAG1, CHRNB2, CD40, RAG2, TRAF6, CD28, IL2
2819 2.19E-04 7.45E-03 5 51 192 17779 regulation of adaptive immune response PTPRC, IL27, CD40, TRAF6, IL2
31341 2.20E-04 7.45E-03 4 28 192 17779 regulation of cell killing PTPRC, ICAM1, TAP1, IL21
51250 2.40E-04 7.93E-03 5 52 192 17779 negative regulation of lymphocyte activation IL2RA, ICOS, ERBB2, CTLA4, IL2
323 2.44E-04 7.93E-03 10 235 192 17779 lytic vacuole HLA-DQB1, CLN3, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1,
HLA-DRA
5764 2.44E-04 7.93E-03 10 235 192 17779 lysosome HLA-DQB1, CLN3, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1,
HLA-DRA
42100 2.54E-04 8.14E-03 3 12 192 17779 B cell proliferation PTPRC, CD40, RAG2
5768 2.65E-04 8.38E-03 13 381 192 17779 endosome HLA-DQB1, CLN3, HLA-DRB1, HLA-DMB, HLA-DQA2, HLA-DQA1, STARD3, RABEP2, RAB14,
HLA-DRB5, TRAF6, HLA-DOB, HLA-DRA
42110 3.11E-04 9.71E-03 7 119 192 17779 T cell activation PTPRC, ICAM1, CD2, RAG1, RAG2, CD28, IL2
42127 3.44E-04 9.82E-03 21 848 192 17779 regulation of cell proliferation PTPRC, IL2RA, PNMT, ERBB2, IL27, CTLA4, CDK6, IL6R, CD40, IL21, GLI1, S1PR2, PRKCQ, HIPK1,
FGFR1OP, ICOS, CHRNB2, RBPJ, TRAF6, CD28, IL2
47894 3.46E-04 9.82E-03 2 3 192 17779 flavonol 3-sulfotransferase activity SULT1A1, SULT1A2
4792 3.46E-04 9.82E-03 2 3 192 17779 thiosulfate sulfurtransferase activity TST, MPST
5134 3.46E-04 9.82E-03 2 3 192 17779 interleukin-2 receptor binding IL21, IL2
2331 3.46E-04 9.82E-03 2 3 192 17779 pre-B cell allelic exclusion RAG1, RAG2
19976 3.46E-04 9.82E-03 2 3 192 17779 interleukin-2 binding IL2RB, IL2RA
46977 3.46E-04 9.82E-03 2 3 192 17779 TAP binding TAP2, TAP1
46978 3.46E-04 9.82E-03 2 3 192 17779 TAP1 binding TAP2, TAP1
2695 3.70E-04 1.04E-02 5 57 192 17779 negative regulation of leukocyte activation IL2RA, ICOS, ERBB2, CTLA4, IL2
30888 3.74E-04 1.04E-02 4 32 192 17779 regulation of B cell proliferation PTPRC, CHRNB2, CD40, IL2
50854 4.13E-04 1.14E-02 3 14 192 17779 regulation of antigen receptor-mediated signaling pathway PTPRC, PTPN22, UBASH3A
8150 4.23E-04 1.15E-02 172 14296 192 17779 biological_process XPO1, GGT2, PTPN22, SLC26A10, DNASE1L3, PDHB, STARD3, BATF, S1PR2, EIF3CL, OLFML3,
PPP1R1B, PDE4A, SULT1A1, S1PR5, SPRED2, PHTF1, SULT1A2, LONRF2, MAGI3, BCL2L15, KIF5A,
IL27, ERP29, CD40, IL21, HLA-DQA2, BRAP, HLA-DQA1, DDIT3, DCTN2, CCR6, DCLRE1B, FGFR1OP,
ZGLP1, RAB14, FDX1L, TNFAIP3, ZNF438, HLA-DRA, MPST, TRAF1, C17ORF37, PAM, GSDMA,
HLA-DRB1, PFKFB3, BLK, ERBB2, MAPKAPK5, PPIL2, KEAP1, PXK, CDC37, TAGAP, SBK1, RPL6,
ICOS, HLA-DRB5, UBASH3A, TRAF6, HLA-DOB, IKZF3, PTPN2, UBE2L3, PSMB8, KCTD6, PSMB9,
PTPN11, TST, TMEM116, LAT, RABEP2, ATP2A1, NOTCH4, ALDH2, KCTD17, CHRNB2, TRAFD1,
AHSA2, RBPJ, SPNS1, GRB7, ACAD10, PIP4K2C, GPR31, ADAR, ACOX2, FAM3D, PNMT, PPIP5K2,
HLA-DMB, OS9, GLI1, NFATC2IP, GSN, CCDC101, AGAP2, ARHGAP9, PGAP3, ICAM1, CLN3,
MRPL4, ICAM4, ICAM5, CEP110, ICAM3, CDK6, IL6R, FLNB, TYK2, PRKCQ, HIPK1, ZPBP2, INHBE,
NCOA5, CUX2, ORMDL3, TUFM, HLA-DQB1, TCAP, CD247, ADAD1, C5, PUS10, PI4KAP2, RAG1,
C12ORF51, RAG2, RPP14, PLCL2, GIN1, HIC2, STAT4, FAM107A, ATXN2L, REL, CCL21, TAP2,
RNASET2, TAP1, BCRP2, POU2F1, CD2, SH2B3, PEX13, SH2B1, USP34, TNPO3, CD28, MARS,
B4GALNT1, PTPRC, IL2RB, IL2RA, DTX3, SLC12A5, CTLA4, AFF3, TMPRSS6, UBE2Q1, TMPRSS3,
ATXN2, CD19, PHF19, IRF5, NUPR1, NEUROD2, AP4B1, KIAA1045, RBM17, IL2
48519 4.45E-04 1.19E-02 38 2021 192 17779 negative regulation of biological process FAM3D, PNMT, ERBB2, C5, RAG1, PTPN22, PXK, HIC2, GSN, ICOS, TAP1, POU2F1, UBASH3A,
TRAF6, PTPRC, CLN3, IL2RB, IL2RA, CTLA4, CDK6, IL6R, TMPRSS6, PSMB8, BRAP, DDIT3, PSMB9,
PTPN11, ATXN2, ATP2A1, ZGLP1, NOTCH4, CUX2, TRAFD1, RBPJ, TNFAIP3, KIAA1045, ADAR, IL2
42175 4.45E-04 1.19E-02 17 624 192 17779 nuclear membrane-endoplasmic reticulum network PGAP3, HLA-DQB1, CLN3, XPO1, HLA-DRB1, PNMT, HLA-DQA2, HLA-DQA1, TMPRSS3, OS9, TAP2,
TAP1, ATP2A1, RAB14, HLA-DRB5, ORMDL3, HLA-DRA
2764 0.00047 0.012297 5 60 192 17779 immune response-regulating signaling pathway PTPRC, CD19, CD247, CD40, TRAF6
23033 0.00047 0.012297 39 2100 192 17779 signaling pathway BLK, ERBB2, CD247, C5, PI4KAP2, GLI1, S1PR2, STAT4, GSN, SPRED2, CD2, SH2B3, TRAF6, AGAP2,
CD28, PTPRC, CLN3, IL2RB, MAGI3, IL2RA, PTPN2, DTX3, IL6R, CD40, TMPRSS6, DDIT3, BRAP,
PTPN11, TYK2, PRKCQ, LAT, CD19, HIPK1, NOTCH4, RAB14, RBPJ, GRB7, GPR31, IL2
50851 0.00053 0.013695 4 35 192 17779 antigen receptor-mediated signaling pathway PTPRC, CD19, CD247, TRAF6
30217 0.00055 0.013983 5 62 192 17779 T cell differentiation PTPRC, RAG1, RAG2, CD28, IL2
32880 0.00057 0.014411 8 172 192 17779 regulation of protein localization PRKCQ, PAM, C5, RAB14, CD40, TRAF6, PTPN11, IL2
50866 0.00059 0.014756 5 63 192 17779 negative regulation of cell activation IL2RA, ICOS, ERBB2, CTLA4, IL2
50871 0.00066 0.015603 4 37 192 17779 positive regulation of B cell activation PTPRC, CHRNB2, CD40, IL2
50708 0.00068 0.015603 5 65 192 17779 regulation of protein secretion PRKCQ, C5, CD40, TRAF6, IL2
30581 0.00069 0.015603 2 4 192 17779 symbiont intracellular protein transport in host TAP2, TAP1
51708 0.00069 0.015603 2 4 192 17779 intracellular protein transport in other organism involved in symbiotic interaction TAP2, TAP1
1911 0.00069 0.015603 2 4 192 17779 negative regulation of leukocyte mediated cytotoxicity PTPRC, TAP1
2327 0.00069 0.015603 2 4 192 17779 immature B cell differentiation RAG1, RAG2
2329 0.00069 0.015603 2 4 192 17779 pre-B cell differentiation RAG1, RAG2
42824 0.00069 0.015603 2 4 192 17779 MHC class I peptide loading complex TAP2, TAP1
46719 0.00069 0.015603 2 4 192 17779 regulation of viral protein levels in host cell TAP2, TAP1
19060 0.00069 0.015603 2 4 192 17779 intracellular transport of viral proteins in host cell TAP2, TAP1
50794 0.00072 0.016254 89 6223 192 17779 regulation of cellular process FAM3D, PNMT, PTPN22, PDHB, GLI1, NFATC2IP, BATF, S1PR2, GSN, PPP1R1B, PDE4A, CCDC101,
S1PR5, SPRED2, PHTF1, AGAP2, ARHGAP9, ICAM1, CLN3, CEP110, IL27, CDK6, IL6R, CD40, IL21,
FLNB, DDIT3, BRAP, TYK2, PRKCQ, CCR6, HIPK1, FGFR1OP, NCOA5, ZGLP1, RAB14, CUX2,
TNFAIP3, ZNF438, TRAF1, C17ORF37, PAM, HLA-DRB1, GSDMA, BLK, ERBB2, MAPKAPK5, C5,
PI4KAP2, RAG1, KEAP1, PXK, CDC37, HIC2, TAGAP, STAT4, FAM107A, REL, CCL21, RPL6, ICOS,
TAP1, POU2F1, CD2, UBASH3A, TRAF6, CD28, PTPRC, IL2RB, IKZF3, IL2RA, CTLA4, AFF3, UBE2L3,
PSMB8, PSMB9, PTPN11, ATXN2, LAT, PHF19, IRF5, NUPR1, NOTCH4, NEUROD2, CHRNB2, RBPJ,
GRB7, KIAA1045, IL2
71212 0.00073 0.016254 18 712 192 17779 subsynaptic reticulum PGAP3, HLA-DQB1, CLN3, HLA-DRB1, PNMT, ERP29, HLA-DQA2, HLA-DQA1, TMPRSS3, OS9,
TAP2, SDF2L1, TAP1, ATP2A1, RAB14, HLA-DRB5, ORMDL3, HLA-DRA
5515 0.00075 0.016648 110 8118 192 17779 protein binding XPO1, PTPN22, SHE, BATF, EIF3CL, PDE4A, RAVER1, SPRED2, LONRF2, MAGI3, KIF5A, IL27,
NAA25, CD40, IL21, DDIT3, BRAP, DCTN2, CCR6, FGFR1OP, RAB14, TNFAIP3, TRAF1, PAM,
PFKFB3, MAPKAPK5, ERBB2, BLK, KEAP1, PXK, CDC37, SBK1, ICOS, TRAF6, IKZF3, PTPN2, UBE2L3,
PSMB8, PTPN11, KCTD6, LAT, RABEP2, NOTCH4, ATP2A1, KCTD17, ALDH2, CHRNB2, TRAFD1,
SPNS1, RBPJ, AHSA2, GRB7, PIP4K2C, FAM3D, GLI1, OS9, GSN, AGAP2, ARHGAP9, CLN3, ICAM1,
ICAM4, ICAM5, CEP110, ICAM3, CDK6, IL6R, FLNB, TYK2, PRKCQ, HIPK1, INHBE, ORMDL3, TUFM,
TCAP, CD247, C5, RAG1, RAG2, RPP14, HIC2, STAT4, FAM107A, REL, CCL21, TAP2, TAP1, POU2F1,
CD2, SH2B3, PEX13, SH2B1, TNPO3, MARS, CD28, PTPRC, IL2RB, IL2RA, DTX3, SLC12A5, CTLA4,
UBE2Q1, RSBN1, ATXN2, PHF19, CD19, AP4B1, KIAA1045, IL2, RBM17
5773 0.00077 0.01686 10 272 192 17779 vacuole HLA-DQB1, CLN3, HLA-DRB1, RAB14, HLA-DRB5, HLA-DMB, HLA-DOB, HLA-DQA2, HLA-DQA1,
HLA-DRA
48471 0.00084 0.018211 11 325 192 17779 perinuclear region of cytoplasm ATXN2, GSDMA, GSN, KIF5A, FGFR1OP, ICOS, PDE4A, ERBB2, ATP2A1, RAB14, CTLA4
2700 0.00089 0.019165 4 40 192 17779 regulation of production of molecular mediator of immune response CD40, TRAF6, IL21, IL2
2429 0.00098 0.020879 4 41 192 17779 immune response-activating cell surface receptor signaling pathway PTPRC, CD19, CD247, TRAF6
45595 0.00103 0.021793 15 554 192 17779 regulation of cell differentiation IL2RA, ERBB2, IL27, CTLA4, KEAP1, CDK6, IL6R, IL21, ICOS, S1PR5, NOTCH4, CD2, CHRNB2,
TRAF6, IL2
31348 0.00107 0.0225 4 42 192 17779 negative regulation of defense response IL2RA, TAP1, TRAFD1, IL2
44432 0.0011 0.022905 17 677 192 17779 endoplasmic reticulum part PGAP3, HLA-DQB1, CLN3, HLA-DRB1, PNMT, ERP29, HLA-DQA2, HLA-DQA1, TMPRSS3, OS9,
TAP2, SDF2L1, TAP1, ATP2A1, HLA-DRB5, ORMDL3, HLA-DRA
16783 0.00114 0.022905 2 5 192 17779 sulfurtransferase activity TST, MPST
48304 0.00114 0.022905 2 5 192 17779 positive regulation of isotype switching to IgG isotypes CD40, IL2
31342 0.00114 0.022905 2 5 192 17779 negative regulation of cell killing PTPRC, TAP1
4062 0.00114 0.022905 2 5 192 17779 aryl sulfotransferase activity SULT1A1, SULT1A2
5737 0.00115 0.022977 104 7647 192 17779 cytoplasm XPO1, PTPN22, PDHB, STARD3, EIF3CL, PPP1R1B, PDE4A, SULT1A1, RAVER1, PHTF1, SPRED2,
SULT1A2, BCL2L15, KIF5A, IL27, ERP29, NAA25, HLA-DQA2, DDIT3, BRAP, HLA-DQA1, DCTN2,
FGFR1OP, RAB14, FDX1L, TNFAIP3, HLA-DRA, MPST, C17ORF37, TRAF1, PAM, HLA-DRB1,
GSDMA, PFKFB3, GSDMB, MAPKAPK5, ERBB2, PPIL2, KEAP1, PXK, CDC37, SBK1, RPL6, ICOS,
HLA-DRB5, UBASH3A, TRAF6, HLA-DOB, PTPN2, UBE2L3, PSMB8, PTPN11, PSMB9, TST, LAT, RABEP2,
ATP2A1, ALDH2, SPNS1, AHSA2, PIP4K2C, ADAR, ACOX2, PNMT, PPIP5K2, HLA-DMB, GLI1, OS9,
NFATC2IP, GSN, AGAP2, PGAP3, CLN3, MRPL4, CEP110, CDK6, FLNB, TYK2, PRKCQ, HIPK1,
ORMDL3, HLA-DQB1, TUFM, TCAP, CD247, PLCL2, STAT4, TAP2, TAP1, BCRP2, PEX13, SH2B1,
TNPO3, MARS, B4GALNT1, CD28, DTX3, CTLA4, PRR5L, TMPRSS3, ATXN2, SDF2L1, AP4B1,
KIAA1045
50868 0.00117 0.023233 4 43 192 17779 negative regulation of T cell activation IL2RA, ICOS, ERBB2, CTLA4
42113 0.00123 0.024202 5 74 192 17779 B cell activation PTPRC, RAG1, CHRNB2, CD40, RAG2
7166 0.00137 0.026679 26 1280 192 17779 cell surface receptor linked signaling pathway ERBB2, C5, CD247, GLI1, S1PR2, STAT4, CD2, TRAF6, CD28, CLN3, PTPRC, IL2RB, IL2RA, PTPN2,
DTX3, CD40, IL6R, PTPN11, LAT, CD19, HIPK1, NOTCH4, RBPJ, GRB7, GPR31, IL2
32655 0.00143 0.027701 3 21 192 17779 regulation of interleukin-12 production REL, CD40, TRAF6
44267 0.00147 0.028289 38 2153 192 17779 cellular protein metabolic process TUFM, PAM, BLK, MAPKAPK5, ERBB2, C5, PPIL2, RAG1, PTPN22, C12ORF51, OS9, S1PR2, EIF3CL,
STAT4, SBK1, RPL6, BCRP2, USP34, TRAF6, MARS, PGAP3, PTPRC, CLN3, MRPL4, PTPN2, ERP29,
CDK6, UBE2L3, UBE2Q1, PSMB8, PSMB9, PTPN11, TYK2, PRKCQ, HIPK1, NCOA5, TNFAIP3,
KIAA1045
16788 0.00155 0.029603 17 699 192 17779 hydrolase activity, acting on ester bonds PGAP3, PTPRC, PTPN2, PFKFB3, ABHD6, PPIP5K2, RAG1, PTPN22, RAG2, DNASE1L3, PTPN11,
RPP14, PLCL2, RNASET2, PDE4A, USP34, TNFAIP3
30004 0.00164 0.031098 3 22 192 17779 cellular monovalent inorganic cation homeostasis CLN3, SLC12A5, TMPRSS3
2483 0.00169 0.031567 2 6 192 17779 antigen processing and presentation of endogenous peptide antigen TAP2, TAP1
19885 0.00169 0.031567 2 6 192 17779 antigen processing and presentation of endogenous peptide antigen via MHC class I TAP2, TAP1
31347 0.00185 0.034288 7 161 192 17779 regulation of defense response IL2RA, IL27, TAP1, TRAFD1, IL21, CD28, IL2
2637 0.00212 0.038761 3 24 192 17779 regulation of immunoglobulin production CD40, TRAF6, IL2
50852 0.00212 0.038761 3 24 192 17779 T cell receptor signaling pathway PTPRC, CD247, TRAF6
50864 0.00222 0.04017 4 51 192 17779 regulation of B cell activation PTPRC, CHRNB2, CD40, IL2
48518 0.00227 0.040833 38 2206 192 17779 positive regulation of biological process TRAF1, PNMT, GSDMA, ERBB2, CD247, C5, RAG2, GLI1, S1PR2, REL, TAP1, POU2F1, CD2, TRAF6,
CD28, PTPRC, ICAM1, IL2RB, IL2RA, IL27, CDK6, IL6R, CD40, IL21, PSMB8, DDIT3, PSMB9,
PTPN11, PRKCQ, CD19, HIPK1, NUPR1, FGFR1OP, NOTCH4, ATP2A1, CHRNB2, RBPJ, IL2
48302 0.00235 0.040833 2 7 192 17779 regulation of isotype switching to IgG isotypes CD40, IL2
45824 0.00235 0.040833 2 7 192 17779 negative regulation of innate immune response TAP1, TRAFD1
45084 0.00235 0.040833 2 7 192 17779 positive regulation of interleukin-12 biosynthetic process REL, TRAF6
50858 0.00235 0.040833 2 7 192 17779 negative regulation of antigen receptor-mediated signaling pathway PTPN22, UBASH3A
50860 0.00235 0.040833 2 7 192 17779 negative regulation of T cell receptor signaling pathway PTPN22, UBASH3A
50793 0.00238 0.041005 18 792 192 17779 regulation of developmental process IL2RA, ERBB2, IL27, C5, CTLA4, KEAP1, CDK6, IL6R, CD40, IL21, GLI1, ICOS, NOTCH4, S1PR5, CD2,
CHRNB2, TRAF6, IL2
48002 0.00239 0.041005 3 25 192 17779 antigen processing and presentation of peptide antigen TAP2, TAP1, TRAF6
80134 0.00251 0.042767 10 319 192 17779 regulation of response to stress CLN3, IL2RA, IL27, TAP1, TRAFD1, TRAF6, IL21, TMPRSS6, CD28, IL2
2521 0.00258 0.043359 6 127 192 17779 leukocyte differentiation PTPRC, RAG1, RAG2, TRAF6, CD28, IL2
5789 0.00258 0.043359 15 609 192 17779 endoplasmic reticulum membrane PGAP3, HLA-DQB1, CLN3, HLA-DRB1, PNMT, HLA-DQA2, HLA-DQA1, TMPRSS3, OS9, TAP2, TAP1,
ATP2A1, HLA-DRB5, ORMDL3, HLA-DRA
4842 0.00277 0.046111 7 173 192 17779 ubiquitin-protein ligase activity PPIL2, RAG1, TRAF6, TNFAIP3, UBE2L3, UBE2Q1, BRAP
2252 0.00278 0.046111 6 129 192 17779 immune effector process PTPRC, ICAM1, LAT, C5, RAG2, IL6R
151 0.00289 0.047602 6 130 192 17779 ubiquitin ligase complex PAM, BCRP2, PPIL2, UBE2L3, BRAP, OS9
2757 0.00293 0.047821 4 55 192 17779 immune response-activating signal transduction PTPRC, CD19, CD247, TRAF6
4726 0.00311 0.048837 2 8 192 17779 non-membrane spanning protein tyrosine phosphatase activity PTPN2, PTPN11
45830 0.00311 0.048837 2 8 192 17779 positive regulation of isotype switching CD40, IL2
19883 0.00311 0.048837 2 8 192 17779 antigen processing and presentation of endogenous antigen TAP2, TAP1
2704 0.00311 0.048837 2 8 192 17779 negative regulation of leukocyte mediated immunity PTPRC, TAP1
2707 0.00311 0.048837 2 8 192 17779 negative regulation of lymphocyte mediated immunity PTPRC, TAP1
45075 0.00311 0.048837 2 8 192 17779 regulation of interleukin-12 biosynthetic process REL, TRAF6
32102 0.00312 0.048837 4 56 192 17779 negative regulation of response to external stimulus CLN3, IL2RA, C5, IL2
42802 0.00317 0.04919 16 685 192 17779 identical protein binding PAM, PFKFB3, ERBB2, CD247, RAG1, IL6R, BRAP, RPP14, FGFR1OP, TAP1, ATP2A1, CD2, KCTD17,
ALDH2, PIP4K2C, CD28
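
As a rough check on how the p-value column in these tables relates to the four counts x, n, X and N, the following sketch computes an upper-tail hypergeometric probability. It assumes that the uncorrected p-values come from a one-sided hypergeometric test for over-representation; the function name hypergeom_upper_tail is illustrative only and is not part of BiNGO. The corrected p-values are not reproduced here, because the correction depends on the complete set of terms tested.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(x, n, X, N):
    """Probability of drawing at least x annotated genes.

    Assumed meaning of the arguments (taken from the table columns):
      x: genes in the test list annotated to the GO term
      n: genes in the reference set annotated to the GO term
      X: total genes in the test list
      N: total genes in the reference set
    """
    total = comb(N, X)  # number of ways to draw the test list
    upper = min(n, X)   # largest possible overlap
    favourable = sum(
        comb(n, k) * comb(N - n, X - k) for k in range(x, upper + 1)
    )
    # Exact integer arithmetic first, converted to float only at the end.
    return float(Fraction(favourable, total))


if __name__ == "__main__":
    # Row "interleukin-2 receptor activity": x=2, n=2, X=192, N=17779
    print(f"{hypergeom_upper_tail(2, 2, 192, 17779):.2E}")
```

For the row "interleukin-2 receptor activity" (x = 2, n = 2, X = 192, N = 17779) this evaluates to approximately 1.16E-04, in line with the tabulated value.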

Supplementary Table 6 Significant BiNGO results for the expanded gene list after exclusion of MHC region genes. Each row shows the significant GO ID, the p-value, the p-value after correction for multiple testing, the number of genes annotated to the term in the gene list (x) and among all human genes (n), the total number of genes in the gene list (X) and among all human genes (N), the GO term description and the genes identified.
GO-ID p-value corr p-value x n X N Description Genes in test set
2682 6.20E-12 7.22E-09 24 425 177 17782 regulation of immune system process PTPRC, ICAM1, IL2RA, ERBB2, IL27, CD247, C5, CTLA4, PTPN22, CDK6, CD40, IL6R, IL21, PRKCQ,
LAT, CD19, ICOS, CD2, CHRNB2, UBASH3A, TRAFD1, TRAF6, CD28, IL2
50670 9.03E-12 7.22E-09 13 94 177 17782 regulation of lymphocyte proliferation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, PRKCQ, ICOS, CHRNB2, TRAF6, IL2, CD28
32944 1.04E-11 7.22E-09 13 95 177 17782 regulation of mononuclear cell proliferation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, PRKCQ, ICOS, CHRNB2, TRAF6, IL2, CD28
70663 1.19E-11 7.22E-09 13 96 177 17782 regulation of leukocyte proliferation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, PRKCQ, ICOS, CHRNB2, TRAF6, IL2, CD28
51249 5.48E-11 2.23E-08 15 158 177 17782 regulation of lymphocyte activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, LAT, PRKCQ, ICOS, CD2, CHRNB2, TRAF6, IL2,
CD28
42129 5.51E-11 2.23E-08 11 67 177 17782 regulation of T cell proliferation PTPRC, PRKCQ, IL2RA, ICOS, IL27, ERBB2, CTLA4, TRAF6, IL21, CD28, IL2
50863 2.59E-10 8.97E-08 13 122 177 17782 regulation of T cell activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, IL21, LAT, PRKCQ, ICOS, CD2, TRAF6, IL2, CD28
2694 3.50E-10 1.06E-07 15 180 177 17782 regulation of leukocyte activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, LAT, PRKCQ, ICOS, CD2, CHRNB2, TRAF6, IL2,
CD28
50865 8.63E-10 2.33E-07 15 192 177 17782 regulation of cell activation PTPRC, IL2RA, ERBB2, IL27, CTLA4, CD40, IL21, LAT, PRKCQ, ICOS, CD2, CHRNB2, TRAF6, IL2,
CD28
50671 1.35E-08 3.27E-06 9 64 177 17782 positive regulation of lymphocyte proliferation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
32946 1.55E-08 3.42E-06 9 65 177 17782 positive regulation of mononuclear cell proliferation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
70665 1.78E-08 3.60E-06 9 66 177 17782 positive regulation of leukocyte proliferation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
2683 2.64E-08 4.93E-06 10 92 177 17782 negative regulation of immune system process PTPRC, IL2RA, ICOS, ERBB2, C5, CTLA4, PTPN22, UBASH3A, TRAFD1, IL2
2684 6.77E-08 1.17E-05 15 265 177 17782 positive regulation of immune system process PTPRC, ICAM1, IL2RA, C5, CD247, IL6R, CD40, IL21, PRKCQ, CD19, CD2, CHRNB2, TRAF6, IL2,
CD28
45619 8.97E-08 1.45E-05 8 57 177 17782 regulation of lymphocyte differentiation IL2RA, ICOS, IL27, ERBB2, CD2, CTLA4, IL21, IL2
50776 1.06E-07 1.60E-05 14 236 177 17782 regulation of immune response PTPRC, ICAM1, IL27, C5, CD247, CTLA4, CD40, IL21, CD19, ICOS, TRAFD1, TRAF6, IL2, CD28
42102 1.73E-07 2.47E-05 7 42 177 17782 positive regulation of T cell proliferation PTPRC, PRKCQ, IL2RA, TRAF6, IL21, CD28, IL2
2696 2.43E-07 3.28E-05 10 116 177 17782 positive regulation of leukocyte activation PTPRC, PRKCQ, IL2RA, CD2, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
50867 3.61E-07 4.61E-05 10 121 177 17782 positive regulation of cell activation PTPRC, PRKCQ, IL2RA, CD2, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
45580 5.19E-07 6.30E-05 7 49 177 17782 regulation of T cell differentiation IL2RA, ICOS, IL27, ERBB2, CD2, CTLA4, IL2
8284 7.91E-07 9.15E-05 18 459 177 17782 positive regulation of cell proliferation PTPRC, IL2RA, ERBB2, IL27, CDK6, IL6R, CD40, IL21, GLI1, S1PR2, PRKCQ, HIPK1, FGFR1OP,
CHRNB2, TRAF6, RBPJ, IL2, CD28
51251 1.04E-06 1.14E-04 9 105 177 17782 positive regulation of lymphocyte activation PTPRC, PRKCQ, IL2RA, CHRNB2, CD40, TRAF6, IL21, CD28, IL2
9897 1.22E-06 1.29E-04 10 138 177 17782 external side of plasma membrane ICAM1, IL2RB, IL2RA, CD19, ICOS, CD2, CTLA4, CHRNB2, CD40, CD28
48583 1.28E-06 1.29E-04 19 525 177 17782 regulation of response to stimulus PTPRC, ICAM1, CLN3, IL2RA, IL27, CD247, C5, CTLA4, IL6R, CD40, IL21, TMPRSS6, CD19, ICOS,
CHRNB2, TRAFD1, TRAF6, CD28, IL2
45589 3.85E-06 3.74E-04 3 4 177 17782 regulation of regulatory T cell differentiation ICOS, CTLA4, IL2
2697 1.01E-05 9.42E-04 8 105 177 17782 regulation of immune effector process PTPRC, ICAM1, IL27, CD40, TRAF6, IL21, CD28, IL2
42108 1.11E-05 9.97E-04 6 51 177 17782 positive regulation of cytokine biosynthetic process PRKCQ, REL, IL27, TRAF6, IL21, CD28
50870 1.15E-05 9.97E-04 7 77 177 17782 positive regulation of T cell activation PTPRC, PRKCQ, IL2RA, TRAF6, IL21, CD28, IL2
70664 1.50E-05 1.17E-03 5 32 177 17782 negative regulation of leukocyte proliferation IL2RA, ICOS, ERBB2, CTLA4, IL2
50672 1.50E-05 1.17E-03 5 32 177 17782 negative regulation of lymphocyte proliferation IL2RA, ICOS, ERBB2, CTLA4, IL2
32945 1.50E-05 1.17E-03 5 32 177 17782 negative regulation of mononuclear cell proliferation IL2RA, ICOS, ERBB2, CTLA4, IL2
51239 2.09E-05 1.59E-03 26 1068 177 17782 regulation of multicellular organismal process ERBB2, C5, KEAP1, PXK, GLI1, REL, ICOS, S1PR5, CD2, UBASH3A, TRAF6, CD28, IL2RA, IL27,
CTLA4, CDK6, IL6R, CD40, IL21, TMPRSS6, PTPN11, ATXN2, PRKCQ, ATP2A1, CHRNB2, IL2
45321 2.24E-05 1.64E-03 11 233 177 17782 leukocyte activation PTPRC, ICAM1, LAT, CD2, RAG1, CHRNB2, CD40, RAG2, TRAF6, CD28, IL2
50789 2.67E-05 1.90E-03 92 6553 177 17782 regulation of biological process FAM3D, PNMT, PTPN22, PDHB, GLI1, NFATC2IP, BATF, S1PR2, GSN, PPP1R1B, PDE4A, CCDC101,
S1PR5, SPRED2, PHTF1, AGAP2, ARHGAP9, ICAM1, CLN3, CEP110, IL27, CDK6, IL6R, CD40, IL21,
FLNB, DDIT3, BRAP, TYK2, PRKCQ, CCR6, HIPK1, FGFR1OP, NCOA5, ZGLP1, RAB14, CUX2,
TNFAIP3, ZNF438, TRAF1, C17ORF37, PAM, GSDMA, BLK, ERBB2, MAPKAPK5, CD247, C5,
PI4KAP2, RAG1, KEAP1, PXK, RAG2, CDC37, HIC2, TAGAP, STAT4, FAM107A, REL, CCL21, RPL6,
ICOS, POU2F1, CD2, UBASH3A, TRAF6, CD28, PTPRC, IL2RB, IKZF3, IL2RA, CTLA4, AFF3, UBE2L3,
TMPRSS6, UBE2Q1, PTPN11, ATXN2, LAT, CD19, PHF19, IRF5, NUPR1, ATP2A1, NEUROD2,
CHRNB2, TRAFD1, RBPJ, GRB7, KIAA1045, ADAR, IL2
1817 3.55E-05 2.46E-03 10 202 177 17782 regulation of cytokine production PRKCQ, REL, IL27, C5, UBASH3A, CD40, IL6R, TRAF6, IL21, CD28
23052 3.69E-05 2.49E-03 53 3130 177 17782 signaling PTPN22, GLI1, S1PR2, PPP1R1B, GSN, PDE4A, SPRED2, AGAP2, ARHGAP9, CLN3, MAGI3, KIF5A,
IL6R, CD40, IL21, FLNB, DDIT3, BRAP, TYK2, PRKCQ, CCR6, HIPK1, RAB14, TRAF1, MAPKAPK5,
ERBB2, BLK, C5, CD247, PI4KAP2, STAT4, TAGAP, CCL21, CD2, POU2F1, SH2B3, TRAF6, CD28,
PTPRC, IL2RB, IL2RA, PTPN2, DTX3, SLC12A5, TMPRSS6, PTPN11, LAT, CD19, CHRNB2, RBPJ,
GRB7, GPR31, IL2
48585 4.03E-05 2.64E-03 8 127 177 17782 negative regulation of response to stimulus PTPRC, CLN3, IL2RA, ICOS, C5, CTLA4, TRAFD1, IL2
65007 5.20E-05 3.32E-03 95 6942 177 17782 biological regulation FAM3D, PNMT, PTPN22, PDHB, OS9, GLI1, NFATC2IP, BATF, S1PR2, GSN, PPP1R1B, PDE4A,
CCDC101, S1PR5, SPRED2, PHTF1, AGAP2, ARHGAP9, ICAM1, CLN3, CEP110, IL27, CDK6, CD40,
IL6R, IL21, FLNB, BRAP, DDIT3, TYK2, PRKCQ, CCR6, HIPK1, FGFR1OP, NCOA5, ZGLP1, RAB14,
CUX2, TNFAIP3, ZNF438, TRAF1, C17ORF37, PAM, GSDMA, BLK, ERBB2, MAPKAPK5, CD247, C5,
PI4KAP2, RAG1, KEAP1, PXK, RAG2, CDC37, HIC2, TAGAP, STAT4, FAM107A, REL, CCL21, RPL6,
ICOS, POU2F1, CD2, UBASH3A, TRAF6, CD28, PTPRC, IL2RB, IKZF3, IL2RA, SLC12A5, CTLA4, AFF3,
UBE2L3, TMPRSS6, UBE2Q1, TMPRSS3, PTPN11, ATXN2, LAT, CD19, PHF19, IRF5, NUPR1,
ATP2A1, NEUROD2, CHRNB2, TRAFD1, RBPJ, GRB7, KIAA1045, ADAR, IL2
2376 6.92E-05 4.31E-03 23 948 177 17782 immune system process PTPRC, ICAM1, IL2RA, IL27, CD247, C5, CTLA4, RAG1, CD40, IL6R, RAG2, IL21, LAT, CD19, CCR6,
CCL21, ICOS, CD2, SH2B3, CHRNB2, TRAF6, CD28, IL2
51023 7.79E-05 4.73E-03 3 9 177 17782 regulation of immunoglobulin secretion CD40, TRAF6, IL2
2768 8.17E-05 4.84E-03 5 45 177 17782 immune response-regulating cell surface receptor signaling pathway PTPRC, CD19, CD247, CD40, TRAF6
5515 9.22E-05 5.05E-03 106 8118 177 17782 protein binding XPO1, PTPN22, SHE, BATF, EIF3CL, PDE4A, RAVER1, SPRED2, LONRF2, MAGI3, KIF5A, IL27,
NAA25, CD40, IL21, DDIT3, BRAP, DCTN2, CCR6, FGFR1OP, RAB14, TNFAIP3, TRAF1, PAM,
PFKFB3, MAPKAPK5, ERBB2, BLK, KEAP1, PXK, CDC37, SBK1, ICOS, TRAF6, IKZF3, PTPN2, UBE2L3,
PTPN11, KCTD6, LAT, RABEP2, ATP2A1, KCTD17, ALDH2, CHRNB2, TRAFD1, SPNS1, AHSA2, RBPJ,
GRB7, PIP4K2C, FAM3D, GLI1, OS9, GSN, AGAP2, ARHGAP9, CLN3, ICAM1, ICAM4, ICAM5,
CEP110, ICAM3, CDK6, IL6R, FLNB, TYK2, PRKCQ, HIPK1, INHBE, ORMDL3, TUFM, TCAP, CD247,
C5, RAG1, RAG2, RPP14, HIC2, STAT4, FAM107A, REL, CCL21, POU2F1, CD2, SH2B3, PEX13,
SH2B1, TNPO3, MARS, CD28, PTPRC, IL2RB, IL2RA, DTX3, SLC12A5, CTLA4, UBE2Q1, RSBN1,
ATXN2, PHF19, CD19, AP4B1, KIAA1045, IL2, RBM17
4911 9.85E-05 5.05E-03 2 2 177 17782 interleukin-2 receptor activity IL2RB, IL2RA
9439 9.85E-05 5.05E-03 2 2 177 17782 cyanate metabolic process TST, MPST
9440 9.85E-05 5.05E-03 2 2 177 17782 cyanate catabolic process TST, MPST
45590 9.85E-05 5.05E-03 2 2 177 17782 negative regulation of regulatory T cell differentiation ICOS, CTLA4
46013 9.85E-05 5.05E-03 2 2 177 17782 regulation of T cell homeostatic proliferation IL2RA, IL2
1775 1.00E-04 5.05E-03 11 275 177 17782 cell activation PTPRC, ICAM1, LAT, CD2, RAG1, CHRNB2, CD40, RAG2, TRAF6, CD28, IL2
30890 1.02E-04 5.05E-03 4 25 177 17782 positive regulation of B cell proliferation PTPRC, CHRNB2, CD40, IL2
46649 1.05E-04 5.07E-03 9 186 177 17782 lymphocyte activation PTPRC, ICAM1, CD2, RAG1, CHRNB2, CD40, RAG2, CD28, IL2
42127 1.10E-04 5.25E-03 21 848 177 17782 regulation of cell proliferation PTPRC, IL2RA, PNMT, ERBB2, IL27, CTLA4, CDK6, IL6R, CD40, IL21, GLI1, S1PR2, PRKCQ, HIPK1,
FGFR1OP, ICOS, CHRNB2, RBPJ, TRAF6, CD28, IL2
42035 1.18E-04 5.51E-03 6 77 177 17782 regulation of cytokine biosynthetic process PRKCQ, REL, IL27, TRAF6, IL21, CD28
2706 1.36E-04 6.12E-03 5 50 177 17782 regulation of lymphocyte mediated immunity PTPRC, CD40, TRAF6, IL21, IL2
2822 1.36E-04 6.12E-03 5 50 177 17782 regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains PTPRC, IL27, CD40, TRAF6, IL2
42130 1.39E-04 6.14E-03 4 27 177 17782 negative regulation of T cell proliferation IL2RA, ICOS, ERBB2, CTLA4
2819 1.50E-04 6.42E-03 5 51 177 17782 regulation of adaptive immune response PTPRC, IL27, CD40, TRAF6, IL2
1772 1.51E-04 6.42E-03 3 11 177 17782 immunological synapse PRKCQ, ICAM1, LAT
6955 1.54E-04 6.46E-03 17 619 177 17782 immune response PTPRC, ICAM1, IL2RA, IL27, C5, CTLA4, RAG1, IL6R, RAG2, IL21, LAT, CCR6, CCL21, ICOS, TRAF6,
CD28, IL2
9986 1.63E-04 6.65E-03 12 341 177 17782 cell surface PTPRC, ICAM1, IL2RB, IL2RA, CD19, ICOS, CD2, CTLA4, CHRNB2, CD40, IL6R, CD28
51250 1.64E-04 6.65E-03 5 52 177 17782 negative regulation of lymphocyte activation IL2RA, ICOS, ERBB2, CTLA4, IL2
23033 1.73E-04 6.89E-03 38 2100 177 17782 signaling pathway BLK, ERBB2, CD247, C5, PI4KAP2, GLI1, S1PR2, STAT4, GSN, SPRED2, CD2, SH2B3, TRAF6, AGAP2,
CD28, PTPRC, CLN3, IL2RB, MAGI3, IL2RA, PTPN2, DTX3, IL6R, CD40, TMPRSS6, DDIT3, BRAP,
PTPN11, TYK2, PRKCQ, LAT, CD19, HIPK1, RAB14, RBPJ, GRB7, GPR31, IL2
50777 1.86E-04 7.26E-03 4 29 177 17782 negative regulation of immune response PTPRC, ICOS, CTLA4, TRAFD1
42110 1.89E-04 7.27E-03 7 119 177 17782 T cell activation PTPRC, ICAM1, CD2, RAG1, RAG2, CD28, IL2
42100 2.00E-04 7.57E-03 3 12 177 17782 B cell proliferation PTPRC, CD40, RAG2
2695 2.54E-04 9.50E-03 5 57 177 17782 negative regulation of leukocyte activation IL2RA, ICOS, ERBB2, CTLA4, IL2
30888 2.74E-04 1.00E-02 4 32 177 17782 regulation of B cell proliferation PTPRC, CHRNB2, CD40, IL2
47894 2.94E-04 1.00E-02 2 3 177 17782 flavonol 3-sulfotransferase activity SULT1A1, SULT1A2
4792 2.94E-04 1.00E-02 2 3 177 17782 thiosulfate sulfurtransferase activity TST, MPST
5134 2.94E-04 1.00E-02 2 3 177 17782 interleukin-2 receptor binding IL21, IL2
2331 2.94E-04 1.00E-02 2 3 177 17782 pre-B cell allelic exclusion RAG1, RAG2
19976 2.94E-04 1.00E-02 2 3 177 17782 interleukin-2 binding IL2RB, IL2RA
2703 2.99E-04 1.01E-02 5 59 177 17782 regulation of leukocyte mediated immunity PTPRC, CD40, TRAF6, IL21, IL2
2764 3.24E-04 1.07E-02 5 60 177 17782 immune response-regulating signaling pathway PTPRC, CD19, CD247, CD40, TRAF6
50854 3.26E-04 1.07E-02 3 14 177 17782 regulation of antigen receptor-mediated signaling pathway PTPRC, PTPN22, UBASH3A
32880 3.32E-04 1.07E-02 8 172 177 17782 regulation of protein localization PRKCQ, PAM, C5, RAB14, CD40, TRAF6, PTPN11, IL2
30217 3.78E-04 1.21E-02 5 62 177 17782 T cell differentiation PTPRC, RAG1, RAG2, CD28, IL2
50851 3.90E-04 1.23E-02 4 35 177 17782 antigen receptor-mediated signaling pathway PTPRC, CD19, CD247, TRAF6
50794 4.02E-04 1.25E-02 84 6223 177 17782 regulation of cellular process FAM3D, PNMT, PTPN22, PDHB, GLI1, NFATC2IP, BATF, S1PR2, GSN, PPP1R1B, PDE4A, CCDC101,
S1PR5, SPRED2, PHTF1, AGAP2, ARHGAP9, ICAM1, CLN3, CEP110, IL27, CDK6, IL6R, CD40, IL21,
FLNB, DDIT3, BRAP, TYK2, PRKCQ, CCR6, HIPK1, FGFR1OP, NCOA5, ZGLP1, RAB14, CUX2,
TNFAIP3, ZNF438, TRAF1, C17ORF37, PAM, GSDMA, BLK, ERBB2, MAPKAPK5, C5, PI4KAP2,
RAG1, KEAP1, PXK, CDC37, HIC2, TAGAP, STAT4, FAM107A, REL, CCL21, RPL6, ICOS, POU2F1,
CD2, UBASH3A, TRAF6, CD28, PTPRC, IL2RB, IKZF3, IL2RA, CTLA4, AFF3, UBE2L3, PTPN11, ATXN2,
LAT, PHF19, IRF5, NUPR1, NEUROD2, CHRNB2, RBPJ, GRB7, KIAA1045, IL2
50866 4.07E-04 1.25E-02 5 63 177 17782 negative regulation of cell activation IL2RA, ICOS, ERBB2, CTLA4, IL2
48471 4.24E-04 1.29E-02 11 325 177 17782 perinuclear region of cytoplasm ATXN2, GSDMA, GSN, KIF5A, FGFR1OP, ICOS, PDE4A, ERBB2, ATP2A1, RAB14, CTLA4
50708 4.71E-04 1.41E-02 5 65 177 17782 regulation of protein secretion PRKCQ, C5, CD40, TRAF6, IL2
50871 4.85E-04 1.44E-02 4 37 177 17782 positive regulation of B cell activation PTPRC, CHRNB2, CD40, IL2
8150 5.00E-04 1.46E-02 159 14298 177 17782 biological_process XPO1, PTPN22, GGT2, SLC26A10, DNASE1L3, PDHB, STARD3, BATF, S1PR2, EIF3CL, OLFML3,
PPP1R1B, PDE4A, SULT1A1, S1PR5, SPRED2, PHTF1, SULT1A2, LONRF2, MAGI3, BCL2L15, KIF5A,
IL27, ERP29, CD40, IL21, DDIT3, BRAP, DCTN2, DCLRE1B, CCR6, FGFR1OP, ZGLP1, RAB14, FDX1L,
TNFAIP3, ZNF438, MPST, TRAF1, C17ORF37, PAM, GSDMA, PFKFB3, BLK, ERBB2, MAPKAPK5,
PPIL2, KEAP1, PXK, CDC37, TAGAP, SBK1, RPL6, ICOS, UBASH3A, TRAF6, IKZF3, PTPN2, UBE2L3,
KCTD6, PTPN11, TST, TMEM116, LAT, RABEP2, ATP2A1, ALDH2, KCTD17, CHRNB2, TRAFD1,
RBPJ, AHSA2, SPNS1, GRB7, ACAD10, PIP4K2C, GPR31, ADAR, ACOX2, FAM3D, PNMT, PPIP5K2,
OS9, GLI1, NFATC2IP, GSN, CCDC101, AGAP2, ARHGAP9, PGAP3, ICAM1, CLN3, MRPL4, ICAM4,
ICAM5, CEP110, ICAM3, CDK6, IL6R, FLNB, TYK2, PRKCQ, HIPK1, ZPBP2, INHBE, NCOA5, CUX2,
ORMDL3, TUFM, TCAP, CD247, ADAD1, PUS10, C5, PI4KAP2, RAG1, C12ORF51, RAG2, RPP14,
PLCL2, GIN1, HIC2, STAT4, FAM107A, ATXN2L, REL, CCL21, RNASET2, BCRP2, POU2F1, CD2,
SH2B3, PEX13, SH2B1, USP34, TNPO3, CD28, MARS, B4GALNT1, PTPRC, IL2RB, IL2RA, DTX3,
SLC12A5, CTLA4, AFF3, TMPRSS6, UBE2Q1, TMPRSS3, ATXN2, CD19, PHF19, IRF5, NUPR1,
NEUROD2, AP4B1, KIAA1045, RBM17, IL2


9987 5.83E-04 1.65E-02 115 9365 177 17782 cellular process XPO1, PTPN22, GGT2, SLC26A10, DNASE1L3, PDHB, S1PR2, STARD3, EIF3CL, PDE4A, SULT1A1,
S1PR5, SULT1A2, MAGI3, BCL2L15, KIF5A, ERP29, CD40, IL21, DDIT3, DCTN2, DCLRE1B, CCR6,
FGFR1OP, ZGLP1, RAB14, FDX1L, TNFAIP3, MPST, C17ORF37, PAM, GSDMA, PFKFB3, BLK,
MAPKAPK5, ERBB2, PPIL2, PXK, CDC37, SBK1, RPL6, TRAF6, PTPN2, UBE2L3, PTPN11, TST, LAT,
RABEP2, ATP2A1, ALDH2, CHRNB2, SPNS1, RBPJ, PIP4K2C, ADAR, ACOX2, PNMT, PPIP5K2, OS9,
GLI1, GSN, PGAP3, CLN3, ICAM1, MRPL4, ICAM4, ICAM5, CEP110, ICAM3, CDK6, IL6R, FLNB,
TYK2, PRKCQ, HIPK1, ZPBP2, NCOA5, ORMDL3, TUFM, TCAP, ADAD1, PUS10, C5, RAG1, PI4KAP2,
RAG2, C12ORF51, RPP14, GIN1, STAT4, CCL21, RNASET2, BCRP2, POU2F1, CD2, SH2B3, PEX13,
SH2B1, USP34, MARS, B4GALNT1, CD28, PTPRC, IL2RA, SLC12A5, TMPRSS6, UBE2Q1, TMPRSS3,
ATXN2, NUPR1, NEUROD2, AP4B1, KIAA1045, IL2, RBM17
2327 5.83E-04 1.65E-02 2 4 177 17782 immature B cell differentiation RAG1, RAG2
2329 5.83E-04 1.65E-02 2 4 177 17782 pre-B cell differentiation RAG1, RAG2
16788 6.27E-04 1.75E-02 17 699 177 17782 hydrolase activity, acting on ester bonds PGAP3, PTPRC, PTPN2, PFKFB3, ABHD6, PPIP5K2, RAG1, PTPN22, RAG2, DNASE1L3, PTPN11,
RPP14, PLCL2, RNASET2, PDE4A, USP34, TNFAIP3
2700 6.56E-04 1.81E-02 4 40 177 17782 regulation of production of molecular mediator of immune response CD40, TRAF6, IL21, IL2
2429 7.21E-04 1.97E-02 4 41 177 17782 immune response-activating cell surface receptor signaling pathway PTPRC, CD19, CD247, TRAF6
42113 8.54E-04 2.30E-02 5 74 177 17782 B cell activation PTPRC, RAG1, CHRNB2, CD40, RAG2
50868 8.65E-04 2.31E-02 4 43 177 17782 negative regulation of T cell activation IL2RA, ICOS, ERBB2, CTLA4
7166 9.11E-04 2.40E-02 25 1280 177 17782 cell surface receptor linked signaling pathway ERBB2, C5, CD247, GLI1, S1PR2, STAT4, CD2, TRAF6, CD28, CLN3, PTPRC, IL2RB, IL2RA, PTPN2,
DTX3, CD40, IL6R, PTPN11, LAT, CD19, HIPK1, RBPJ, GRB7, GPR31, IL2
16783 9.66E-04 2.47E-02 2 5 177 17782 sulfurtransferase activity TST, MPST
48304 9.66E-04 2.47E-02 2 5 177 17782 positive regulation of isotype switching to IgG isotypes CD40, IL2
4062 0.00097 0.024679 2 5 177 17782 aryl sulfotransferase activity SULT1A1, SULT1A2
44237 0.00104 0.026209 69 4989 177 17782 cellular metabolic process ACOX2, PNMT, PPIP5K2, PTPN22, GGT2, DNASE1L3, PDHB, OS9, S1PR2, EIF3CL, PDE4A, SULT1A1,
SULT1A2, PGAP3, CLN3, MRPL4, ERP29, CDK6, IL21, DDIT3, TYK2, PRKCQ, DCLRE1B, HIPK1,
NCOA5, FDX1L, ORMDL3, TNFAIP3, MPST, TUFM, PAM, PFKFB3, BLK, MAPKAPK5, ERBB2,
ADAD1, PPIL2, PUS10, C5, PI4KAP2, RAG1, RAG2, C12ORF51, RPP14, GIN1, STAT4, SBK1, RPL6,
RNASET2, BCRP2, PEX13, TRAF6, USP34, MARS, B4GALNT1, PTPRC, PTPN2, UBE2L3, UBE2Q1,
PTPN11, TST, ATXN2, LAT, ATP2A1, RBPJ, KIAA1045, PIP4K2C, RBM17, ADAR
32655 0.00113 0.028056 3 21 177 17782 regulation of interleukin-12 production REL, CD40, TRAF6
10646 0.00113 0.028056 23 1154 177 17782 regulation of cell communication TRAF1, PTPRC, CLN3, FAM3D, ERBB2, C5, PTPN22, PXK, CD40, IL6R, IL21, CDC37, BRAP, PTPN11,
S1PR2, REL, SPRED2, CHRNB2, UBASH3A, TNFAIP3, TRAF6, AGAP2, IL2
44267 0.00117 0.028589 36 2153 177 17782 cellular protein metabolic process TUFM, PAM, BLK, MAPKAPK5, ERBB2, C5, PPIL2, RAG1, PTPN22, C12ORF51, OS9, S1PR2, EIF3CL,
STAT4, SBK1, RPL6, BCRP2, USP34, TRAF6, MARS, PGAP3, PTPRC, CLN3, MRPL4, PTPN2, ERP29,
CDK6, UBE2L3, UBE2Q1, PTPN11, TYK2, PRKCQ, HIPK1, NCOA5, TNFAIP3, KIAA1045
30004 0.0013 0.031529 3 22 177 17782 cellular monovalent inorganic cation homeostasis CLN3, SLC12A5, TMPRSS3
45595 0.00133 0.031864 14 554 177 17782 regulation of cell differentiation IL2RA, ERBB2, IL27, CTLA4, CDK6, KEAP1, IL6R, IL21, ICOS, S1PR5, CD2, CHRNB2, TRAF6, IL2
48519 0.00148 0.035116 34 2021 177 17782 negative regulation of biological process FAM3D, PNMT, ERBB2, C5, RAG1, PTPN22, PXK, HIC2, GSN, ICOS, POU2F1, UBASH3A, TRAF6,
PTPRC, CLN3, IL2RB, IL2RA, CTLA4, CDK6, IL6R, TMPRSS6, BRAP, DDIT3, PTPN11, ATXN2, ZGLP1,
ATP2A1, CUX2, TRAFD1, RBPJ, TNFAIP3, KIAA1045, ADAR, IL2
50864 0.00165 0.038525 4 51 177 17782 regulation of B cell activation PTPRC, CHRNB2, CD40, IL2
1910 0.00168 0.038525 3 24 177 17782 regulation of leukocyte mediated cytotoxicity PTPRC, ICAM1, IL21
2637 0.00168 0.038525 3 24 177 17782 regulation of immunoglobulin production CD40, TRAF6, IL2
50852 0.00168 0.038525 3 24 177 17782 T cell receptor signaling pathway PTPRC, CD247, TRAF6
2521 0.00171 0.038793 6 127 177 17782 leukocyte differentiation PTPRC, RAG1, RAG2, TRAF6, CD28, IL2
4842 0.00175 0.039398 7 173 177 17782 ubiquitin-protein ligase activity PPIL2, RAG1, TRAF6, TNFAIP3, UBE2L3, UBE2Q1, BRAP
2252 0.00185 0.041202 6 129 177 17782 immune effector process PTPRC, ICAM1, LAT, C5, RAG2, IL6R
9966 0.00187 0.041202 17 773 177 17782 regulation of signal transduction TRAF1, PTPRC, ERBB2, C5, IL6R, CD40, IL21, BRAP, CDC37, PTPN11, S1PR2, REL, SPRED2,
TNFAIP3, TRAF6, AGAP2, IL2
151 0.00193 0.041257 6 130 177 17782 ubiquitin ligase complex PAM, BCRP2, PPIL2, UBE2L3, BRAP, OS9
44238 0.00195 0.041257 71 5286 177 17782 primary metabolic process ACOX2, PNMT, PPIP5K2, PTPN22, GGT2, DNASE1L3, PDHB, OS9, STARD3, S1PR2, EIF3CL, PDE4A,
SULT1A1, SULT1A2, LONRF2, PGAP3, CLN3, MRPL4, ERP29, CDK6, DDIT3, TYK2, PRKCQ,
DCLRE1B, HIPK1, NCOA5, ORMDL3, TNFAIP3, TUFM, PAM, PFKFB3, BLK, MAPKAPK5, ERBB2,
ADAD1, PPIL2, PUS10, C5, PI4KAP2, RAG1, RAG2, C12ORF51, RPP14, PLCL2, GIN1, STAT4, SBK1,
RPL6, RNASET2, BCRP2, PEX13, TRAF6, USP34, MARS, B4GALNT1, PTPRC, PTPN2, UBE2L3,
TMPRSS6, UBE2Q1, TMPRSS3, PTPN11, ATXN2, LAT, ATP2A1, ALDH2, RBPJ, KIAA1045, PIP4K2C,
RBM17, ADAR
23051 0.002 0.041257 17 778 177 17782 regulation of signaling process TRAF1, PTPRC, ERBB2, C5, IL6R, CD40, IL21, BRAP, CDC37, PTPN11, S1PR2, REL, SPRED2,
TNFAIP3, TRAF6, AGAP2, IL2
48302 0.002 0.041257 2 7 177 17782 regulation of isotype switching to IgG isotypes CD40, IL2
45084 0.002 0.041257 2 7 177 17782 positive regulation of interleukin-12 biosynthetic process REL, TRAF6
50858 0.002 0.041257 2 7 177 17782 negative regulation of antigen receptor-mediated signaling pathway PTPN22, UBASH3A
50860 0.002 0.041257 2 7 177 17782 negative regulation of T cell receptor signaling pathway PTPN22, UBASH3A
16881 0.00201 0.041257 8 227 177 17782 acid-amino acid ligase activity PPIL2, RAG1, C12ORF51, TRAF6, TNFAIP3, UBE2L3, UBE2Q1, BRAP
2757 0.00218 0.044443 4 55 177 17782 immune response-activating signal transduction PTPRC, CD19, CD247, TRAF6
32102 0.00233 0.047099 4 56 177 17782 negative regulation of response to external stimulus CLN3, IL2RA, C5, IL2
50793 0.00241 0.048248 17 792 177 17782 regulation of developmental process IL2RA, ERBB2, IL27, C5, CTLA4, KEAP1, CDK6, IL6R, CD40, IL21, GLI1, ICOS, S1PR5, CD2, CHRNB2,
TRAF6, IL2
32879 0.00252 0.049896 16 727 177 17782 regulation of localization PTPRC, PAM, ICAM1, FAM3D, C5, IL6R, CD40, PXK, PTPN11, PRKCQ, FGFR1OP, RAB14, CHRNB2,
TRAF6, RBPJ, IL2
5126 0.00256 0.049896 7 185 177 17782 cytokine receptor binding TYK2, CCL21, IL27, C5, SPRED2, IL21, IL2
30098 0.0026 0.049896 5 95 177 17782 lymphocyte differentiation PTPRC, RAG1, RAG2, CD28, IL2
31341 0.00265 0.049896 3 28 177 17782 regulation of cell killing PTPRC, ICAM1, IL21
16782 0.00265 0.049896 4 58 177 17782 transferase activity, transferring sulfur-containing groups TST, SULT1A1, SULT1A2, MPST
4726 0.00265 0.049896 2 8 177 17782 non-membrane spanning protein tyrosine phosphatase activity PTPN2, PTPN11
45830 0.00265 0.049896 2 8 177 17782 positive regulation of isotype switching CD40, IL2
45075 0.00265 0.049896 2 8 177 17782 regulation of interleukin-12 biosynthetic process REL, TRAF6
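
The p-values in the GO enrichment rows above can be approximately reproduced from the four integer columns. The following is a minimal sketch, assuming those columns are x (genes in the list with the annotation), n (genes in the reference set with the annotation), X (genes in the list) and N (genes in the reference set), and assuming a one-sided hypergeometric test; neither the column meanings nor the test are stated in this part of the table, and the function name is illustrative.

```python
from math import comb


def hypergeom_upper_tail(x, n, X, N):
    """Probability of seeing at least x annotated genes in a list of X genes
    drawn without replacement from N genes, n of which carry the annotation."""
    upper = min(n, X)
    # Count the favourable draws exactly with integers, then divide once.
    favourable = sum(comb(n, i) * comb(N - n, X - i) for i in range(x, upper + 1))
    return favourable / comb(N, X)


# Row 42100 (B cell proliferation): x=3, n=12, X=177, N=17782
p_b_cell = hypergeom_upper_tail(3, 12, 177, 17782)

# Row 2331 (pre-B cell allelic exclusion): x=2, n=3, X=177, N=17782
p_pre_b = hypergeom_upper_tail(2, 3, 177, 17782)

print(f"{p_b_cell:.2e} {p_pre_b:.2e}")
```

The two printed values can be compared with the uncorrected p-values tabulated for those rows (2.00E-04 and 2.94E-04).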

Supplementary Figure 1a BiNGO network produced using the confirmed gene list. GO terms are coloured based on significance.

Supplementary Figure 1b BiNGO network produced using the confirmed gene list. GO terms are coloured based on significance.

Supplementary Figure 1c BiNGO network produced using the confirmed gene list. GO terms are coloured based on significance.

Supplementary Figure 2a BiNGO network produced using the confirmed gene list after exclusion of MHC region genes. GO terms are coloured based on significance.

Supplementary Figure 2b BiNGO network produced using the confirmed gene list after exclusion of MHC region genes. GO terms are coloured based on significance.

Supplementary Figure 2c BiNGO network produced using the confirmed gene list after exclusion of MHC region genes. GO terms are coloured based on significance.

Supplementary Figure 3a BiNGO network produced using the expanded gene list. GO terms are coloured based on significance.

Supplementary Figure 3b BiNGO network produced using the expanded gene list. GO terms are coloured based on significance.

Supplementary Figure 4a BiNGO network produced using the expanded gene list after exclusion of MHC region genes. GO terms are coloured based on significance.

Supplementary Figure 4b BiNGO network produced using the expanded gene list after exclusion of MHC region genes. GO terms are coloured based on significance.

Supplementary Figure 4c BiNGO network produced using the expanded gene list after exclusion of MHC region genes. GO terms are coloured based on significance.
Supplementary Table 7 shows all 182 GO IDs by overlap category with their corresponding
description.

Supplementary Table 7 GO terms showing significance by category.


Category GO Term GO Term Description
Confirmed 7219 Notch signaling pathway
15197 peptide transporter activity
Expanded 1911 negative regulation of leukocyte mediated cytotoxicity
2704 negative regulation of leukocyte mediated immunity
2707 negative regulation of lymphocyte mediated immunity
5737 cytoplasm
5789 endoplasmic reticulum membrane
31342 negative regulation of cell killing
42802 identical protein binding
44432 endoplasmic reticulum part
44437 vacuolar part
45824 negative regulation of innate immune response
48002 antigen processing and presentation of peptide antigen
48518 positive regulation of biological process
71212 subsynaptic reticulum
80134 regulation of response to stress
Expanded no HLA 9966 regulation of signal transduction
9987 cellular process
10646 regulation of cell communication
16782 transferase activity, transferring sulfur-containing groups
16881 acid-amino acid ligase activity
23051 regulation of signaling process
30098 lymphocyte differentiation
32879 regulation of localization
44238 primary metabolic process
Confirmed & Confirmed no HLA 5813 centrosome
42104 positive regulation of activated T cell proliferation
45191 regulation of isotype switching
45334 clathrin-coated endocytic vesicle
45581 negative regulation of T cell differentiation
45621 positive regulation of lymphocyte differentiation
45911 positive regulation of DNA recombination
46006 regulation of activated T cell proliferation
Confirmed & Expanded 323 lytic vacuole
2483 antigen processing and presentation of endogenous peptide antigen
2504 antigen processing and presentation of peptide or polysaccharide antigen via MHC class II
5764 lysosome
5765 lysosomal membrane
5768 endosome
5773 vacuole
5774 vacuolar membrane
10008 endosome membrane
19060 intracellular transport of viral proteins in host cell
19882 antigen processing and presentation
19883 antigen processing and presentation of endogenous antigen
19885 antigen processing and presentation of endogenous peptide antigen via MHC class I
30581 symbiont intracellular protein transport in host
31347 regulation of defense response
31348 negative regulation of defense response
32395 MHC class II receptor activity
42175 nuclear membrane-endoplasmic reticulum network
42611 MHC protein complex
42613 MHC class II protein complex
42824 MHC class I peptide loading complex
42825 TAP complex
44440 endosomal part
44459 plasma membrane part
46719 regulation of viral protein levels in host cell
46967 cytosol to ER transport
46977 TAP binding
46978 TAP1 binding
51708 intracellular protein transport in other organism involved in symbiotic interaction
Confirmed & Expanded no HLA 44237 cellular metabolic process
Confirmed no HLA & Expanded no HLA 5126 cytokine receptor binding
Expanded & Expanded no HLA 151 ubiquitin ligase complex
1772 immunological synapse
1775 cell activation
1817 regulation of cytokine production
1910 regulation of leukocyte mediated cytotoxicity
2252 immune effector process
2327 immature B cell differentiation
2329 pre-B cell differentiation
2331 pre-B cell allelic exclusion
2429 immune response-activating cell surface receptor signaling pathway
2521 leukocyte differentiation
2637 regulation of immunoglobulin production
2757 immune response-activating signal transduction
2764 immune response-regulating signaling pathway
2768 immune response-regulating cell surface receptor signaling pathway
2819 regulation of adaptive immune response
2822 regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains
4062 aryl sulfotransferase activity
4726 non-membrane spanning protein tyrosine phosphatase activity
4792 thiosulfate sulfurtransferase activity
4842 ubiquitin-protein ligase activity
4911 interleukin-2 receptor activity
7166 cell surface receptor linked signaling pathway
8150 biological_process
9439 cyanate metabolic process
9440 cyanate catabolic process
16783 sulfurtransferase activity
16788 hydrolase activity, acting on ester bonds
19976 interleukin-2 binding
23033 signaling pathway
30004 cellular monovalent inorganic cation homeostasis
30217 T cell differentiation
30888 regulation of B cell proliferation
30890 positive regulation of B cell proliferation
31341 regulation of cell killing
32102 negative regulation of response to external stimulus
32655 regulation of interleukin-12 production
42035 regulation of cytokine biosynthetic process
42100 B cell proliferation
42110 T cell activation
42113 B cell activation
44267 cellular protein metabolic process
45075 regulation of interleukin-12 biosynthetic process
45084 positive regulation of interleukin-12 biosynthetic process
45321 leukocyte activation
45595 regulation of cell differentiation
46649 lymphocyte activation
47894 flavonol 3-sulfotransferase activity
48471 perinuclear region of cytoplasm
48519 negative regulation of biological process
50708 regulation of protein secretion
50789 regulation of biological process
50793 regulation of developmental process
50794 regulation of cellular process
50851 antigen receptor-mediated signaling pathway
50852 T cell receptor signaling pathway
50854 regulation of antigen receptor-mediated signaling pathway
50858 negative regulation of antigen receptor-mediated signaling pathway
50860 negative regulation of T cell receptor signaling pathway
50864 regulation of B cell activation
50871 positive regulation of B cell activation
51239 regulation of multicellular organismal process
65007 biological regulation
Confirmed, Expanded & Expanded no HLA 2376 immune system process
2703 regulation of leukocyte mediated immunity
6955 immune response
9986 cell surface
48583 regulation of response to stimulus
50777 negative regulation of immune response
Confirmed no HLA, Expanded & Expanded no HLA 42127 regulation of cell proliferation
All 2682 regulation of immune system process
2683 negative regulation of immune system process
2684 positive regulation of immune system process
2694 regulation of leukocyte activation
2695 negative regulation of leukocyte activation
2696 positive regulation of leukocyte activation
2697 regulation of immune effector process
2700 regulation of production of molecular mediator of immune response
2706 regulation of lymphocyte mediated immunity
5134 interleukin-2 receptor binding
5515 protein binding
8284 positive regulation of cell proliferation
9897 external side of plasma membrane
23052 signaling
32880 regulation of protein localization
32944 regulation of mononuclear cell proliferation
32945 negative regulation of mononuclear cell proliferation
32946 positive regulation of mononuclear cell proliferation
42102 positive regulation of T cell proliferation
42108 positive regulation of cytokine biosynthetic process
42129 regulation of T cell proliferation
42130 negative regulation of T cell proliferation
45580 regulation of T cell differentiation
45589 regulation of regulatory T cell differentiation
45590 negative regulation of regulatory T cell differentiation
45619 regulation of lymphocyte differentiation
45830 positive regulation of isotype switching
46013 regulation of T cell homeostatic proliferation
48302 regulation of isotype switching to IgG isotypes
48304 positive regulation of isotype switching to IgG isotypes
48585 negative regulation of response to stimulus
50670 regulation of lymphocyte proliferation
50671 positive regulation of lymphocyte proliferation
50672 negative regulation of lymphocyte proliferation
50776 regulation of immune response
50863 regulation of T cell activation
50865 regulation of cell activation
50866 negative regulation of cell activation
50867 positive regulation of cell activation
50868 negative regulation of T cell activation
50870 positive regulation of T cell activation
51023 regulation of immunoglobulin secretion
51249 regulation of lymphocyte activation
51250 negative regulation of lymphocyte activation
51251 positive regulation of lymphocyte activation
70663 regulation of leukocyte proliferation
70664 negative regulation of leukocyte proliferation
70665 positive regulation of leukocyte proliferation
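
The grouping used in Supplementary Table 7 amounts to asking, for each GO ID, exactly which of the four analyses reported it as significant. The following is a minimal sketch of that classification, not the procedure actually used to build the table; the function name is hypothetical, and the input holds only five GO IDs picked from the table for illustration, with memberships chosen to match the categories listed for them.

```python
from collections import defaultdict


def overlap_categories(term_sets):
    """Group each GO ID under the exact combination of lists that contain it."""
    names = list(term_sets)
    groups = defaultdict(set)
    for go_id in set().union(*term_sets.values()):
        members = [name for name in names if go_id in term_sets[name]]
        # A term found in every list is labelled "All", as in the table.
        label = "All" if len(members) == len(names) else " & ".join(members)
        groups[label].add(go_id)
    return dict(groups)


# Illustrative subset only: five GO IDs taken from Supplementary Table 7.
term_sets = {
    "Confirmed": {7219, 44237, 2682},
    "Confirmed no HLA": {5126, 2682},
    "Expanded": {1911, 2682},
    "Expanded no HLA": {44237, 5126, 2682},
}

categories = overlap_categories(term_sets)
for label in sorted(categories):
    print(label, sorted(categories[label]))
```

Labels here are joined with "&" throughout, whereas the table writes three-way overlaps with a comma (for example "Confirmed, Expanded & Expanded no HLA").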

7.1.1.2 DAVID
Full results from the DAVID analysis of the confirmed and expanded gene lists are shown in
Supplementary Table 8 and Supplementary Table 10, respectively. When MHC region genes
were excluded from the confirmed gene list, the results were similar; the main difference
was the elimination of the MHC class II cluster (Supplementary Table 9). Exclusion also
slightly reduced the number of genes not belonging to any of the groups, from 88 to 85
(72%). Similar clusters were also seen for the expanded gene list when MHC region genes
were excluded, although some were more enriched and others less enriched than before
(Supplementary Table 11). In total, seven clusters were identified: five cluster groups
identified from the expanded list, the hypothetical cluster identified from the confirmed
gene list, and a new cluster containing mostly membrane-related genes.

Supplementary Table 8 DAVID functional classification results of the confirmed gene list including gene IDs.
Gene Group Enrichment Score ENSEMBL_GENE_ID Gene Name
1 3.621 ENSG00000196735 similar to hCG2042724; similar to HLA class II histocompatibility antigen, DQ(1) alpha chain precursor (DC-4 alpha chain); major histocompatibility complex, class II, DQ alpha 1
ENSG00000196126 major histocompatibility complex, class II, DR beta 4; major histocompatibility complex, class II, DR beta 1
ENSG00000163600 inducible T-cell co-stimulator
ENSG00000120436 G protein-coupled receptor 31
ENSG00000204287 major histocompatibility complex, class II, DR alpha
ENSG00000143061 immunoglobulin superfamily, member 3
ENSG00000179344 major histocompatibility complex, class II, DQ beta 1; similar to major histocompatibility complex, class II, DQ beta 1
ENSG00000204290 butyrophilin-like 2 (MHC class II associated)
ENSG00000158457 tetraspanin 33
ENSG00000198502 major histocompatibility complex, class II, DR beta 5
2 2.495 ENSG00000163600 inducible T-cell co-stimulator
ENSG00000178562 CD28 molecule
ENSG00000163599 cytotoxic T-lymphocyte-associated protein 4
ENSG00000116824 CD2 molecule
3 1.562 ENSG00000139269 inhibin, beta E
ENSG00000116774 olfactomedin-like 3
ENSG00000198643 family with sequence similarity 3, member D
ENSG00000026297 ribonuclease T2
4 0.447 ENSG00000138688 KIAA1109
ENSG00000205108 hypothetical protein LOC259308; chromosome 9 open reading frame 144
ENSG00000204296 chromosome 6 open reading frame 10
ENSG00000158457 tetraspanin 33
ENSG00000163686 abhydrolase domain containing 6
88 genes not classified

Supplementary Table 9 DAVID functional classification results of the confirmed gene list after exclusion of MHC region genes including gene IDs.
Gene Group Enrichment Score ENSEMBL_GENE_ID Gene Name
1 1.750 ENSG00000163600 inducible T-cell co-stimulator
ENSG00000178562 CD28 molecule
ENSG00000163599 cytotoxic T-lymphocyte-associated protein 4
ENSG00000116824 CD2 molecule
2 0.955 ENSG00000116774 olfactomedin-like 3
ENSG00000139269 inhibin, beta E
ENSG00000198643 family with sequence similarity 3, member D
ENSG00000026297 ribonuclease T2
3 0.085 ENSG00000138688 KIAA1109
ENSG00000205108 hypothetical protein LOC259308; chromosome 9 open reading frame 144
ENSG00000158457 tetraspanin 33
ENSG00000163686 abhydrolase domain containing 6
85 genes not classified

Supplementary Table 10 DAVID functional classification results of the expanded gene list including gene IDs.
Gene Group Enrichment Score ENSEMBL_GENE_ID Gene Name
1 0.948 ENSG00000139269 inhibin, beta E
ENSG00000116774 olfactomedin-like 3
ENSG00000198643 family with sequence similarity 3, member D
ENSG00000026297 ribonuclease T2
2 0.930 ENSG00000089022 mitogen-activated protein kinase-activated protein kinase 5
ENSG00000136573 B lymphoid tyrosine kinase
ENSG00000145725 histidine acid phosphatase domain containing 1
ENSG00000163349 homeodomain interacting protein kinase 1
ENSG00000105397 tyrosine kinase 2
ENSG00000188322 SH3-binding domain kinase 1
ENSG00000170525 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
ENSG00000166908 phosphatidylinositol-5-phosphate 4-kinase, type II, gamma
ENSG00000105810 cyclin-dependent kinase 6
3 0.928 ENSG00000206140 transmembrane protein 191B; transmembrane protein 191C
ENSG00000143061 immunoglobulin superfamily, member 3
ENSG00000177455 CD19 molecule
ENSG00000196735 similar to hCG2042724; similar to HLA class II histocompatibility antigen, DQ(1) alpha chain precursor (DC-4 alpha chain); major histocompatibility complex, class II, DQ alpha 1
ENSG00000172057 ORM1-like 3 (S. cerevisiae)
ENSG00000116824 CD2 molecule
ENSG00000198270 transmembrane protein 116
ENSG00000120436 G protein-coupled receptor 31
ENSG00000105371 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group)
ENSG00000105376 intercellular adhesion molecule 5, telencephalin
ENSG00000204296 chromosome 6 open reading frame 10
ENSG00000205108 hypothetical protein LOC259308; chromosome 9 open reading frame 144
ENSG00000180739 sphingosine-1-phosphate receptor 5
ENSG00000163686 abhydrolase domain containing 6
ENSG00000204290 butyrophilin-like 2 (MHC class II associated)
ENSG00000175898 sphingosine-1-phosphate receptor 2
ENSG00000158457 tetraspanin 33
ENSG00000204287 major histocompatibility complex, class II, DR alpha
ENSG00000163599 cytotoxic T-lymphocyte-associated protein 4
ENSG00000138688 KIAA1109
ENSG00000196126 major histocompatibility complex, class II, DR beta 4; major histocompatibility complex, class II, DR beta 1
ENSG00000076662 intercellular adhesion molecule 3
ENSG00000198502 major histocompatibility complex, class II, DR beta 5
ENSG00000100385 interleukin 2 receptor, beta
ENSG00000198156 nuclear pore complex interacting protein-like 1; olfactory receptor, family 51, subfamily A, member 4
ENSG00000179344 major histocompatibility complex, class II, DQ beta 1; similar to major histocompatibility complex, class II, DQ beta 1
ENSG00000163600 inducible T-cell co-stimulator
4 0.826 ENSG00000173064 chromosome 12 open reading frame 51
ENSG00000089234 BRCA1 associated protein
ENSG00000185651 ubiquitin-conjugating enzyme E2L 3
ENSG00000160714 ubiquitin-conjugating enzyme E2Q family member 1
5 0.335 ENSG00000156127 basic leucine zipper transcription factor, ATF-like
ENSG00000171532 neurogenic differentiation 2
ENSG00000144218 AF4/FMR2 family, member 3
ENSG00000128604 interferon regulatory factor 5
ENSG00000116793 putative homeodomain transcription factor 1
6 0.104 ENSG00000135148 TRAF-type zinc finger domain containing 1
ENSG00000143190 POU class 2 homeobox 1
ENSG00000168214 recombination signal binding protein for immunoglobulin kappa J region
ENSG00000169635 hypermethylated in cancer 2
ENSG00000220201 GATA like protein-1
ENSG00000183621 zinc finger protein 438
ENSG00000111249 cut-like homeobox 2
ENSG00000122733 KIAA1045
ENSG00000178498 deltex homolog 3 (Drosophila)
ENSG00000119403 PHD finger protein 19
ENSG00000161405 IKAROS family zinc finger 3 (Aiolos)
192 genes not classified

Supplementary Table 11 DAVID functional classification results of the expanded gene list after exclusion of MHC region genes including gene IDs.
Gene Group Enrichment Score ENSEMBL_GENE_ID Gene Name
1 0.861 ENSG00000089022 mitogen-activated protein kinase-activated protein kinase 5
ENSG00000136573 B lymphoid tyrosine kinase
ENSG00000145725 histidine acid phosphatase domain containing 1
ENSG00000163349 homeodomain interacting protein kinase 1
ENSG00000105397 tyrosine kinase 2
ENSG00000188322 SH3-binding domain kinase 1
ENSG00000170525 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
ENSG00000166908 phosphatidylinositol-5-phosphate 4-kinase, type II, gamma
ENSG00000105810 cyclin-dependent kinase 6
2 0.755 ENSG00000180739 sphingosine-1-phosphate receptor 5
ENSG00000105376 intercellular adhesion molecule 5, telencephalin
ENSG00000198156 nuclear pore complex interacting protein-like 1; olfactory receptor, family 51, subfamily A, member 4
ENSG00000177455 CD19 molecule
ENSG00000076662 intercellular adhesion molecule 3
ENSG00000163600 inducible T-cell co-stimulator
ENSG00000116824 CD2 molecule
ENSG00000120436 G protein-coupled receptor 31
ENSG00000163599 cytotoxic T-lymphocyte-associated protein 4
ENSG00000143061 immunoglobulin superfamily, member 3
ENSG00000105371 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group)
ENSG00000158457 tetraspanin 33
ENSG00000100385 interleukin 2 receptor, beta
ENSG00000175898 sphingosine-1-phosphate receptor 2
3 0.683 ENSG00000173064 chromosome 12 open reading frame 51
ENSG00000089234 BRCA1 associated protein
ENSG00000185651 ubiquitin-conjugating enzyme E2L 3
ENSG00000160714 ubiquitin-conjugating enzyme E2Q family member 1
4 0.567 ENSG00000116774 olfactomedin-like 3
ENSG00000139269 inhibin, beta E
ENSG00000198643 family with sequence similarity 3, member D
ENSG00000026297 ribonuclease T2
5 0.408 ENSG00000171532 neurogenic differentiation 2
ENSG00000156127 basic leucine zipper transcription factor, ATF-like
ENSG00000144218 AF4/FMR2 family, member 3
ENSG00000128604 interferon regulatory factor 5
ENSG00000116793 putative homeodomain transcription factor 1
6 0.166 ENSG00000138688 KIAA1109
ENSG00000205108 hypothetical protein LOC259308; chromosome 9 open reading frame 144
ENSG00000158457 tetraspanin 33
ENSG00000172057 ORM1-like 3 (S. cerevisiae)
ENSG00000198270 transmembrane protein 116
ENSG00000163686 abhydrolase domain containing 6
ENSG00000206140 transmembrane protein 191B; transmembrane protein 191C
7 0.124 ENSG00000135148 TRAF-type zinc finger domain containing 1
ENSG00000143190 POU class 2 homeobox 1
ENSG00000168214 recombination signal binding protein for immunoglobulin kappa J region
ENSG00000169635 hypermethylated in cancer 2
ENSG00000220201 GATA like protein-1
ENSG00000183621 zinc finger protein 438
ENSG00000111249 cut-like homeobox 2
ENSG00000122733 KIAA1045
ENSG00000178498 deltex homolog 3 (Drosophila)
ENSG00000119403 PHD finger protein 19
ENSG00000161405 IKAROS family zinc finger 3 (Aiolos)
184 genes not classified

Excluding genes in the MHC region from the confirmed gene list reduces the number of
significant functional annotation categories from 191 to 120, representing 87 genes
(Supplementary File 2). A large proportion (67%) of these annotation categories are from
the GO database and many (42%) are involved in regulation of immune system processes.
The number of significantly associated categories from the expanded gene list drops from
351 to 303 when genes in the MHC region are excluded, representing 193 genes
identified by the Taverna workflow (Supplementary File 4).

After exclusion of the MHC region genes from the confirmed gene list, the number of
functional annotation clusters was reduced slightly from 33 to 30, with scores ranging from
0.031 to 1.974 (Supplementary File 6), although the number of terms which could not be
clustered increased to 72 (60%). The most enriched cluster comprised 77 annotation
terms involved in regulation of immune system processes, most of them (74%) from the GO
database. It was also the largest of the 30 clusters identified. Analysis of
the expanded gene list excluding genes in the MHC region resulted in the identification of
72 clusters with enrichment scores ranging from 0.021 to 3.529 (Supplementary File 8). The
most enriched cluster contains 4 annotation terms, all describing the SH2 domain, a
regulatory module of intracellular signalling cascades (ref uniprot). The fifth most enriched
cluster comprises 133 annotation terms and is the largest cluster identified.
This cluster is similar to cluster 6 from the expanded gene list (Table 22), with many (80)
overlapping terms. A large proportion (90%) of its terms are from the GO database and
82% are involved in regulation of immune system processes.

7.1.1.3 InnateDB
InnateDB failed to identify between 32 and 70 genes and contained no pathway
associations for between 121 and 266 genes (Supplementary Table 12, Supplementary
Table 13 & Supplementary Table 14). Supplementary Table 15 shows all pathways
identified by InnateDB from the confirmed list analysis which remain significant
(p < 0.05) after correction for multiple testing. After exclusion of the MHC region genes
from the confirmed gene list, 77 genes were represented by 233 pathways (Supplementary
File 10). The number of pathways reduced to 62 using a p-value cut-off of p < 0.05 and to 16
after correction (p < 0.05) (Supplementary Table 16), which were mainly involved in
immune-based signalling pathways. Supplementary Table 17 shows all pathways identified by
InnateDB from the expanded list analysis which remain significant (p < 0.05) after
correction for multiple testing. Analysis of the expanded gene list after excluding genes
from the MHC region reduced the number of pathways identified from 534 to 462,
of which 98 were significant (p < 0.05), with 38 remaining significant after correction
for multiple testing (Supplementary Table 18 & Supplementary File 12). Nearly two-thirds
of the significant pathways are involved in signalling events/pathways, with all but two
pathways (8084 & 5468) also being identified in the expanded gene list analysis. These two
pathways were identified by the previous analysis but both just failed to reach significance
after Benjamini-Hochberg correction (0.0502 & 0.0518 respectively).
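
The Benjamini-Hochberg correction referred to above can be illustrated with a minimal Python sketch. It is not InnateDB's implementation: the function names are hypothetical, and it assumes that the corrected value for the pathway at rank r among m tested pathways is p × m / r. Under that assumption, and taking m = 233 (the number of pathways reported for the confirmed list without the MHC region), the first three rows of Supplementary Table 16 are reproduced to within rounding of the tabulated p-values. The second function adds the step-up running minimum of the textbook procedure, which requires the full list of p-values.

```python
def bh_rank_scaled(p_values, n_tests=None):
    """Scale each p-value by m / rank (rank 1 = smallest p-value).

    n_tests (m) defaults to len(p_values); pass it explicitly when only
    the top of a longer ranked list is available.
    """
    m = len(p_values) if n_tests is None else n_tests
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    scaled = [0.0] * len(p_values)
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = min(1.0, p_values[idx] * m / rank)
    return scaled


def bh_adjusted(p_values):
    """Benjamini-Hochberg adjusted p-values with the step-up running minimum."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# First three uncorrected p-values from Supplementary Table 16;
# m = 233 is an assumption taken from the pathway count in the text.
table_16_top = [2.91e-05, 0.000169369, 0.000178264]
for p, q in zip(table_16_top, bh_rank_scaled(table_16_top, n_tests=233)):
    print(f"{p:.6g} -> {q:.6g}")
```

Because the rank-scaled form does not enforce monotonicity, a pathway with a larger uncorrected p-value can receive a smaller corrected value than the pathway ranked above it, as seen in the tabulated corrected values.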

Supplementary Table 12 InnateDB gene summary.


Gene List | Unrecognised Genes | Genes With No Pathway
Confirmed | 41 (17.45%) | 138 (58.72%)
Confirmed no HLA | 32 (16.16%) | 121 (61.11%)
Expanded | 70 (16.06%) | 266 (61.01%)
Expanded no HLA | 61 (15.29%) | 249 (62.41%)
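
The percentages in Supplementary Table 12 can be used to recover the size of each gene list. The following Python sketch is illustrative only and its names are hypothetical. It assumes that each percentage is relative to the total number of genes submitted and that the "no pathway" count includes the unrecognised genes. The second assumption is suggested by the confirmed list without the MHC region, where it yields the 77 genes with at least one pathway quoted in the text.

```python
# (unrecognised count, unrecognised %, genes with no pathway) per gene list,
# copied from Supplementary Table 12
TABLE_12 = {
    "Confirmed": (41, 17.45, 138),
    "Confirmed no HLA": (32, 16.16, 121),
    "Expanded": (70, 16.06, 266),
    "Expanded no HLA": (61, 15.29, 249),
}


def list_size(count, percent):
    """Recover the list size from a count and the percentage it represents."""
    return round(count * 100 / percent)


def summarise(table):
    """Derive list size, genes with a pathway and the no-pathway percentage."""
    summary = {}
    for name, (unrecognised, percent, no_pathway) in table.items():
        total = list_size(unrecognised, percent)
        summary[name] = {
            "total": total,
            # Assumes the no-pathway count already includes unrecognised genes
            "with_pathway": total - no_pathway,
            "no_pathway_pct": round(100 * no_pathway / total, 2),
        }
    return summary


for name, row in summarise(TABLE_12).items():
    print(name, row)
```

The recomputed no-pathway percentages agree with the tabulated ones, which supports reading both percentage columns against the same list size.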

Supplementary Table 13 Ensembl Gene IDs unrecognised by InnateDB.


Confirmed Confirmed no HLA Expanded Expanded no HLA
ENSG00000240065 ENSG00000240416 ENSG00000239294 ENSG00000239294
ENSG00000240416 ENSG00000240771 ENSG00000239511 ENSG00000239511
ENSG00000240771 ENSG00000240825 ENSG00000240065 ENSG00000240416
ENSG00000240825 ENSG00000241102 ENSG00000240416 ENSG00000240634
ENSG00000241102 ENSG00000241524 ENSG00000240634 ENSG00000240771
ENSG00000241106 ENSG00000241573 ENSG00000240771 ENSG00000240825
ENSG00000241287 ENSG00000242162 ENSG00000240825 ENSG00000240906
ENSG00000241524 ENSG00000242241 ENSG00000240906 ENSG00000241037
ENSG00000241573 ENSG00000242309 ENSG00000241037 ENSG00000241102
ENSG00000242162 ENSG00000242990 ENSG00000241102 ENSG00000241524
ENSG00000242241 ENSG00000243384 ENSG00000241106 ENSG00000241573
ENSG00000242309 ENSG00000243592 ENSG00000241287 ENSG00000242162
ENSG00000242574 ENSG00000243738 ENSG00000241524 ENSG00000242241
ENSG00000242990 ENSG00000244161 ENSG00000241573 ENSG00000242309
ENSG00000243054 ENSG00000244383 ENSG00000242162 ENSG00000242591
ENSG00000243384 ENSG00000244498 ENSG00000242241 ENSG00000242990
ENSG00000243592 ENSG00000244556 ENSG00000242309 ENSG00000243233
ENSG00000243738 ENSG00000245247 ENSG00000242574 ENSG00000243384
ENSG00000244161 ENSG00000247039 ENSG00000242591 ENSG00000243592
ENSG00000244383 ENSG00000248203 ENSG00000242990 ENSG00000243738
ENSG00000244498 ENSG00000248224 ENSG00000243054 ENSG00000244060
ENSG00000244556 ENSG00000248452 ENSG00000243233 ENSG00000244161
ENSG00000245247 ENSG00000248549 ENSG00000243384 ENSG00000244383
ENSG00000247039 ENSG00000249141 ENSG00000243592 ENSG00000244498
ENSG00000247909 ENSG00000251922 ENSG00000243738 ENSG00000244556
ENSG00000248203 ENSG00000252865 ENSG00000244060 ENSG00000244671
ENSG00000248224 ENSG00000253032 ENSG00000244161 ENSG00000245192
ENSG00000248452 ENSG00000254371 ENSG00000244383 ENSG00000245247
ENSG00000248549 ENSG00000254774 ENSG00000244498 ENSG00000245682
ENSG00000248993 ENSG00000255154 ENSG00000244556 ENSG00000246465
ENSG00000249141 ENSG00000255354 ENSG00000244671 ENSG00000247039
ENSG00000250264 ENSG00000255518 ENSG00000245192 ENSG00000248203
ENSG00000251916 - ENSG00000245247 ENSG00000248224
ENSG00000251922 - ENSG00000245682 ENSG00000248452
ENSG00000252865 - ENSG00000246465 ENSG00000248549
ENSG00000253032 - ENSG00000247039 ENSG00000249141
ENSG00000254371 - ENSG00000247909 ENSG00000249524
ENSG00000254774 - ENSG00000248203 ENSG00000249860
ENSG00000255154 - ENSG00000248224 ENSG00000251417
ENSG00000255354 - ENSG00000248452 ENSG00000251922
ENSG00000255518 - ENSG00000248549 ENSG00000252020
- - ENSG00000248993 ENSG00000252142
- - ENSG00000249141 ENSG00000252143
- - ENSG00000249524 ENSG00000252314
- - ENSG00000249860 ENSG00000252402
- - ENSG00000250264 ENSG00000252412
- - ENSG00000251417 ENSG00000252461
- - ENSG00000251916 ENSG00000252605
- - ENSG00000251922 ENSG00000252619
- - ENSG00000252020 ENSG00000252799
- - ENSG00000252142 ENSG00000252865
- - ENSG00000252143 ENSG00000252976
- - ENSG00000252314 ENSG00000253032
- - ENSG00000252402 ENSG00000253080
- - ENSG00000252412 ENSG00000254371
- - ENSG00000252461 ENSG00000254498
- - ENSG00000252605 ENSG00000254774
- - ENSG00000252619 ENSG00000255154
- - ENSG00000252799 ENSG00000255354
- - ENSG00000252865 ENSG00000255518
- - ENSG00000252976 ENSG00000255524
- - ENSG00000253032 -
- - ENSG00000253080 -
- - ENSG00000254371 -
- - ENSG00000254498 -
- - ENSG00000254774 -
- - ENSG00000255154 -
- - ENSG00000255354 -
- - ENSG00000255518 -
- - ENSG00000255524 -

Supplementary Table 14 Ensembl Gene IDs with no InnateDB pathways.
Confirmed Confirmed no HLA Expanded Expanded no HLA
Ensembl Gene ID Gene Name Ensembl Gene ID Gene Name Ensembl Gene ID Gene Name Ensembl Gene ID Gene Name
ENSG00000026297 RNASET2 ENSG00000026297 RNASET2 ENSG00000026297 RNASET2 ENSG00000026297 RNASET2
ENSG00000064419 TNPO3 ENSG00000064419 TNPO3 ENSG00000064419 TNPO3 ENSG00000064419 TNPO3
ENSG00000081019 RSBN1 ENSG00000081019 RSBN1 ENSG00000073605 GSDMB ENSG00000073605 GSDMB
ENSG00000115464 USP34 ENSG00000115464 USP34 ENSG00000081019 RSBN1 ENSG00000081019 RSBN1
ENSG00000116774 OLFML3 ENSG00000116774 OLFML3 ENSG00000089234 BRAP ENSG00000089234 BRAP
ENSG00000116793 PHTF1 ENSG00000116793 PHTF1 ENSG00000089248 ERP29 ENSG00000089248 ERP29
ENSG00000118655 DCLRE1B ENSG00000118655 DCLRE1B ENSG00000100027 YPEL1 ENSG00000100027 YPEL1
ENSG00000119396 RAB14 ENSG00000119396 RAB14 ENSG00000100379 KCTD17 ENSG00000100379 KCTD17
ENSG00000119403 PHF19 ENSG00000119403 PHF19 ENSG00000105364 MRPL4 ENSG00000105364 MRPL4
ENSG00000120436 GPR31 ENSG00000120436 GPR31 ENSG00000105371 ICAM4 ENSG00000105371 ICAM4
ENSG00000122733 KIAA1045 ENSG00000122733 KIAA1045 ENSG00000105376 ICAM5 ENSG00000105376 ICAM5
ENSG00000124160 NCOA5 ENSG00000124160 NCOA5 ENSG00000111249 CUX2 ENSG00000111249 CUX2
ENSG00000130363 RSPH3 ENSG00000130363 RSPH3 ENSG00000111271 ACAD10 ENSG00000111271 ACAD10
ENSG00000135253 KCP ENSG00000135253 KCP ENSG00000111300 C12orf30 ENSG00000111300 C12orf30
ENSG00000138688 KIAA1109 ENSG00000138688 KIAA1109 ENSG00000115464 USP34 ENSG00000115464 USP34
ENSG00000143061 IGSF3 ENSG00000143061 IGSF3 ENSG00000116774 OLFML3 ENSG00000116774 OLFML3
ENSG00000144218 AFF3 ENSG00000144218 AFF3 ENSG00000116793 PHTF1 ENSG00000116793 PHTF1
ENSG00000145723 GIN1 ENSG00000145723 GIN1 ENSG00000118655 DCLRE1B ENSG00000118655 DCLRE1B
ENSG00000145725 HISPPD1 ENSG00000145725 HISPPD1 ENSG00000119396 RAB14 ENSG00000119396 RAB14
ENSG00000145730 PAM ENSG00000145730 PAM ENSG00000119403 PHF19 ENSG00000119403 PHF19
ENSG00000154319 FAM167A ENSG00000154319 FAM167A ENSG00000120436 GPR31 ENSG00000120436 GPR31
ENSG00000155980 KIF5A ENSG00000155980 KIF5A ENSG00000122733 KIAA1045 ENSG00000122733 KIAA1045
ENSG00000158457 TSPAN33 ENSG00000158457 TSPAN33 ENSG00000124160 NCOA5 ENSG00000124160 NCOA5
ENSG00000162927 PUS10 ENSG00000162927 PUS10 ENSG00000128228 SDF2L1 ENSG00000128228 SDF2L1
ENSG00000162929 KIAA1841 ENSG00000162929 KIAA1841 ENSG00000128311 TST ENSG00000128311 TST
ENSG00000163349 HIPK1 ENSG00000163349 HIPK1 ENSG00000130363 RSPH3 ENSG00000130363 RSPH3
ENSG00000163684 RPP14 ENSG00000163684 RPP14 ENSG00000131748 STARD3 ENSG00000131748 STARD3
ENSG00000163686 ABHD6 ENSG00000163686 ABHD6 ENSG00000133475 GGT2 ENSG00000133475 GGT2
ENSG00000163687 DNASE1L3 ENSG00000163687 DNASE1L3 ENSG00000135148 TRAFD1 ENSG00000135148 TRAFD1
ENSG00000164113 ADAD1 ENSG00000164113 ADAD1 ENSG00000135253 KCP ENSG00000135253 KCP
ENSG00000164512 ANKRD55 ENSG00000164512 ANKRD55 ENSG00000135362 PRR5L ENSG00000135362 PRR5L
ENSG00000164691 TAGAP ENSG00000164691 TAGAP ENSG00000138688 KIAA1109 ENSG00000138688 KIAA1109
ENSG00000166984 TCP10L ENSG00000166984 TCP10L ENSG00000141741 C17orf37 ENSG00000141741 C17orf37
ENSG00000166987 MBD6 ENSG00000166987 MBD6 ENSG00000143061 IGSF3 ENSG00000143061 IGSF3
ENSG00000168297 PXK ENSG00000168297 PXK ENSG00000144218 AFF3 ENSG00000144218 AFF3
ENSG00000168301 KCTD6 ENSG00000168301 KCTD6 ENSG00000145723 GIN1 ENSG00000145723 GIN1
ENSG00000168309 FAM107A ENSG00000168309 FAM107A ENSG00000145725 HISPPD1 ENSG00000145725 HISPPD1
ENSG00000170500 LONRF2 ENSG00000170500 LONRF2 ENSG00000145730 PAM ENSG00000145730 PAM
ENSG00000170983 C8orf14 ENSG00000170983 C8orf14 ENSG00000154319 FAM167A ENSG00000154319 FAM167A
ENSG00000173209 AHSA2 ENSG00000173209 AHSA2 ENSG00000154822 PLCL2 ENSG00000154822 PLCL2
ENSG00000175749 IDBG-35902 ENSG00000175749 IDBG-35902 ENSG00000155980 KIF5A ENSG00000155980 KIF5A
ENSG00000178723 IDBG-119582 ENSG00000178723 IDBG-119582 ENSG00000156127 BATF ENSG00000156127 BATF
ENSG00000181751 C5orf30 ENSG00000181751 C5orf30 ENSG00000158457 TSPAN33 ENSG00000158457 TSPAN33
ENSG00000184608 C8orf12 ENSG00000184608 C8orf12 ENSG00000160183 TMPRSS3 ENSG00000160183 TMPRSS3
ENSG00000187791 IDBG-61196 ENSG00000187791 IDBG-61196 ENSG00000160185 UBASH3A ENSG00000160185 UBASH3A
ENSG00000188761 BCL2L15 ENSG00000188761 BCL2L15 ENSG00000160716 CHRNB2 ENSG00000160716 CHRNB2
ENSG00000196301 HLA-DRB9 ENSG00000197146 IDBG-119344 ENSG00000161179 YDJC ENSG00000161179 YDJC
ENSG00000197146 IDBG-119344 ENSG00000198643 FAM3D ENSG00000161180 CCDC116 ENSG00000161180 CCDC116
ENSG00000198643 FAM3D ENSG00000201076 IDBG-120406 ENSG00000161395 PGAP3 ENSG00000161395 PGAP3
ENSG00000201076 IDBG-120406 ENSG00000201581 IDBG-111663 ENSG00000161847 RAVER1 ENSG00000161847 RAVER1
ENSG00000201581 IDBG-111663 ENSG00000203711 C6orf99;LOC100130967 ENSG00000162927 PUS10 ENSG00000162927 PUS10
ENSG00000203711 C6orf99;LOC100130967 ENSG00000203864 C1orf137 ENSG00000162929 KIAA1841 ENSG00000162929 KIAA1841
ENSG00000203864 C1orf137 ENSG00000204929 IDBG-54329 ENSG00000163239 TDRD10 ENSG00000163239 TDRD10
ENSG00000204290 BTNL2 ENSG00000205108 IDBG-61121 ENSG00000163349 HIPK1 ENSG00000163349 HIPK1
ENSG00000204296 C6orf10 ENSG00000206937 SNORA70B ENSG00000163684 RPP14 ENSG00000163684 RPP14
ENSG00000204929 IDBG-54329 ENSG00000206970 IDBG-110009 ENSG00000163686 ABHD6 ENSG00000163686 ABHD6
ENSG00000205108 IDBG-61121 ENSG00000206973 IDBG-111587 ENSG00000163687 DNASE1L3 ENSG00000163687 DNASE1L3
ENSG00000206937 SNORA70B ENSG00000207402 IDBG-111731 ENSG00000164113 ADAD1 ENSG00000164113 ADAD1
ENSG00000206970 IDBG-110009 ENSG00000208028 hsa-mir-616 ENSG00000164512 ANKRD55 ENSG00000164512 ANKRD55
ENSG00000206973 IDBG-111587 ENSG00000211543 hsa-mir-320b-1 ENSG00000164691 TAGAP ENSG00000164691 TAGAP
ENSG00000207402 IDBG-111731 ENSG00000211573 IDBG-126479 ENSG00000166352 C11orf74 ENSG00000166352 C11orf74
ENSG00000208028 hsa-mir-616 ENSG00000212978 IDBG-247279 ENSG00000166984 TCP10L ENSG00000166984 TCP10L
ENSG00000211543 hsa-mir-320b-1 ENSG00000213076 IDBG-246193 ENSG00000166987 MBD6 ENSG00000166987 MBD6
ENSG00000211573 IDBG-126479 ENSG00000213820 RPL13P2 ENSG00000167807 FDX1L ENSG00000167807 FDX1L
ENSG00000212066 IDBG-126761 ENSG00000213925 IDBG-236637 ENSG00000167914 GSDMA ENSG00000167914 GSDMA
ENSG00000212978 IDBG-247279 ENSG00000214015 IDBG-235806 ENSG00000168297 PXK ENSG00000168297 PXK
ENSG00000213076 IDBG-246193 ENSG00000215199 IDBG-229847 ENSG00000168301 KCTD6 ENSG00000168301 KCTD6
ENSG00000213820 RPL13P2 ENSG00000215204 C9orf144 ENSG00000168309 FAM107A ENSG00000168309 FAM107A
ENSG00000213925 IDBG-236637 ENSG00000219463 IDBG-276297 ENSG00000168488 ATXN2L ENSG00000168488 ATXN2L
ENSG00000214015 IDBG-235806 ENSG00000221401 IDBG-280794 ENSG00000169291 SHE ENSG00000169291 SHE
ENSG00000214861 IDBG-231410 ENSG00000222251 IDBG-280765 ENSG00000169635 HIC2 ENSG00000169635 HIC2
ENSG00000215199 IDBG-229847 ENSG00000223003 IDBG-281876 ENSG00000169662 LOC728468 ENSG00000169662 LOC728468
ENSG00000215204 C9orf144 ENSG00000223191 IDBG-282190 ENSG00000169668 IDBG-1947 ENSG00000169668 IDBG-1947
ENSG00000219463 IDBG-276297 ENSG00000223489 NEFHL ENSG00000169682 SPNS1 ENSG00000169682 SPNS1
ENSG00000221401 IDBG-280794 ENSG00000223895 IDBG-299401 ENSG00000170500 LONRF2 ENSG00000170500 LONRF2
ENSG00000222251 IDBG-280765 ENSG00000224163 IDBG-308875 ENSG00000170983 C8orf14 ENSG00000170983 C8orf14
ENSG00000223003 IDBG-281876 ENSG00000224478 IDBG-310091 ENSG00000171532 NEUROD2 ENSG00000171532 NEUROD2
ENSG00000223191 IDBG-282190 ENSG00000224713 IDBG-305690 ENSG00000172057 ORMDL3 ENSG00000172057 ORMDL3
ENSG00000223335 IDBG-282431 ENSG00000224791 IDBG-310319 ENSG00000173064 C12orf51 ENSG00000173064 C12orf51
ENSG00000223489 NEFHL ENSG00000224890 IDBG-305921 ENSG00000173209 AHSA2 ENSG00000173209 AHSA2
ENSG00000223534 IDBG-303854 ENSG00000226004 IDBG-309676 ENSG00000175749 IDBG-35902 ENSG00000175749 IDBG-35902
ENSG00000223895 IDBG-299401 ENSG00000226032 IDBG-310090 ENSG00000176046 NUPR1 ENSG00000176046 NUPR1
ENSG00000224163 IDBG-308875 ENSG00000226167 IDBG-309888 ENSG00000176476 CCDC101 ENSG00000176476 CCDC101
ENSG00000224478 IDBG-310091 ENSG00000226884 RPS29P10 ENSG00000176953 NFATC2IP ENSG00000176953 NFATC2IP
ENSG00000224713 IDBG-305690 ENSG00000227145 IDBG-305107 ENSG00000177548 RABEP2 ENSG00000177548 RABEP2
ENSG00000224791 IDBG-310319 ENSG00000227598 IDBG-310220 ENSG00000178723 IDBG-119582 ENSG00000178723 IDBG-119582
ENSG00000224890 IDBG-305921 ENSG00000227943 IDBG-304897 ENSG00000178952 TUFM ENSG00000178952 TUFM
ENSG00000225914 IDBG-303741 ENSG00000228414 IDBG-304879 ENSG00000181751 C5orf30 ENSG00000181751 C5orf30
ENSG00000226004 IDBG-309676 ENSG00000229260 IDBG-299123 ENSG00000183246 RIMBP3C ENSG00000183246 RIMBP3C
ENSG00000226030 HLA-DQB3 ENSG00000229664 IDBG-299033 ENSG00000183506 PI4KAP2 ENSG00000183506 PI4KAP2
ENSG00000226032 IDBG-310090 ENSG00000229922 IDBG-309677 ENSG00000183621 ZNF438 ENSG00000183621 ZNF438
ENSG00000226167 IDBG-309888 ENSG00000230359 IDBG-308889 ENSG00000184608 C8orf12 ENSG00000184608 C8orf12
ENSG00000226884 RPS29P10 ENSG00000230393 IDBG-307594 ENSG00000184730 IDBG-22787 ENSG00000184730 IDBG-22787
ENSG00000227145 IDBG-305107 ENSG00000230533 IDBG-309664 ENSG00000185264 C22orf33 ENSG00000185264 C22orf33
ENSG00000227598 IDBG-310220 ENSG00000230626 IDBG-308890 ENSG00000186075 ZPBP2 ENSG00000186075 ZPBP2
ENSG00000227943 IDBG-304897 ENSG00000231072 IDBG-309929 ENSG00000187045 TMPRSS6 ENSG00000187045 TMPRSS6
ENSG00000228414 IDBG-304879 ENSG00000231128 IDBG-309883 ENSG00000187791 IDBG-61196 ENSG00000187791 IDBG-61196
ENSG00000228962 HCG23 ENSG00000231634 IDBG-309894 ENSG00000188322 SBK1 ENSG00000188322 SBK1
ENSG00000229260 IDBG-299123 ENSG00000231858 IDBG-310053 ENSG00000188761 BCL2L15 ENSG00000188761 BCL2L15
ENSG00000229391 HLA-DRB6 ENSG00000232067 IDBG-300777 ENSG00000196301 HLA-DRB9 ENSG00000196347 CDC37P1
ENSG00000229664 IDBG-299033 ENSG00000232084 IDBG-307595 ENSG00000196347 CDC37P1 ENSG00000196585 IDBG-22359
ENSG00000229922 IDBG-309677 ENSG00000232450 IDBG-309882 ENSG00000196585 IDBG-22359 ENSG00000196934 RIMBP3B
ENSG00000230359 IDBG-308889 ENSG00000232547 IDBG-299125 ENSG00000196934 RIMBP3B ENSG00000196993 IDBG-23178
ENSG00000230393 IDBG-307594 ENSG00000232693 IDBG-305322 ENSG00000196993 IDBG-23178 ENSG00000197146 IDBG-119344
ENSG00000230533 IDBG-309664 ENSG00000232713 IDBG-304884 ENSG00000197146 IDBG-119344 ENSG00000197210 POM121L7
ENSG00000230626 IDBG-308890 ENSG00000233031 IDBG-310318 ENSG00000197210 POM121L7 ENSG00000198156 NPIPL1
ENSG00000231072 IDBG-309929 ENSG00000233459 IDBG-310317 ENSG00000198156 NPIPL1 ENSG00000198270 TMEM116
ENSG00000231128 IDBG-309883 ENSG00000234255 IDBG-305328 ENSG00000198270 TMEM116 ENSG00000198324 FAM109A
ENSG00000231634 IDBG-309894 ENSG00000234624 IDBG-305069 ENSG00000198324 FAM109A ENSG00000198567 IDBG-115755
ENSG00000231858 IDBG-310053 ENSG00000235527 IDBG-309890 ENSG00000198567 IDBG-115755 ENSG00000198643 FAM3D
ENSG00000232067 IDBG-300777 ENSG00000235842 IDBG-309670 ENSG00000198643 FAM3D ENSG00000200057 IDBG-121490
ENSG00000232080 IDBG-303860 ENSG00000236722 IDBG-306431 ENSG00000200057 IDBG-121490 ENSG00000200135 IDBG-114419
ENSG00000232084 IDBG-307595 ENSG00000237024 IDBG-304880 ENSG00000200135 IDBG-114419 ENSG00000200688 IDBG-110209
ENSG00000232450 IDBG-309882 ENSG00000237499 IDBG-309672 ENSG00000200688 IDBG-110209 ENSG00000201076 IDBG-120406
ENSG00000232547 IDBG-299125 ENSG00000237522 IDBG-304883 ENSG00000201076 IDBG-120406 ENSG00000201078 IDBG-111995
ENSG00000232629 HLA-DQB2 ENSG00000237651 IDBG-304952 ENSG00000201078 IDBG-111995 ENSG00000201428 IDBG-108913
ENSG00000232693 IDBG-305322 ENSG00000237969 IDBG-305658 ENSG00000201428 IDBG-108913 ENSG00000201466 IDBG-121090
ENSG00000232713 IDBG-304884 ENSG00000238256 IDBG-309876 ENSG00000201466 IDBG-121090 ENSG00000201581 IDBG-111663
ENSG00000233031 IDBG-310318 ENSG00000238436 IDBG-311853 ENSG00000201581 IDBG-111663 ENSG00000203711 C6orf99;LOC100130967
ENSG00000233459 IDBG-310317 ENSG00000238532 IDBG-311510 ENSG00000203711 C6orf99;LOC100130967 ENSG00000203864 C1orf137
ENSG00000234255 IDBG-305328 ENSG00000238733 IDBG-311943 ENSG00000203864 C1orf137 ENSG00000204842 ATXN2
ENSG00000234515 PPP1R2P1 ENSG00000204290 BTNL2 ENSG00000204913 IDBG-46746
ENSG00000234624 IDBG-305069 ENSG00000204296 C6orf10 ENSG00000204929 IDBG-54329
ENSG00000235040 MTCO3P1 ENSG00000204842 ATXN2 ENSG00000205108 IDBG-61121
ENSG00000235301 HLA-Z ENSG00000204913 IDBG-46746 ENSG00000205609 EIF3CL
ENSG00000235527 IDBG-309890 ENSG00000204929 IDBG-54329 ENSG00000206140 IDBG-2026
ENSG00000235842 IDBG-309670 ENSG00000205108 IDBG-61121 ENSG00000206142 IDBG-2002
ENSG00000236722 IDBG-306431 ENSG00000205609 EIF3CL ENSG00000206763 IDBG-120770
ENSG00000237024 IDBG-304880 ENSG00000206140 IDBG-2026 ENSG00000206937 SNORA70B
ENSG00000237285 HNRPA1P2 ENSG00000206142 IDBG-2002 ENSG00000206970 IDBG-110009
ENSG00000237499 IDBG-309672 ENSG00000206763 IDBG-120770 ENSG00000206973 IDBG-111587
ENSG00000237522 IDBG-304883 ENSG00000206937 SNORA70B ENSG00000207402 IDBG-111731
ENSG00000237651 IDBG-304952 ENSG00000206970 IDBG-110009 ENSG00000207751 hsa-mir-130b
ENSG00000237969 IDBG-305658 ENSG00000206973 IDBG-111587 ENSG00000207759 hsa-mir-181a-1
ENSG00000238256 IDBG-309876 ENSG00000207402 IDBG-111731 ENSG00000207975 hsa-mir-181b-1
ENSG00000238436 IDBG-311853 ENSG00000207751 hsa-mir-130b ENSG00000208028 hsa-mir-616
ENSG00000238532 IDBG-311510 ENSG00000207759 hsa-mir-181a-1 ENSG00000211543 hsa-mir-320b-1
ENSG00000238733 IDBG-311943 ENSG00000207975 hsa-mir-181b-1 ENSG00000211573 IDBG-126479
ENSG00000208028 hsa-mir-616 ENSG00000212102 hsa-mir-301b
ENSG00000211543 hsa-mir-320b-1 ENSG00000212743 DKFZp667F0711
ENSG00000211573 IDBG-126479 ENSG00000212978 IDBG-247279
ENSG00000212066 IDBG-126761 ENSG00000213076 IDBG-246193
ENSG00000212102 hsa-mir-301b ENSG00000213152 IDBG-245319
ENSG00000212743 DKFZp667F0711 ENSG00000213156 IDBG-245249
ENSG00000212978 IDBG-247279 ENSG00000213820 RPL13P2
ENSG00000213076 IDBG-246193 ENSG00000213925 IDBG-236637
ENSG00000213152 IDBG-245319 ENSG00000214015 IDBG-235806
ENSG00000213156 IDBG-245249 ENSG00000214546 IDBG-232830
ENSG00000213820 RPL13P2 ENSG00000215199 IDBG-229847
ENSG00000213925 IDBG-236637 ENSG00000215204 C9orf144
ENSG00000214015 IDBG-235806 ENSG00000215403 IDBG-228764
ENSG00000214546 IDBG-232830 ENSG00000215498 IDBG-228209
ENSG00000214861 IDBG-231410 ENSG00000219463 IDBG-276297
ENSG00000215199 IDBG-229847 ENSG00000220201 GLP-1
ENSG00000215204 C9orf144 ENSG00000221386 IDBG-281357
ENSG00000215403 IDBG-228764 ENSG00000221401 IDBG-280794
ENSG00000215498 IDBG-228209 ENSG00000221566 hsa-mir-1181
ENSG00000219463 IDBG-276297 ENSG00000222251 IDBG-280765
ENSG00000220201 GLP-1 ENSG00000222352 IDBG-280927
ENSG00000221386 IDBG-281357 ENSG00000223003 IDBG-281876
ENSG00000221401 IDBG-280794 ENSG00000223191 IDBG-282190
ENSG00000221566 hsa-mir-1181 ENSG00000223489 NEFHL
ENSG00000222251 IDBG-280765 ENSG00000223881 IDBG-310976
ENSG00000222352 IDBG-280927 ENSG00000223895 IDBG-299401
ENSG00000223003 IDBG-281876 ENSG00000224163 IDBG-308875
ENSG00000223191 IDBG-282190 ENSG00000224478 IDBG-310091
ENSG00000223335 IDBG-282431 ENSG00000224688 IDBG-299289
ENSG00000223489 NEFHL ENSG00000224713 IDBG-305690
ENSG00000223534 IDBG-303854 ENSG00000224791 IDBG-310319
ENSG00000223881 IDBG-310976 ENSG00000224890 IDBG-305921
ENSG00000223895 IDBG-299401 ENSG00000225172 IDBG-310981
ENSG00000224163 IDBG-308875 ENSG00000226004 IDBG-309676
ENSG00000224478 IDBG-310091 ENSG00000226032 IDBG-310090
ENSG00000224688 IDBG-299289 ENSG00000226117 IDBG-305109
ENSG00000224713 IDBG-305690 ENSG00000226167 IDBG-309888
ENSG00000224791 IDBG-310319 ENSG00000226441 IDBG-300402
ENSG00000224890 IDBG-305921 ENSG00000226469 IDBG-307555
ENSG00000225172 IDBG-310981 ENSG00000226534 IDBG-299382
ENSG00000225914 IDBG-303741 ENSG00000226728 IDBG-302605
ENSG00000226004 IDBG-309676 ENSG00000226884 RPS29P10
ENSG00000226030 HLA-DQB3 ENSG00000226885 IDBG-299356
ENSG00000226032 IDBG-310090 ENSG00000227145 IDBG-305107
ENSG00000226117 IDBG-305109 ENSG00000227598 IDBG-310220
ENSG00000226167 IDBG-309888 ENSG00000227747 IDBG-310980
ENSG00000226441 IDBG-300402 ENSG00000227943 IDBG-304897
ENSG00000226469 IDBG-307555 ENSG00000228013 IDBG-310430
ENSG00000226534 IDBG-299382 ENSG00000228264 IDBG-310431
ENSG00000226728 IDBG-302605 ENSG00000228414 IDBG-304879
ENSG00000226884 RPS29P10 ENSG00000228910 IDBG-299361
ENSG00000226885 IDBG-299356 ENSG00000229186 ADAM1
ENSG00000227145 IDBG-305107 ENSG00000229260 IDBG-299123
ENSG00000227598 IDBG-310220 ENSG00000229266 POM121L8P
ENSG00000227747 IDBG-310980 ENSG00000229664 IDBG-299033
ENSG00000227943 IDBG-304897 ENSG00000229780 IDBG-310432
ENSG00000228013 IDBG-310430 ENSG00000229922 IDBG-309677
ENSG00000228264 IDBG-310431 ENSG00000229933 IDBG-310977
ENSG00000228414 IDBG-304879 ENSG00000229989 IDBG-310979
ENSG00000228910 IDBG-299361 ENSG00000230359 IDBG-308889
ENSG00000228962 HCG23 ENSG00000230393 IDBG-307594
ENSG00000229186 ADAM1 ENSG00000230533 IDBG-309664
ENSG00000229260 IDBG-299123 ENSG00000230626 IDBG-308890
ENSG00000229266 POM121L8P ENSG00000231072 IDBG-309929
ENSG00000229391 HLA-DRB6 ENSG00000231128 IDBG-309883
ENSG00000229664 IDBG-299033 ENSG00000231467 IDBG-304073
ENSG00000229780 IDBG-310432 ENSG00000231634 IDBG-309894
ENSG00000229922 IDBG-309677 ENSG00000231718 IDBG-310984
ENSG00000229933 IDBG-310977 ENSG00000231858 IDBG-310053
ENSG00000229989 IDBG-310979 ENSG00000232067 IDBG-300777
ENSG00000230359 IDBG-308889 ENSG00000232084 IDBG-307595
ENSG00000230393 IDBG-307594 ENSG00000232450 IDBG-309882
ENSG00000230533 IDBG-309664 ENSG00000232547 IDBG-299125
ENSG00000230626 IDBG-308890 ENSG00000232693 IDBG-305322
ENSG00000231072 IDBG-309929 ENSG00000232713 IDBG-304884
ENSG00000231128 IDBG-309883 ENSG00000232771 IDBG-299381
ENSG00000231467 IDBG-304073 ENSG00000233031 IDBG-310318
ENSG00000231634 IDBG-309894 ENSG00000233232 IDBG-303430
ENSG00000231718 IDBG-310984 ENSG00000233410 IDBG-310982
ENSG00000231858 IDBG-310053 ENSG00000233411 IDBG-310726
ENSG00000232067 IDBG-300777 ENSG00000233459 IDBG-310317
ENSG00000232080 IDBG-303860 ENSG00000233875 IDBG-310433
ENSG00000232084 IDBG-307595 ENSG00000234255 IDBG-305328
ENSG00000232450 IDBG-309882 ENSG00000234503 IDBG-299312
ENSG00000232547 IDBG-299125 ENSG00000234545 FAM133B
ENSG00000232629 HLA-DQB2 ENSG00000234608 IDBG-307550
ENSG00000232693 IDBG-305322 ENSG00000234624 IDBG-305069
ENSG00000232713 IDBG-304884 ENSG00000235237 IDBG-304088
ENSG00000232771 IDBG-299381 ENSG00000235492 IDBG-310983
ENSG00000233031 IDBG-310318 ENSG00000235527 IDBG-309890
ENSG00000233232 IDBG-303430 ENSG00000235842 IDBG-309670
ENSG00000233410 IDBG-310982 ENSG00000236278 IDBG-310978
ENSG00000233411 IDBG-310726 ENSG00000236722 IDBG-306431
ENSG00000233459 IDBG-310317 ENSG00000237024 IDBG-304880
ENSG00000233875 IDBG-310433 ENSG00000237135 IDBG-302637
ENSG00000234255 IDBG-305328 ENSG00000237407 IDBG-299378
ENSG00000234503 IDBG-299312 ENSG00000237499 IDBG-309672
ENSG00000234515 PPP1R2P1 ENSG00000237522 IDBG-304883
ENSG00000234545 FAM133B ENSG00000237651 IDBG-304952
ENSG00000234608 IDBG-307550 ENSG00000237819 IDBG-306955
ENSG00000234624 IDBG-305069 ENSG00000237868 IDBG-304904
ENSG00000235040 MTCO3P1 ENSG00000237969 IDBG-305658
ENSG00000235237 IDBG-304088 ENSG00000238168 IDBG-307531
ENSG00000235301 HLA-Z ENSG00000238256 IDBG-309876
ENSG00000235492 IDBG-310983 ENSG00000238352 IDBG-311995
ENSG00000235527 IDBG-309890 ENSG00000238436 IDBG-311853
ENSG00000235842 IDBG-309670 ENSG00000238532 IDBG-311510
ENSG00000236278 IDBG-310978 ENSG00000238684 IDBG-311994
ENSG00000236722 IDBG-306431 ENSG00000238703 IDBG-311993
ENSG00000237024 IDBG-304880 ENSG00000238733 IDBG-311943
ENSG00000237135 IDBG-302637 ENSG00000239071 IDBG-312234
ENSG00000237285 HNRPA1P2
ENSG00000237407 IDBG-299378
ENSG00000237499 IDBG-309672
ENSG00000237522 IDBG-304883
ENSG00000237651 IDBG-304952
ENSG00000237819 IDBG-306955
ENSG00000237868 IDBG-304904
ENSG00000237969 IDBG-305658
ENSG00000238168 IDBG-307531
ENSG00000238256 IDBG-309876
ENSG00000238352 IDBG-311995
ENSG00000238436 IDBG-311853
ENSG00000238532 IDBG-311510
ENSG00000238684 IDBG-311994
ENSG00000238703 IDBG-311993
ENSG00000238733 IDBG-311943
ENSG00000239071 IDBG-312234

Supplementary Table 15 Significant pathways identified by the InnateDB pathway analysis of the confirmed gene list. Pathways are shown which remain significant (p < 0.05) after
correction for multiple testing.
Pathway Name | Pathway Id | Source Name | Pathway uploaded gene count | Genes in InnateDB for this entity | Genes Ratio | Pathway p-value | Pathway p-value (corrected)
Intestinal immune network for IgA production 8118 KEGG 11 45 24% 2.01E-14 6.23E-12
Allograft rejection 2793 KEGG 10 34 29% 4.26E-14 6.61E-12
Autoimmune thyroid disease 2799 KEGG 11 49 22% 5.64E-14 5.83E-12
Graft-versus-host disease 2807 KEGG 9 35 26% 3.44E-12 2.67E-10
Type I diabetes mellitus 525 KEGG 9 40 23% 1.29E-11 8.02E-10
Asthma 2818 KEGG 8 27 30% 1.68E-11 8.65E-10
Antigen processing and presentation 4144 PID BIOCARTA 6 12 50% 1.57E-10 6.94E-09
Cell adhesion molecules (CAMs) 440 KEGG 12 128 9% 1.86E-10 7.19E-09
Viral myocarditis 8123 KEGG 9 67 13% 1.72E-09 5.93E-08
Systemic lupus erythematosus 2805 KEGG 10 95 11% 2.29E-09 7.11E-08
Antigen processing and presentation 493 KEGG 8 74 11% 8.57E-08 2.42E-06
The co-stimulatory signal during t-cell activation 4018 PID BIOCARTA 5 18 28% 1.95E-07 5.05E-06
IL12 signaling mediated by STAT4 8013 PID NCI 5 29 17% 2.54E-06 6.05E-05
CD4 T cell receptor signaling (Vav, Rac and JNK cascade) 276 INOH 5 45 11% 2.37E-05 0.000524988
T cell receptor signaling (IKK-NF-kappaB cascade) 10 INOH 5 54 9% 5.82E-05 0.001202152
Primary immunodeficiency 2815 KEGG 4 35 11% 0.00014786 0.002864721
T cell receptor signaling (ERK cascade) 362 INOH 4 37 11% 0.00018434 0.003361562
T cell receptor signaling pathway 354 INOH 5 83 6% 0.00044944 0.007740429
Hematopoietic cell lineage 415 KEGG 5 85 6% 0.00050176 0.008186641
IL12-mediated signaling events 8015 PID NCI 4 59 7% 0.00111799 0.017328887
Chemokine receptors bind chemokines 3445 REACTOME 3 27 11% 0.00116702 0.017227382
CD40/CD40L signaling 8009 PID NCI 3 30 10% 0.00159315 0.022448869
Cd40l signaling pathway 4093 PID BIOCARTA 2 9 22% 0.0020963 0.028254532
Lck and fyn tyrosine kinases in initiation of tcr activation 4116 PID BIOCARTA 2 11 18% 0.00317062 0.040953817
G2/M Transition 1652 REACTOME 4 82 5% 0.00376105 0.046636979
Tnfr2 signaling pathway 4004 PID BIOCARTA 2 12 17% 0.00378566 0.045136723
Costimulation by the CD28 family 5974 REACTOME 3 41 7% 0.00393604 0.045191601
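
The Genes Ratio column in Supplementary Table 15 is the uploaded gene count divided by the number of genes InnateDB holds for the pathway. The Python sketch below computes that ratio and, as an illustration of how an over-representation p-value of this kind can be obtained, a hypergeometric upper-tail probability. The choice of the hypergeometric test, the function names and the list and background sizes in the usage lines are assumptions made for illustration. The source does not state which test or background was used, so the output is not expected to match the tabulated p-values.

```python
from math import comb


def genes_ratio(uploaded, in_pathway):
    """Percentage of a pathway's genes that are present in the uploaded list."""
    return round(100 * uploaded / in_pathway)


def hypergeom_upper_tail(k, list_size, pathway_size, background_size):
    """P(X >= k) for the number of pathway genes X in a list of list_size
    genes drawn without replacement from background_size genes, of which
    pathway_size belong to the pathway."""
    total = comb(background_size, list_size)
    upper = min(list_size, pathway_size)
    hits = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Intestinal immune network for IgA production: 11 uploaded genes out of 45
print(genes_ratio(11, 45))
# Hypothetical inputs: a 100-gene list and a 20,000-gene background
print(hypergeom_upper_tail(11, 100, 45, 20000))
```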

Supplementary Table 16 Significant pathways identified by the InnateDB pathway analysis of the confirmed gene list after exclusion of MHC region genes. Pathways are shown which
remain significant (p < 0.05) after correction for multiple testing.
Pathway Name | Pathway Id | Source Name | Pathway uploaded gene count | Genes in InnateDB for this entity | Genes Ratio | Pathway p-value | Pathway p-value (corrected)
IL12 signaling mediated by STAT4 8013 PID NCI 4 29 14% 2.91E-05 0.006770213
Intestinal immune network for IgA production 8118 KEGG 4 45 9% 0.000169369 0.019731491
The co-stimulatory signal during t-cell activation 4018 PID BIOCARTA 3 18 17% 0.000178264 0.01384515
Autoimmune thyroid disease 2799 KEGG 4 49 8% 0.00023647 0.013774356
Chemokine receptors bind chemokines 3445 REACTOME 3 27 11% 0.000614166 0.028620114
CD40/CD40L signaling 8009 PID NCI 3 30 10% 0.000841307 0.032670756
Cell adhesion molecules (CAMs) 440 KEGG 5 128 4% 0.001173254 0.039052614
Allograft rejection 2793 KEGG 3 34 9% 0.001218383 0.035485408
Cd40l signaling pathway 4093 PID BIOCARTA 2 9 22% 0.001357356 0.035140439
Cytokine-cytokine receptor interaction 515 KEGG 7 272 3% 0.001404711 0.032729758
G2/M Transition 1652 REACTOME 4 82 5% 0.001678294 0.035549326
Costimulation by the CD28 family 5974 REACTOME 3 41 7% 0.002104751 0.040867247
Tnfr2 signaling pathway 4004 PID BIOCARTA 2 12 17% 0.002458745 0.044068273
Calcineurin-regulated NFAT-dependent transcription in lymphocytes 7969 PID NCI 3 44 7% 0.002580759 0.04295121
Jak-STAT signaling pathway 568 KEGG 5 153 3% 0.002581382 0.040097474
Il 2 signaling pathway 3954 PID BIOCARTA 2 14 14% 0.00336305 0.048974412

Supplementary Table 17 Significant pathways identified by the InnateDB pathway analysis of the expanded gene list. Pathways are shown which
remain significant (p < 0.05) after correction for multiple testing.
Pathway Name | Pathway Id | Source Name | Pathway uploaded gene count | Genes in InnateDB for this entity | Genes Ratio | Pathway p-value | Pathway p-value (corrected)
Intestinal immune network for IgA production 8118 KEGG 11 45 24% 1.58E-11 8.41E-09
Allograft rejection 2793 KEGG 10 34 29% 1.81E-11 4.83E-09
Autoimmune thyroid disease 2799 KEGG 11 49 22% 4.32E-11 7.69E-09
Cell adhesion molecules (CAMs) 440 KEGG 15 128 12% 1.92E-10 2.56E-08
Graft-versus-host disease 2807 KEGG 9 35 26% 7.47E-10 7.98E-08
Asthma 2818 KEGG 8 27 30% 1.98E-09 1.76E-07
The co-stimulatory signal during t-cell activation 4018 PID BIOCARTA 7 18 39% 2.39E-09 1.82E-07
Type I diabetes mellitus 525 KEGG 9 40 23% 2.73E-09 1.82E-07
Antigen processing and presentation 4144 PID BIOCARTA 6 12 50% 5.58E-09 3.31E-07
Primary immunodeficiency 2815 KEGG 8 35 23% 1.91E-08 1.02E-06
Viral myocarditis 8123 KEGG 10 67 15% 2.34E-08 1.14E-06
T cell receptor signaling (IKK-NF-kappaB cascade) 10 INOH 9 54 17% 4.52E-08 2.01E-06
CD4 T cell receptor signaling (Vav, Rac and JNK cascade) 276 INOH 8 45 18% 1.56E-07 6.40E-06
Systemic lupus erythematosus 2805 KEGG 10 95 11% 6.90E-07 2.63E-05
T cell receptor signaling pathway 354 INOH 9 83 11% 2.00E-06 7.11E-05
IL12 signaling mediated by STAT4 8013 PID NCI 6 29 21% 2.36E-06 7.89E-05
Signaling in Immune system 1959 REACTOME 15 275 5% 5.47E-06 0.0001718
Antigen processing and presentation 493 KEGG 8 74 11% 7.81E-06 0.0002317
T cell receptor signaling (ERK cascade) 362 INOH 6 37 16% 1.06E-05 0.000297
Lck and fyn tyrosine kinases in initiation of tcr activation 4116 PID BIOCARTA 4 11 36% 1.11E-05 0.0002951
IL12-mediated signaling events 8015 PID NCI 7 59 12% 1.60E-05 0.0004068
TCR signaling in naïve CD4+ T cells 8033 PID NCI 7 63 11% 2.48E-05 0.0006017
IL2 signaling events mediated by STAT5 8016 PID NCI 5 28 18% 3.69E-05 0.000822
Role of mef2d in t-cell apoptosis 4085 PID BIOCARTA 5 28 18% 3.69E-05 0.000822
TCR signaling in naïve CD8+ T cells 8004 PID NCI 6 52 12% 7.81E-05 0.0016689
T cell receptor signaling pathway 563 KEGG 8 108 7% 0.0001234 0.0025346
Hematopoietic cell lineage 415 KEGG 7 85 8% 0.0001721 0.0034029
Downstream signaling in naïve CD8+ T cells 8044 PID NCI 6 61 10% 0.0001925 0.0036719
Cd40l signaling pathway 4093 PID BIOCARTA 3 9 33% 0.0002101 0.0038694
Costimulation by the CD28 family 5974 REACTOME 5 41 12% 0.0002439 0.0043419
Jak-STAT signaling pathway 568 KEGG 9 153 6% 0.00027 0.0046507
Calcineurin-regulated NFAT-dependent transcription in lymphocytes 7969 PID NCI 5 44 11% 0.000342 0.0057064
IL27-mediated signaling events 7977 PID NCI 4 26 15% 0.0004261 0.0068943
BCR 3931 NETPATH 8 139 6% 0.0006916 0.0108623
CD40/CD40L signaling 8009 PID NCI 4 30 13% 0.0007482 0.0114157
T cell receptor signaling pathway 4156 PID BIOCARTA 5 53 9% 0.0008181 0.0121354
Il 2 signaling pathway 3954 PID BIOCARTA 3 14 21% 0.0008654 0.0124892
IL2-mediated signaling events 8098 PID NCI 5 55 9% 0.0009701 0.0136328
IL2 signaling events mediated by PI3K 8002 PID NCI 4 34 12% 0.0012131 0.01661
TCR signaling 1548 REACTOME 5 61 8% 0.0015529 0.0207316
TCR 3920 NETPATH 7 124 6% 0.0016841 0.0219339
Activation of csk by camp-dependent protein kinase inhibits signaling through the t cell receptor 4160 PID BIOCARTA 4 38 11% 0.00185 0.0235216
CTLA4 inhibitory signaling 6236 REACTOME 2 5 40% 0.0018863 0.0234253
IL2 3904 NETPATH 5 65 8% 0.0020633 0.0250407
Ras-independent pathway in nk cell-mediated cytotoxicity 4025 PID BIOCARTA 3 22 14% 0.0033756 0.0400571
Negative feedback regulation of JAK STAT pathway by cytokine receptor degradation signaling 248 INOH 4 46 9% 0.0037554 0.0435956
Canonical NF-kappaB pathway 8045 PID NCI 3 23 13% 0.0038428 0.0436613
IL-12 signaling pathway(JAK2 TYK2 STAT4) 388 INOH 2 7 29% 0.0038894 0.0423864
IL-2 signaling pathway(JAK1 JAK3 STAT5) 382 INOH 2 7 29% 0.0038894 0.0423864
CXCR4-mediated signaling events 8061 PID NCI 5 77 6% 0.0043262 0.0462039
Il-2 receptor beta chain in t cell activation 4011 PID BIOCARTA 4 49 8% 0.0047228 0.0494502

Supplementary Table 18 Significant pathways identified by the InnateDB pathway analysis of the expanded gene list after exclusion of MHC region genes. Pathways are shown which
remain significant (p < 0.05) after correction for multiple testing.
Pathway Name | Pathway Id | Source Name | Pathway uploaded gene count | Genes in InnateDB for this entity | Genes Ratio | Pathway p-value | Pathway p-value (corrected)
Signaling in Immune system 1959 REACTOME 15 275 5% 1.21E-06 0.0005573
The co-stimulatory signal during t-cell activation 4018 PID BIOCARTA 5 18 28% 2.02E-06 0.0004662
Primary immunodeficiency 2815 KEGG 6 35 17% 3.82E-06 0.0005889
IL2 signaling events mediated by STAT5 8016 PID NCI 5 28 18% 2.10E-05 0.0024234
IL12 signaling mediated by STAT4 8013 PID NCI 5 29 17% 2.51E-05 0.0023198
TCR signaling in naïve CD8+ T cells 8004 PID NCI 6 52 12% 4.05E-05 0.003117
T cell receptor signaling pathway 563 KEGG 8 108 7% 5.37E-05 0.003546
IL12-mediated signaling events 8015 PID NCI 6 59 10% 8.36E-05 0.0048253
Downstream signaling in naïve CD8+ T cells 8044 PID NCI 6 61 10% 0.0001009 0.0051814
Jak-STAT signaling pathway 568 KEGG 9 153 6% 0.0001102 0.005091
TCR signaling in naïve CD4+ T cells 8033 PID NCI 6 63 10% 0.0001211 0.0050853
Costimulation by the CD28 family 5974 REACTOME 5 41 12% 0.0001409 0.0054228
Cd40l signaling pathway 4093 PID BIOCARTA 3 9 33% 0.0001486 0.0052815
Cell adhesion molecules (CAMs) 440 KEGG 8 128 6% 0.0001782 0.0058815
Calcineurin-regulated NFAT-dependent transcription in lymphocytes 7969 PID NCI 5 44 11% 0.0001982 0.006105
IL27-mediated signaling events 7977 PID NCI 4 26 15% 0.0002726 0.0078712
BCR 3931 NETPATH 8 139 6% 0.000314 0.008533
CD40/CD40L signaling 8009 PID NCI 4 30 13% 0.0004811 0.0123474
T cell receptor signaling (IKK-NF-kappaB cascade) 10 INOH 5 54 9% 0.0005235 0.0127284
IL2-mediated signaling events 8098 PID NCI 5 55 9% 0.0005702 0.0131719
Il 2 signaling pathway 3954 PID BIOCARTA 3 14 21% 0.0006155 0.0135421
IL2 signaling events mediated by PI3K 8002 PID NCI 4 34 12% 0.0007838 0.0164591
TCR 3920 NETPATH 7 124 6% 0.0008481 0.0170349
TCR signaling 1548 REACTOME 5 61 8% 0.0009197 0.0177041
IL2 3904 NETPATH 5 65 8% 0.0012281 0.0226952
CTLA4 inhibitory signaling 6236 REACTOME 2 5 40% 0.0014969 0.026599
CD4 T cell receptor signaling (Vav, Rac and JNK cascade) 276 INOH 4 45 9% 0.0022691 0.0374396
Intestinal immune network for IgA production 8118 KEGG 4 45 9% 0.0022691 0.0374396
Ras-independent pathway in nk cell-mediated cytotoxicity 4025 PID BIOCARTA 3 22 14% 0.0024233 0.0386049
Negative feedback regulation of JAK STAT pathway by cytokine receptor degradation signaling 248 INOH 4 46 9% 0.0024618 0.0379118
Canonical NF-kappaB pathway 8045 PID NCI 3 23 13% 0.0027618 0.04116
IL-12 signaling pathway(JAK2 TYK2 STAT4) 388 INOH 2 7 29% 0.0030928 0.0432998
IL-2 signaling pathway(JAK1 JAK3 STAT5) 382 INOH 2 7 29% 0.0030928 0.0432998
Autoimmune thyroid disease 2799 KEGG 4 49 8% 0.0031071 0.0410138
Il-2 receptor beta chain in t cell activation 4011 PID BIOCARTA 4 49 8% 0.0031071 0.0410138
Calcium signaling in the CD4+ TCR pathway 8084 PID NCI 3 25 12% 0.003523 0.0452114
T cell receptor signaling pathway 354 INOH 5 83 6% 0.0036238 0.0452483
Lysosphingolipid and LPA receptors 5468 REACTOME 2 8 25% 0.0040905 0.0497315
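
The table caption does not name the multiple-testing correction that was applied. As a minimal sketch only, the following assumes a Benjamini-Hochberg style adjustment, in which each p-value is multiplied by the number of tests and divided by its rank. The function name `benjamini_hochberg` and the parameter `n_tests` are hypothetical, and the exact variant used by the pathway analysis tool may differ (for example, in whether adjusted values are forced to be monotonic).

```python
def benjamini_hochberg(p_values, n_tests=None):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    n_tests is the total number of hypotheses tested. It defaults to
    len(p_values); pass a larger value when only the smallest p-values
    of a bigger analysis are available (assumption: the supplied values
    are the lowest-ranked ones).
    """
    m = n_tests if n_tests is not None else len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value down so each adjusted value is
    # never larger than the one ranked above it (step-up procedure).
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative p-values only; these are not taken from the tables.
    print(benjamini_hochberg([0.001, 0.5]))
```

A pathway would then be reported as significant when its adjusted value is below the chosen threshold (p < 0.05 in the tables above).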
7.1.1.4 PANTHER
Full PANTHER results are shown in Supplementary Table 19, Supplementary Table 20,
Supplementary Table 21, Supplementary Figure 5, Supplementary Figure 6 and
Supplementary Figure 7. Three protein classes lost significance after removal of
MHC region genes from the confirmed gene list, including the most strongly associated
class, MHC antigen, and the highly associated defence/immunity protein class
(Supplementary Table 20). No additional protein classes were identified by exclusion
of the MHC region genes. Following exclusion of MHC region genes from the expanded
gene list, one protein class (protein kinase) attained significance, whereas three
lost significance, including the highly associated MHC antigen and defence/immunity
classes.

Analysis of the confirmed gene list excluding MHC region genes resulted in two previously
significant pathways failing to reach significance, while one pathway that was borderline
using the confirmed list (p = 0.0551) attained significance with a p-value of 0.0488
(Supplementary Table 21). Following exclusion of the MHC region genes from the expanded
gene list, all pathways identified from the expanded gene list analysis remained significant.
In addition, one pathway, inflammation mediated by chemokine and cytokine signalling,
attained significance (p = 0.0489), having been borderline in the expanded gene list
analysis (p = 0.0613).

Supplementary Table 19 PANTHER cellular component analysis full results.
Cellular Component | H. sapiens Reference | Confirmed O/E (direction) | Confirmed P-value | Confirmed no HLA O/E (direction) | Confirmed no HLA P-value | Expanded O/E (direction) | Expanded P-value | Expanded no HLA O/E (direction) | Expanded no HLA P-value
MHC protein complex 58 6/0.27 (+) 3.99E-07 0/0.24 (-) 7.85E-01 6/0.57 (+) 2.70E-05 0/0.54 (-) 5.85E-01
protein complex 197 6/0.93 (+) 3.63E-04 0/0.82 (-) 4.38E-01 6/1.93 (+) 1.37E-02 0/1.82 (-) 1.60E-01
Unclassified 17808 83/84.07 (-) 4.07E-01 77/74.23 (+) 2.14E-01 180/174.4 (+) 1.14E-01 174/164.57 (+) 1.10E-02
actin cytoskeleton 415 1/1.96 (-) 4.14E-01 1/1.73 (-) 4.82E-01 2/4.06 (-) 2.26E-01 2/3.84 (-) 2.60E-01
cell junction 121 1/0.57 (+) 4.36E-01 1/0.5 (+) 3.97E-01 1/1.19 (-) 6.68E-01 1/1.12 (-) 6.92E-01
chromosome 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
cytoplasm 146 0/0.69 (-) 5.01E-01 0/0.61 (-) 5.43E-01 0/1.43 (-) 2.38E-01 0/1.35 (-) 2.58E-01
cytoskeleton 1003 4/4.74 (-) 4.85E-01 4/4.18 (-) 5.93E-01 7/9.82 (-) 2.30E-01 7/9.27 (-) 2.87E-01
cytosol 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
endoplasmic reticulum 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
extracellular matrix 501 2/2.37 (-) 5.78E-01 1/2.09 (-) 3.79E-01 3/4.91 (-) 2.75E-01 2/4.63 (-) 1.56E-01
extracellular region 505 2/2.38 (-) 5.73E-01 1/2.11 (-) 3.75E-01 3/4.95 (-) 2.69E-01 2/4.67 (-) 1.52E-01
extracellular space 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
gap junction 21 0/0.1 (-) 9.06E-01 0/0.09 (-) 9.16E-01 0/0.21 (-) 8.14E-01 0/0.19 (-) 8.24E-01
heterotrimeric G-protein complex 43 0/0.2 (-) 8.16E-01 0/0.18 (-) 8.36E-01 0/0.42 (-) 6.56E-01 0/0.4 (-) 6.72E-01
immunoglobulin complex 32 0/0.15 (-) 8.60E-01 0/0.13 (-) 8.75E-01 0/0.31 (-) 7.31E-01 0/0.3 (-) 7.44E-01
intermediate filament cytoskeleton 130 0/0.61 (-) 5.40E-01 0/0.54 (-) 5.81E-01 0/1.27 (-) 2.79E-01 0/1.2 (-) 3.00E-01
intracellular 1192 4/5.63 (-) 3.30E-01 4/4.97 (-) 4.41E-01 7/11.67 (-) 9.77E-02 7/11.02 (-) 1.34E-01
microtubule 348 2/1.64 (+) 4.91E-01 2/1.45 (+) 4.27E-01 3/3.41 (-) 5.56E-01 3/3.22 (-) 5.99E-01
mitochondrion 91 0/0.43 (-) 6.50E-01 0/0.38 (-) 6.84E-01 0/0.89 (-) 4.09E-01 0/0.84 (-) 4.30E-01
nuclear chromosome 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
nucleolus 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
nucleus 23 0/0.11 (-) 8.97E-01 0/0.1 (-) 9.09E-01 0/0.23 (-) 7.98E-01 0/0.21 (-) 8.08E-01
organelle 94 0/0.44 (-) 6.41E-01 0/0.39 (-) 6.75E-01 0/0.92 (-) 3.97E-01 0/0.87 (-) 4.19E-01
plasma membrane 131 1/0.62 (+) 4.62E-01 1/0.55 (+) 4.22E-01 1/1.28 (-) 6.33E-01 1/1.21 (-) 6.59E-01
proton-transporting ATP synthase complex 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 0/0.2 (-) 8.22E-01 0/0.18 (-) 8.31E-01
ribonucleoprotein complex 164 0/0.77 (-) 4.60E-01 0/0.68 (-) 5.03E-01 0/1.61 (-) 1.99E-01 0/1.52 (-) 2.18E-01
SNARE complex 38 0/0.18 (-) 8.36E-01 0/0.16 (-) 8.53E-01 0/0.37 (-) 6.89E-01 0/0.35 (-) 7.04E-01
tight junction 34 0/0.16 (-) 8.52E-01 0/0.14 (-) 8.68E-01 0/0.33 (-) 7.17E-01 0/0.31 (-) 7.30E-01
tubulin complex 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
vesicle coat 42 0/0.2 (-) 8.20E-01 0/0.18 (-) 8.39E-01 0/0.41 (-) 6.62E-01 0/0.39 (-) 6.78E-01
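
The O/E (direction) and P-value columns of the PANTHER tables can be related with a short calculation. The sketch below assumes a binomial test: the expected count is the gene list size multiplied by the fraction of reference genes in the category, and the p-value is the upper tail when the observed count exceeds the expected count (+) and the lower tail otherwise (-). The function name and the list and reference sizes used in the example (94 and 19911) are illustrative assumptions, not values stated in this section; they were chosen because they give expected counts close to those tabulated.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def overrepresentation(observed, list_size, category_size, reference_size):
    """Return (expected count, direction, binomial p-value) for one category.

    observed       -- genes from the uploaded list found in the category
    list_size      -- number of genes in the uploaded list (assumed value)
    category_size  -- reference genes annotated to the category
    reference_size -- total number of reference genes (assumed value)
    """
    p0 = category_size / reference_size
    expected = list_size * p0
    if observed > expected:
        # Over-representation: probability of observing this many or more.
        direction = "+"
        p_value = sum(binom_pmf(k, list_size, p0)
                      for k in range(observed, list_size + 1))
    else:
        # Under-representation: probability of observing this many or fewer.
        direction = "-"
        p_value = sum(binom_pmf(k, list_size, p0)
                      for k in range(0, observed + 1))
    return expected, direction, p_value


if __name__ == "__main__":
    # Category with 58 reference genes and 6 observed genes,
    # using the assumed list and reference sizes described above.
    expected, direction, p_value = overrepresentation(6, 94, 58, 19911)
    print(f"6/{expected:.2f} ({direction}) {p_value:.2E}")
```

Under these assumptions the expected count for a category with 58 reference genes is about 0.27, in line with the corresponding row of the table above. The tabulated p-values are not corrected for the number of categories tested.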
Supplementary Table 20 PANTHER protein class analysis full results.
PANTHER Protein Class | H. sapiens Reference | Confirmed O/E (direction) | Confirmed P-value | Confirmed no HLA O/E (direction) | Confirmed no HLA P-value | Expanded O/E (direction) | Expanded P-value | Expanded no HLA O/E (direction) | Expanded no HLA P-value
ATP-binding cassette (ABC) transporter 74 2/0.35 (+) 4.82E-02 0/0.31 (-) 7.34E-01 2/0.72 (+) 1.64E-01 0/0.68 (-) 5.04E-01
carbohydrate phosphatase 9 1/0.04 (+) 4.16E-02 1/0.04 (+) 3.68E-02 1/0.09 (+) 8.44E-02 1/0.08 (+) 7.98E-02
cytokine 159 5/0.75 (+) 9.89E-04 5/0.66 (+) 5.62E-04 5/1.56 (+) 2.09E-02 5/1.47 (+) 1.67E-02
defense/immunity protein 107 6/0.51 (+) 1.31E-05 0/0.45 (-) 6.39E-01 6/1.05 (+) 7.17E-04 0/0.99 (-) 3.71E-01
endodeoxyribonuclease 20 1/0.09 (+) 9.01E-02 1/0.08 (+) 8.00E-02 3/0.2 (+) 1.07E-03 3/0.18 (+) 9.04E-04
kinase 679 6/3.21 (+) 1.02E-01 6/2.83 (+) 6.44E-02 13/6.65 (+) 1.70E-02 13/6.27 (+) 1.10E-02
kinase activator 45 0/0.21 (-) 8.08E-01 0/0.19 (-) 8.29E-01 3/0.44 (+) 1.02E-02 3/0.42 (+) 8.70E-03
KRAB box transcription factor 552 0/2.61 (-) 7.12E-02 0/2.3 (-) 9.70E-02 1/5.41 (-) 2.73E-02 1/5.1 (-) 3.54E-02
ligase 260 2/1.23 (+) 3.48E-01 1/1.08 (-) 7.05E-01 6/2.55 (+) 4.41E-02 5/2.4 (+) 9.49E-02
major histocompatibility complex antigen 28 6/0.13 (+) 5.66E-09 0/0.12 (-) 8.90E-01 6/0.27 (+) 4.35E-07 0/0.26 (-) 7.72E-01
microtubule binding motor protein 46 2/0.22 (+) 2.03E-02 2/0.19 (+) 1.60E-02 2/0.45 (+) 7.54E-02 2/0.43 (+) 6.82E-02
protein kinase 510 3/2.41 (+) 4.34E-01 3/2.13 (+) 3.58E-01 9/4.99 (+) 6.52E-02 9/4.71 (+) 4.87E-02
receptor 1076 8/5.08 (+) 1.36E-01 8/4.49 (+) 7.97E-02 19/10.54 (+) 9.82E-03 19/9.94 (+) 5.38E-03
signaling molecule 961 10/4.54 (+) 1.54E-02 10/4.01 (+) 6.66E-03 18/9.41 (+) 6.73E-03 18/8.88 (+) 3.70E-03
transferase 1512 8/7.14 (+) 4.23E-01 8/6.3 (+) 2.95E-01 25/14.81 (+) 7.21E-03 25/13.97 (+) 3.42E-03
transketolase 5 1/0.02 (+) 2.33E-02 1/0.02 (+) 2.06E-02 1/0.05 (+) 4.78E-02 1/0.05 (+) 4.52E-02
type I cytokine receptor 31 1/0.15 (+) 1.36E-01 1/0.13 (+) 1.21E-01 2/0.3 (+) 3.76E-02 2/0.29 (+) 3.39E-02
Unclassified 6763 27/31.93 (-) 1.68E-01 22/28.19 (-) 9.17E-02 53/66.23 (-) 2.55E-02 48/62.5 (-) 1.33E-02
viral coat protein 7 1/0.03 (+) 3.25E-02 1/0.03 (+) 2.88E-02 1/0.07 (+) 6.63E-02 1/0.06 (+) 6.27E-02
viral protein 7 1/0.03 (+) 3.25E-02 1/0.03 (+) 2.88E-02 1/0.07 (+) 6.63E-02 1/0.06 (+) 6.27E-02
acetyltransferase 105 0/0.5 (-) 6.08E-01 0/0.44 (-) 6.45E-01 1/1.03 (-) 7.25E-01 1/0.97 (+) 6.22E-01
actin and actin related protein 31 0/0.15 (-) 8.64E-01 0/0.13 (-) 8.79E-01 0/0.3 (-) 7.38E-01 0/0.29 (-) 7.51E-01
actin binding motor protein 9 0/0.04 (-) 9.58E-01 0/0.04 (-) 9.63E-01 0/0.09 (-) 9.16E-01 0/0.08 (-) 9.20E-01
actin family cytoskeletal protein 222 1/1.05 (-) 7.18E-01 1/0.93 (+) 6.06E-01 1/2.17 (-) 3.59E-01 1/2.05 (-) 3.91E-01
acyltransferase 88 0/0.42 (-) 6.59E-01 0/0.37 (-) 6.92E-01 1/0.86 (+) 5.78E-01 1/0.81 (+) 5.57E-01
adenylate cyclase 21 0/0.1 (-) 9.06E-01 0/0.09 (-) 9.16E-01 0/0.21 (-) 8.14E-01 0/0.19 (-) 8.24E-01
aldolase 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
amino acid kinase 5 0/0.02 (-) 9.77E-01 0/0.02 (-) 9.79E-01 0/0.05 (-) 9.52E-01 0/0.05 (-) 9.55E-01
amino acid transporter 98 0/0.46 (-) 6.29E-01 0/0.41 (-) 6.64E-01 0/0.96 (-) 3.82E-01 0/0.91 (-) 4.03E-01
aminoacyl-tRNA synthetase 20 1/0.09 (+) 9.01E-02 1/0.08 (+) 8.00E-02 1/0.2 (+) 1.78E-01 1/0.18 (+) 1.69E-01
amylase 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
anion channel 37 0/0.17 (-) 8.40E-01 0/0.15 (-) 8.57E-01 0/0.36 (-) 6.96E-01 0/0.34 (-) 7.10E-01
annexin 9 0/0.04 (-) 9.58E-01 0/0.04 (-) 9.63E-01 0/0.09 (-) 9.16E-01 0/0.08 (-) 9.20E-01
antibacterial response protein 18 0/0.08 (-) 9.18E-01 0/0.08 (-) 9.28E-01 0/0.18 (-) 8.38E-01 0/0.17 (-) 8.47E-01
apolipoprotein 42 0/0.2 (-) 8.20E-01 0/0.18 (-) 8.39E-01 0/0.41 (-) 6.62E-01 0/0.39 (-) 6.78E-01
aspartic protease 14 0/0.07 (-) 9.36E-01 0/0.06 (-) 9.43E-01 0/0.14 (-) 8.72E-01 0/0.13 (-) 8.79E-01
ATP synthase 51 0/0.24 (-) 7.86E-01 0/0.21 (-) 8.08E-01 0/0.5 (-) 6.06E-01 0/0.47 (-) 6.24E-01
basic helix-loop-helix transcription factor 120 0/0.57 (-) 5.67E-01 0/0.5 (-) 6.05E-01 1/1.18 (-) 6.71E-01 1/1.11 (-) 6.96E-01
basic leucine zipper transcription factor 24 0/0.11 (-) 8.93E-01 0/0.1 (-) 9.05E-01 1/0.24 (+) 2.10E-01 1/0.22 (+) 1.99E-01
calcium channel 31 0/0.15 (-) 8.64E-01 0/0.13 (-) 8.79E-01 0/0.3 (-) 7.38E-01 0/0.29 (-) 7.51E-01
calcium-binding protein 63 0/0.3 (-) 7.42E-01 0/0.26 (-) 7.69E-01 0/0.62 (-) 5.39E-01 0/0.58 (-) 5.58E-01
calmodulin 33 0/0.16 (-) 8.56E-01 0/0.14 (-) 8.71E-01 0/0.32 (-) 7.24E-01 0/0.3 (-) 7.37E-01
calsequestrin 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
carbohydrate kinase 35 0/0.17 (-) 8.48E-01 0/0.15 (-) 8.64E-01 0/0.34 (-) 7.10E-01 0/0.32 (-) 7.23E-01
carbohydrate transporter 46 0/0.22 (-) 8.05E-01 0/0.19 (-) 8.25E-01 0/0.45 (-) 6.37E-01 0/0.43 (-) 6.53E-01
cation transporter 178 1/0.84 (+) 5.70E-01 1/0.74 (+) 5.25E-01 3/1.74 (+) 2.54E-01 3/1.64 (+) 2.28E-01
cell adhesion molecule 93 0/0.44 (-) 6.44E-01 0/0.39 (-) 6.78E-01 0/0.91 (-) 4.01E-01 0/0.86 (-) 4.23E-01
cell junction protein 67 0/0.32 (-) 7.28E-01 0/0.28 (-) 7.56E-01 0/0.66 (-) 5.18E-01 0/0.62 (-) 5.38E-01
centromere DNA-binding protein 18 0/0.08 (-) 9.18E-01 0/0.08 (-) 9.28E-01 0/0.18 (-) 8.38E-01 0/0.17 (-) 8.47E-01
chaperone 130 0/0.61 (-) 5.40E-01 0/0.54 (-) 5.81E-01 0/1.27 (-) 2.79E-01 0/1.2 (-) 3.00E-01
chaperonin 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 0/0.2 (-) 8.22E-01 0/0.18 (-) 8.31E-01
chemokine 48 1/0.23 (+) 2.03E-01 1/0.2 (+) 1.82E-01 1/0.47 (+) 3.75E-01 1/0.44 (+) 3.59E-01
chromatin/chromatin-binding protein 74 0/0.35 (-) 7.05E-01 0/0.31 (-) 7.34E-01 0/0.72 (-) 4.84E-01 0/0.68 (-) 5.04E-01
CREB transcription factor 24 0/0.11 (-) 8.93E-01 0/0.1 (-) 9.05E-01 1/0.24 (+) 2.10E-01 1/0.22 (+) 1.99E-01
cyclase 23 0/0.11 (-) 8.97E-01 0/0.1 (-) 9.09E-01 0/0.23 (-) 7.98E-01 0/0.21 (-) 8.08E-01
cysteine protease 121 1/0.57 (+) 4.36E-01 1/0.5 (+) 3.97E-01 1/1.19 (-) 6.68E-01 1/1.12 (-) 6.92E-01
cysteine protease inhibitor 21 0/0.1 (-) 9.06E-01 0/0.09 (-) 9.16E-01 0/0.21 (-) 8.14E-01 0/0.19 (-) 8.24E-01
cytokine receptor 213 2/1.01 (+) 2.66E-01 2/0.89 (+) 2.23E-01 3/2.09 (+) 3.47E-01 3/1.97 (+) 3.15E-01
cytoskeletal protein 441 4/2.08 (+) 1.56E-01 4/1.84 (+) 1.13E-01 5/4.32 (+) 4.34E-01 5/4.08 (+) 3.86E-01
damaged DNA-binding protein 45 0/0.21 (-) 8.08E-01 0/0.19 (-) 8.29E-01 0/0.44 (-) 6.43E-01 0/0.42 (-) 6.59E-01
deacetylase 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
deaminase 12 0/0.06 (-) 9.45E-01 0/0.05 (-) 9.51E-01 0/0.12 (-) 8.89E-01 0/0.11 (-) 8.95E-01
decarboxylase 17 0/0.08 (-) 9.23E-01 0/0.07 (-) 9.32E-01 0/0.17 (-) 8.47E-01 0/0.16 (-) 8.55E-01
dehydratase 31 0/0.15 (-) 8.64E-01 0/0.13 (-) 8.79E-01 0/0.3 (-) 7.38E-01 0/0.29 (-) 7.51E-01
dehydrogenase 210 0/0.99 (-) 3.69E-01 0/0.88 (-) 4.15E-01 2/2.06 (-) 6.61E-01 2/1.94 (+) 5.79E-01
DNA binding protein 476 3/2.25 (+) 3.91E-01 3/1.98 (+) 3.19E-01 6/4.66 (+) 3.24E-01 6/4.4 (+) 2.79E-01
DNA glycosylase 12 0/0.06 (-) 9.45E-01 0/0.05 (-) 9.51E-01 0/0.12 (-) 8.89E-01 0/0.11 (-) 8.95E-01
DNA helicase 80 0/0.38 (-) 6.85E-01 0/0.33 (-) 7.16E-01 0/0.78 (-) 4.56E-01 0/0.74 (-) 4.77E-01
DNA ligase 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
DNA methyltransferase 17 0/0.08 (-) 9.23E-01 0/0.07 (-) 9.32E-01 0/0.17 (-) 8.47E-01 0/0.16 (-) 8.55E-01
DNA photolyase 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
DNA polymerase processivity factor 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
DNA strand-pairing protein 7 0/0.03 (-) 9.67E-01 0/0.03 (-) 9.71E-01 0/0.07 (-) 9.34E-01 0/0.06 (-) 9.37E-01
DNA topoisomerase 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
DNA-directed DNA polymerase 10 0/0.05 (-) 9.54E-01 0/0.04 (-) 9.59E-01 0/0.1 (-) 9.07E-01 0/0.09 (-) 9.12E-01
DNA-directed RNA polymerase 15 0/0.07 (-) 9.32E-01 0/0.06 (-) 9.39E-01 0/0.15 (-) 8.63E-01 0/0.14 (-) 8.71E-01
endoribonuclease 33 1/0.16 (+) 1.44E-01 1/0.14 (+) 1.29E-01 1/0.32 (+) 2.76E-01 1/0.3 (+) 2.63E-01
enzyme modulator 857 5/4.05 (+) 3.80E-01 5/3.57 (+) 2.86E-01 10/8.39 (+) 3.31E-01 10/7.92 (+) 2.70E-01
epimerase/racemase 7 0/0.03 (-) 9.67E-01 0/0.03 (-) 9.71E-01 0/0.07 (-) 9.34E-01 0/0.06 (-) 9.37E-01
esterase 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
exodeoxyribonuclease 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
exoribonuclease 24 0/0.11 (-) 8.93E-01 0/0.1 (-) 9.05E-01 0/0.24 (-) 7.90E-01 0/0.22 (-) 8.01E-01
extracellular matrix glycoprotein 55 0/0.26 (-) 7.71E-01 0/0.23 (-) 7.95E-01 0/0.54 (-) 5.83E-01 0/0.51 (-) 6.01E-01
extracellular matrix protein 72 0/0.34 (-) 7.11E-01 0/0.3 (-) 7.40E-01 0/0.71 (-) 4.93E-01 0/0.67 (-) 5.13E-01
extracellular matrix structural protein 9 0/0.04 (-) 9.58E-01 0/0.04 (-) 9.63E-01 0/0.09 (-) 9.16E-01 0/0.08 (-) 9.20E-01
galactosidase 16 0/0.08 (-) 9.27E-01 0/0.07 (-) 9.35E-01 0/0.16 (-) 8.55E-01 0/0.15 (-) 8.63E-01
gap junction 21 0/0.1 (-) 9.06E-01 0/0.09 (-) 9.16E-01 0/0.21 (-) 8.14E-01 0/0.19 (-) 8.24E-01
glucosidase 15 0/0.07 (-) 9.32E-01 0/0.06 (-) 9.39E-01 0/0.15 (-) 8.63E-01 0/0.14 (-) 8.71E-01
glycosidase 26 0/0.12 (-) 8.84E-01 0/0.11 (-) 8.97E-01 0/0.25 (-) 7.75E-01 0/0.24 (-) 7.86E-01
glycosyltransferase 229 0/1.08 (-) 3.37E-01 0/0.95 (-) 3.83E-01 1/2.24 (-) 3.43E-01 1/2.12 (-) 3.74E-01
G-protein 206 1/0.97 (+) 6.24E-01 1/0.86 (+) 5.78E-01 1/2.02 (-) 4.00E-01 1/1.9 (-) 4.31E-01
G-protein coupled receptor 447 2/2.11 (-) 6.47E-01 2/1.86 (+) 5.59E-01 4/4.38 (-) 5.55E-01 4/4.13 (-) 6.03E-01
G-protein modulator 278 2/1.31 (+) 3.78E-01 2/1.16 (+) 3.23E-01 4/2.72 (+) 2.90E-01 4/2.57 (+) 2.57E-01
growth factor 165 1/0.78 (+) 5.43E-01 1/0.69 (+) 4.99E-01 1/1.62 (-) 5.19E-01 1/1.52 (-) 5.49E-01
guanyl-nucleotide exchange factor 79 0/0.37 (-) 6.88E-01 0/0.33 (-) 7.19E-01 0/0.77 (-) 4.61E-01 0/0.73 (-) 4.81E-01
helicase 5 0/0.02 (-) 9.77E-01 0/0.02 (-) 9.79E-01 0/0.05 (-) 9.52E-01 0/0.05 (-) 9.55E-01
helix-turn-helix transcription factor 233 1/1.1 (-) 6.99E-01 1/0.97 (+) 6.24E-01 3/2.28 (+) 4.00E-01 3/2.15 (+) 3.65E-01
heterotrimeric G-protein 36 0/0.17 (-) 8.44E-01 0/0.15 (-) 8.61E-01 0/0.35 (-) 7.03E-01 0/0.33 (-) 7.17E-01
histone 54 0/0.25 (-) 7.75E-01 0/0.23 (-) 7.98E-01 0/0.53 (-) 5.89E-01 0/0.5 (-) 6.07E-01
HMG box transcription factor 44 0/0.21 (-) 8.12E-01 0/0.18 (-) 8.32E-01 0/0.43 (-) 6.50E-01 0/0.41 (-) 6.66E-01
homeobox transcription factor 233 1/1.1 (-) 6.99E-01 1/0.97 (+) 6.24E-01 3/2.28 (+) 4.00E-01 3/2.15 (+) 3.65E-01
Hsp70 family chaperone 13 0/0.06 (-) 9.40E-01 0/0.05 (-) 9.47E-01 0/0.13 (-) 8.80E-01 0/0.12 (-) 8.87E-01
Hsp90 family chaperone 10 0/0.05 (-) 9.54E-01 0/0.04 (-) 9.59E-01 0/0.1 (-) 9.07E-01 0/0.09 (-) 9.12E-01
hydrolase 454 0/2.14 (-) 1.14E-01 0/1.89 (-) 1.47E-01 1/4.45 (-) 6.18E-02 1/4.2 (-) 7.60E-02
hydroxylase 44 0/0.21 (-) 8.12E-01 0/0.18 (-) 8.32E-01 0/0.43 (-) 6.50E-01 0/0.41 (-) 6.66E-01
immunoglobulin 33 0/0.16 (-) 8.56E-01 0/0.14 (-) 8.71E-01 0/0.32 (-) 7.24E-01 0/0.3 (-) 7.37E-01
immunoglobulin receptor superfamily 155 0/0.73 (-) 4.80E-01 0/0.65 (-) 5.23E-01 0/1.52 (-) 2.18E-01 0/1.43 (-) 2.37E-01
immunoglobulin superfamily cell adhesion molecule 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
interferon superfamily 17 0/0.08 (-) 9.23E-01 0/0.07 (-) 9.32E-01 0/0.17 (-) 8.47E-01 0/0.16 (-) 8.55E-01
interleukin superfamily 36 1/0.17 (+) 1.56E-01 1/0.15 (+) 1.39E-01 1/0.35 (+) 2.97E-01 1/0.33 (+) 2.83E-01
intermediate filament 11 0/0.05 (-) 9.49E-01 0/0.05 (-) 9.55E-01 0/0.11 (-) 8.98E-01 0/0.1 (-) 9.03E-01
intermediate filament binding protein 10 0/0.05 (-) 9.54E-01 0/0.04 (-) 9.59E-01 0/0.1 (-) 9.07E-01 0/0.09 (-) 9.12E-01
intracellular calcium-sensing protein 33 0/0.16 (-) 8.56E-01 0/0.14 (-) 8.71E-01 0/0.32 (-) 7.24E-01 0/0.3 (-) 7.37E-01
ion channel 341 0/1.61 (-) 1.97E-01 0/1.42 (-) 2.38E-01 1/3.34 (-) 1.51E-01 1/3.15 (-) 1.75E-01
isomerase 94 0/0.44 (-) 6.41E-01 0/0.39 (-) 6.75E-01 1/0.92 (+) 6.03E-01 1/0.87 (+) 5.81E-01
kinase inhibitor 15 0/0.07 (-) 9.32E-01 0/0.06 (-) 9.39E-01 0/0.15 (-) 8.63E-01 0/0.14 (-) 8.71E-01
kinase modulator 103 0/0.49 (-) 6.14E-01 0/0.43 (-) 6.50E-01 3/1.01 (+) 8.14E-02 3/0.95 (+) 7.12E-02
ligand-gated ion channel 101 0/0.48 (-) 6.20E-01 0/0.42 (-) 6.56E-01 0/0.99 (-) 3.71E-01 0/0.93 (-) 3.92E-01
lipase 57 0/0.27 (-) 7.64E-01 0/0.24 (-) 7.88E-01 0/0.56 (-) 5.72E-01 0/0.53 (-) 5.90E-01
lyase 104 0/0.49 (-) 6.11E-01 0/0.43 (-) 6.47E-01 0/1.02 (-) 3.60E-01 0/0.96 (-) 3.82E-01
membrane traffic protein 321 2/1.52 (+) 4.49E-01 2/1.34 (+) 3.88E-01 4/3.14 (+) 3.85E-01 4/2.97 (+) 3.45E-01
membrane trafficking regulatory protein 107 0/0.51 (-) 6.03E-01 0/0.45 (-) 6.39E-01 1/1.05 (-) 7.18E-01 1/0.99 (+) 6.29E-01
membrane-bound signaling molecule 133 1/0.63 (+) 4.67E-01 1/0.55 (+) 4.27E-01 1/1.3 (-) 6.26E-01 1/1.23 (-) 6.52E-01
metalloprotease 145 0/0.68 (-) 5.03E-01 0/0.6 (-) 5.45E-01 0/1.42 (-) 2.40E-01 0/1.34 (-) 2.61E-01
metalloprotease inhibitor 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
methyltransferase 130 0/0.61 (-) 5.40E-01 0/0.54 (-) 5.81E-01 1/1.27 (-) 6.36E-01 1/1.2 (-) 6.62E-01
microtubule family cytoskeletal protein 156 2/0.74 (+) 1.68E-01 2/0.65 (+) 1.38E-01 2/1.53 (+) 4.52E-01 2/1.44 (+) 4.23E-01
mitochondrial carrier protein 17 0/0.08 (-) 9.23E-01 0/0.07 (-) 9.32E-01 0/0.17 (-) 8.47E-01 0/0.16 (-) 8.55E-01
mRNA processing factor 179 1/0.85 (+) 5.72E-01 1/0.75 (+) 5.27E-01 2/1.75 (+) 5.24E-01 2/1.65 (+) 4.93E-01
mutase 14 0/0.07 (-) 9.36E-01 0/0.06 (-) 9.43E-01 0/0.14 (-) 8.72E-01 0/0.13 (-) 8.79E-01
myelin protein 33 0/0.16 (-) 8.56E-01 0/0.14 (-) 8.71E-01 0/0.32 (-) 7.24E-01 0/0.3 (-) 7.37E-01
neuropeptide 28 0/0.13 (-) 8.76E-01 0/0.12 (-) 8.90E-01 0/0.27 (-) 7.60E-01 0/0.26 (-) 7.72E-01
neurotrophic factor 8 0/0.04 (-) 9.63E-01 0/0.03 (-) 9.67E-01 0/0.08 (-) 9.25E-01 0/0.07 (-) 9.29E-01
non-motor actin binding protein 114 1/0.54 (+) 4.17E-01 1/0.48 (+) 3.79E-01 1/1.12 (-) 6.93E-01 1/1.05 (-) 7.16E-01
non-motor microtubule binding protein 53 0/0.25 (-) 7.78E-01 0/0.22 (-) 8.02E-01 0/0.52 (-) 5.95E-01 0/0.49 (-) 6.12E-01
nuclear hormone receptor 46 0/0.22 (-) 8.05E-01 0/0.19 (-) 8.25E-01 0/0.45 (-) 6.37E-01 0/0.43 (-) 6.53E-01
nuclease 35 0/0.17 (-) 8.48E-01 0/0.15 (-) 8.64E-01 0/0.34 (-) 7.10E-01 0/0.32 (-) 7.23E-01
nucleic acid binding 1466 6/6.92 (-) 4.56E-01 6/6.11 (-) 5.88E-01 15/14.36 (+) 4.69E-01 15/13.55 (+) 3.80E-01
nucleotide kinase 51 1/0.24 (+) 2.14E-01 1/0.21 (+) 1.92E-01 1/0.5 (+) 3.94E-01 1/0.47 (+) 3.76E-01
nucleotide phosphatase 35 0/0.17 (-) 8.48E-01 0/0.15 (-) 8.64E-01 0/0.34 (-) 7.10E-01 0/0.32 (-) 7.23E-01
nucleotidyltransferase 84 0/0.4 (-) 6.72E-01 0/0.35 (-) 7.04E-01 1/0.82 (+) 5.62E-01 1/0.78 (+) 5.41E-01
oxidase 57 0/0.27 (-) 7.64E-01 0/0.24 (-) 7.88E-01 1/0.56 (+) 4.28E-01 1/0.53 (+) 4.10E-01
oxidoreductase 550 1/2.6 (-) 2.64E-01 1/2.29 (-) 3.28E-01 5/5.39 (-) 5.48E-01 5/5.08 (-) 6.01E-01
oxygenase 74 1/0.35 (+) 2.95E-01 1/0.31 (+) 2.66E-01 1/0.72 (+) 5.16E-01 1/0.68 (+) 4.96E-01
peptide hormone 169 0/0.8 (-) 4.49E-01 0/0.7 (-) 4.93E-01 0/1.66 (-) 1.90E-01 0/1.56 (-) 2.08E-01
peroxidase 27 0/0.13 (-) 8.80E-01 0/0.11 (-) 8.93E-01 0/0.26 (-) 7.68E-01 0/0.25 (-) 7.79E-01
phosphatase 230 1/1.09 (-) 7.04E-01 1/0.96 (+) 6.19E-01 1/2.25 (-) 3.40E-01 1/2.13 (-) 3.71E-01
phosphatase activator 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
phosphatase inhibitor 28 0/0.13 (-) 8.76E-01 0/0.12 (-) 8.90E-01 0/0.27 (-) 7.60E-01 0/0.26 (-) 7.72E-01
phosphatase modulator 63 0/0.3 (-) 7.42E-01 0/0.26 (-) 7.69E-01 0/0.62 (-) 5.39E-01 0/0.58 (-) 5.58E-01
phosphodiesterase 27 0/0.13 (-) 8.80E-01 0/0.11 (-) 8.93E-01 1/0.26 (+) 2.32E-01 1/0.25 (+) 2.21E-01
phospholipase 36 0/0.17 (-) 8.44E-01 0/0.15 (-) 8.61E-01 0/0.35 (-) 7.03E-01 0/0.33 (-) 7.17E-01
phosphorylase 13 0/0.06 (-) 9.40E-01 0/0.05 (-) 9.47E-01 0/0.13 (-) 8.80E-01 0/0.12 (-) 8.87E-01
potassium channel 89 0/0.42 (-) 6.56E-01 0/0.37 (-) 6.89E-01 1/0.87 (+) 5.83E-01 1/0.82 (+) 5.61E-01
primase 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
protease 476 3/2.25 (+) 3.91E-01 2/1.98 (+) 5.93E-01 3/4.66 (-) 3.13E-01 2/4.4 (-) 1.82E-01
protease inhibitor 109 0/0.51 (-) 5.97E-01 0/0.45 (-) 6.34E-01 0/1.07 (-) 3.43E-01 0/1.01 (-) 3.64E-01
protein kinase receptor 8 0/0.04 (-) 9.63E-01 0/0.03 (-) 9.67E-01 0/0.08 (-) 9.25E-01 0/0.07 (-) 9.29E-01
protein phosphatase 111 0/0.52 (-) 5.91E-01 0/0.46 (-) 6.29E-01 0/1.09 (-) 3.36E-01 0/1.03 (-) 3.57E-01
pyrophosphatase 5 0/0.02 (-) 9.77E-01 0/0.02 (-) 9.79E-01 0/0.05 (-) 9.52E-01 0/0.05 (-) 9.55E-01
reductase 65 0/0.31 (-) 7.35E-01 0/0.27 (-) 7.62E-01 0/0.64 (-) 5.29E-01 0/0.6 (-) 5.48E-01
replication origin binding protein 47 0/0.22 (-) 8.01E-01 0/0.2 (-) 8.22E-01 0/0.46 (-) 6.31E-01 0/0.43 (-) 6.47E-01
reverse transcriptase 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
ribonucleoprotein 61 0/0.29 (-) 7.49E-01 0/0.25 (-) 7.75E-01 0/0.6 (-) 5.50E-01 0/0.56 (-) 5.69E-01
ribosomal protein 184 0/0.87 (-) 4.18E-01 0/0.77 (-) 4.63E-01 2/1.8 (+) 5.39E-01 2/1.7 (+) 5.08E-01
RNA binding protein 727 2/3.43 (-) 3.29E-01 2/3.03 (-) 4.12E-01 8/7.12 (+) 4.19E-01 8/6.72 (+) 3.59E-01
RNA helicase 71 0/0.34 (-) 7.15E-01 0/0.3 (-) 7.43E-01 0/0.7 (-) 4.98E-01 0/0.66 (-) 5.18E-01
RNA methyltransferase 22 0/0.1 (-) 9.01E-01 0/0.09 (-) 9.12E-01 0/0.22 (-) 8.06E-01 0/0.2 (-) 8.16E-01
serine protease 153 1/0.72 (+) 5.16E-01 1/0.64 (+) 4.73E-01 1/1.5 (-) 5.58E-01 1/1.41 (-) 5.86E-01
serine protease inhibitor 79 0/0.37 (-) 6.88E-01 0/0.33 (-) 7.19E-01 0/0.77 (-) 4.61E-01 0/0.73 (-) 4.81E-01
small GTPase 158 1/0.75 (+) 5.27E-01 1/0.66 (+) 4.84E-01 1/1.55 (-) 5.41E-01 1/1.46 (-) 5.71E-01
SNARE protein 37 0/0.17 (-) 8.40E-01 0/0.15 (-) 8.57E-01 0/0.36 (-) 6.96E-01 0/0.34 (-) 7.10E-01
sodium channel 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 0/0.2 (-) 8.22E-01 0/0.18 (-) 8.31E-01
storage protein 15 0/0.07 (-) 9.32E-01 0/0.06 (-) 9.39E-01 0/0.15 (-) 8.63E-01 0/0.14 (-) 8.71E-01
structural protein 280 1/1.32 (-) 6.18E-01 1/1.17 (-) 6.74E-01 1/2.74 (-) 2.39E-01 1/2.59 (-) 2.68E-01
surfactant 15 0/0.07 (-) 9.32E-01 0/0.06 (-) 9.39E-01 0/0.15 (-) 8.63E-01 0/0.14 (-) 8.71E-01
TGF-beta receptor 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
tight junction 30 0/0.14 (-) 8.68E-01 0/0.13 (-) 8.82E-01 0/0.29 (-) 7.45E-01 0/0.28 (-) 7.58E-01
transaldolase 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
transaminase 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
transcription cofactor 255 1/1.2 (-) 6.61E-01 1/1.06 (-) 7.12E-01 2/2.5 (-) 5.44E-01 2/2.36 (-) 5.81E-01
transcription factor 2067 10/9.76 (+) 5.17E-01 9/8.62 (+) 4.97E-01 19/20.24 (-) 4.43E-01 18/19.1 (-) 4.55E-01
transfer/carrier protein 248 0/1.17 (-) 3.08E-01 0/1.03 (-) 3.53E-01 1/2.43 (-) 3.00E-01 1/2.29 (-) 3.31E-01
translation factor 56 0/0.26 (-) 7.67E-01 0/0.23 (-) 7.92E-01 1/0.55 (+) 4.23E-01 1/0.52 (+) 4.04E-01
transmembrane receptor regulatory/adaptor protein 84 0/0.4 (-) 6.72E-01 0/0.35 (-) 7.04E-01 1/0.82 (+) 5.62E-01 1/0.78 (+) 5.41E-01
transporter 1098 4/5.18 (-) 4.03E-01 2/4.58 (-) 1.57E-01 8/10.75 (-) 2.47E-01 6/10.15 (-) 1.14E-01
tubulin 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
tumor necrosis factor family member 11 0/0.05 (-) 9.49E-01 0/0.05 (-) 9.55E-01 0/0.11 (-) 8.98E-01 0/0.1 (-) 9.03E-01
tumor necrosis factor receptor 22 1/0.1 (+) 9.87E-02 1/0.09 (+) 8.77E-02 1/0.22 (+) 1.94E-01 1/0.2 (+) 1.84E-01
ubiquitin-protein ligase 132 1/0.62 (+) 4.65E-01 0/0.55 (-) 5.76E-01 3/1.29 (+) 1.41E-01 2/1.22 (+) 3.45E-01
vesicle coat protein 30 0/0.14 (-) 8.68E-01 0/0.13 (-) 8.82E-01 0/0.29 (-) 7.45E-01 0/0.28 (-) 7.58E-01
voltage-gated ion channel 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
zinc finger transcription factor 803 2/3.79 (-) 2.64E-01 2/3.35 (-) 3.45E-01 6/7.86 (-) 3.25E-01 6/7.42 (-) 3.85E-01
Supplementary Table 21 PANTHER pathway analysis full results.
Pathway | H. sapiens Reference | Confirmed O/E (direction) | Confirmed P-value | Confirmed no HLA O/E (direction) | Confirmed no HLA P-value | Expanded O/E (direction) | Expanded P-value | Expanded no HLA O/E (direction) | Expanded no HLA P-value
Angiogenesis 191 2/0.9 (+) 2.28E-01 1/0.8 (+) 5.51E-01 6/1.87 (+) 1.19E-02 5/1.77 (+) 3.32E-02
B cell activation 82 1/0.39 (+) 3.22E-01 1/0.34 (+) 2.90E-01 4/0.8 (+) 9.02E-03 4/0.76 (+) 7.40E-03
Inflammation mediated by chemokine and cytokine signaling pathway 283 3/1.34 (+) 1.50E-01 3/1.18 (+) 1.15E-01 6/2.77 (+) 6.13E-02 6/2.62 (+) 4.89E-02
Interleukin signaling pathway 161 4/0.76 (+) 7.32E-03 4/0.67 (+) 4.73E-03 7/1.58 (+) 1.15E-03 7/1.49 (+) 8.23E-04
JAK/STAT signaling pathway 20 1/0.09 (+) 9.01E-02 1/0.08 (+) 8.00E-02 2/0.2 (+) 1.68E-02 2/0.18 (+) 1.51E-02
Notch signaling pathway 47 2/0.22 (+) 2.11E-02 1/0.2 (+) 1.78E-01 2/0.46 (+) 7.82E-02 1/0.43 (+) 3.53E-01
T cell activation 102 3/0.48 (+) 1.27E-02 1/0.43 (+) 3.47E-01 7/1 (+) 7.64E-05 5/0.94 (+) 2.75E-03
Toll receptor signaling pathway 62 2/0.29 (+) 3.51E-02 2/0.26 (+) 2.79E-02 3/0.61 (+) 2.36E-02 3/0.57 (+) 2.03E-02
Unclassified 17337 75/81.85 (-) 3.09E-02 66/72.27 (-) 3.54E-02 156/169.79 (-) 3.60E-03 147/160.21 (-) 4.08E-03
Vasopressin synthesis 12 1/0.06 (+) 5.51E-02 1/0.05 (+) 4.88E-02 1/0.12 (+) 1.11E-01 1/0.11 (+) 1.05E-01
2-arachidonoylglycerol biosynthesis 7 0/0.03 (-) 9.67E-01 0/0.03 (-) 9.71E-01 0/0.07 (-) 9.34E-01 0/0.06 (-) 9.37E-01
5HT1 type receptor mediated signaling pathway 44 0/0.21 (-) 8.12E-01 0/0.18 (-) 8.32E-01 0/0.43 (-) 6.50E-01 0/0.41 (-) 6.66E-01
5HT2 type receptor mediated signaling pathway 69 0/0.33 (-) 7.22E-01 0/0.29 (-) 7.50E-01 2/0.68 (+) 1.47E-01 2/0.64 (+) 1.34E-01
5HT3 type receptor mediated signaling pathway 18 0/0.08 (-) 9.18E-01 0/0.08 (-) 9.28E-01 0/0.18 (-) 8.38E-01 0/0.17 (-) 8.47E-01
5HT4 type receptor mediated signaling pathway 31 0/0.15 (-) 8.64E-01 0/0.13 (-) 8.79E-01 0/0.3 (-) 7.38E-01 0/0.29 (-) 7.51E-01
5-Hydroxytryptamine biosynthesis 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
5-Hydroxytryptamine degredation 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 1/0.2 (+) 1.78E-01 1/0.18 (+) 1.69E-01
Acetate utilization 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
Adenine and hypoxanthine salvage pathway 10 0/0.05 (-) 9.54E-01 0/0.04 (-) 9.59E-01 0/0.1 (-) 9.07E-01 0/0.09 (-) 9.12E-01
Adrenaline and noradrenaline biosynthesis 32 0/0.15 (-) 8.60E-01 0/0.13 (-) 8.75E-01 1/0.31 (+) 2.69E-01 1/0.3 (+) 2.56E-01
Alanine biosynthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Allantoin degradation 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Alpha adrenergic receptor signaling pathway 32 0/0.15 (-) 8.60E-01 0/0.13 (-) 8.75E-01 1/0.31 (+) 2.69E-01 1/0.3 (+) 2.56E-01
Alzheimer disease-amyloid secretase pathway 71 0/0.34 (-) 7.15E-01 0/0.3 (-) 7.43E-01 2/0.7 (+) 1.54E-01 2/0.66 (+) 1.40E-01
Alzheimer disease-presenilin pathway 122 2/0.58 (+) 1.14E-01 1/0.51 (+) 4.00E-01 2/1.19 (+) 3.36E-01 1/1.13 (-) 6.89E-01
Aminobutyrate degradation 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Anandamide_degradation 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Androgen/estrogene/progesterone biosynthesis 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 0/0.2 (-) 8.22E-01 0/0.18 (-) 8.31E-01
Apoptosis signaling pathway 123 1/0.58 (+) 4.41E-01 1/0.51 (+) 4.02E-01 3/1.2 (+) 1.21E-01 3/1.14 (+) 1.07E-01
Arginine biosynthesis 8 0/0.04 (-) 9.63E-01 0/0.03 (-) 9.67E-01 0/0.08 (-) 9.25E-01 0/0.07 (-) 9.29E-01
Ascorbate degradation 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Asparagine and aspartate biosynthesis 5 0/0.02 (-) 9.77E-01 0/0.02 (-) 9.79E-01 0/0.05 (-) 9.52E-01 0/0.05 (-) 9.55E-01
ATP synthesis 8 0/0.04 (-) 9.63E-01 0/0.03 (-) 9.67E-01 0/0.08 (-) 9.25E-01 0/0.07 (-) 9.29E-01
Axon guidance mediated by netrin 30 0/0.14 (-) 8.68E-01 0/0.13 (-) 8.82E-01 0/0.29 (-) 7.45E-01 0/0.28 (-) 7.58E-01
Axon guidance mediated by semaphorins 43 0/0.2 (-) 8.16E-01 0/0.18 (-) 8.36E-01 0/0.42 (-) 6.56E-01 0/0.4 (-) 6.72E-01
Axon guidance mediated by Slit/Robo 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 0/0.2 (-) 8.22E-01 0/0.18 (-) 8.31E-01
Beta1 adrenergic receptor signaling pathway 44 0/0.21 (-) 8.12E-01 0/0.18 (-) 8.32E-01 0/0.43 (-) 6.50E-01 0/0.41 (-) 6.66E-01
Beta2 adrenergic receptor signaling pathway 44 0/0.21 (-) 8.12E-01 0/0.18 (-) 8.32E-01 0/0.43 (-) 6.50E-01 0/0.41 (-) 6.66E-01
Beta3 adrenergic receptor signaling pathway 26 0/0.12 (-) 8.84E-01 0/0.11 (-) 8.97E-01 0/0.25 (-) 7.75E-01 0/0.24 (-) 7.86E-01
Blood coagulation 48 0/0.23 (-) 7.97E-01 0/0.2 (-) 8.18E-01 0/0.47 (-) 6.25E-01 0/0.44 (-) 6.41E-01
Bupropion_degradation 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Cadherin signaling pathway 147 0/0.69 (-) 4.98E-01 0/0.61 (-) 5.41E-01 2/1.44 (+) 4.22E-01 2/1.36 (+) 3.94E-01
Carnitine and CoA metabolism 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Carnitine metabolism 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Cell cycle 22 0/0.1 (-) 9.01E-01 0/0.09 (-) 9.12E-01 0/0.22 (-) 8.06E-01 0/0.2 (-) 8.16E-01
Cholesterol biosynthesis 13 0/0.06 (-) 9.40E-01 0/0.05 (-) 9.47E-01 0/0.13 (-) 8.80E-01 0/0.12 (-) 8.87E-01
Circadian clock system 9 0/0.04 (-) 9.58E-01 0/0.04 (-) 9.63E-01 0/0.09 (-) 9.16E-01 0/0.08 (-) 9.20E-01
Cobalamin biosynthesis 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Coenzyme A biosynthesis 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
Cortocotropin releasing factor receptor signaling pathway 30 0/0.14 (-) 8.68E-01 0/0.13 (-) 8.82E-01 0/0.29 (-) 7.45E-01 0/0.28 (-) 7.58E-01
Cysteine biosynthesis 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Cytoskeletal regulation by Rho GTPase 98 0/0.46 (-) 6.29E-01 0/0.41 (-) 6.64E-01 0/0.96 (-) 3.82E-01 0/0.91 (-) 4.03E-01
De novo purine biosynthesis 34 0/0.16 (-) 8.52E-01 0/0.14 (-) 8.68E-01 0/0.33 (-) 7.17E-01 0/0.31 (-) 7.30E-01
De novo pyrimidine deoxyribonucleotide biosynthesis 19 0/0.09 (-) 9.14E-01 0/0.08 (-) 9.24E-01 0/0.19 (-) 8.30E-01 0/0.18 (-) 8.39E-01
De novo pyrmidine ribonucleotides biosythesis 18 0/0.08 (-) 9.18E-01 0/0.08 (-) 9.28E-01 0/0.18 (-) 8.38E-01 0/0.17 (-) 8.47E-01
DNA replication 21 0/0.1 (-) 9.06E-01 0/0.09 (-) 9.16E-01 0/0.21 (-) 8.14E-01 0/0.19 (-) 8.24E-01
EGF receptor signaling pathway 135 1/0.64 (+) 4.72E-01 1/0.56 (+) 4.31E-01 3/1.32 (+) 1.47E-01 3/1.25 (+) 1.30E-01
Endogenous_cannabinoid_signaling 24 0/0.11 (-) 8.93E-01 0/0.1 (-) 9.05E-01 0/0.24 (-) 7.90E-01 0/0.22 (-) 8.01E-01
Endothelin signaling pathway 91 0/0.43 (-) 6.50E-01 0/0.38 (-) 6.84E-01 1/0.89 (+) 5.91E-01 1/0.84 (+) 5.70E-01
FAS signaling pathway 36 1/0.17 (+) 1.56E-01 1/0.15 (+) 1.39E-01 1/0.35 (+) 2.97E-01 1/0.33 (+) 2.83E-01
FGF signaling pathway 124 0/0.59 (-) 5.56E-01 0/0.52 (-) 5.95E-01 2/1.21 (+) 3.43E-01 2/1.15 (+) 3.18E-01
Flavin biosynthesis 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Folate biosynthesis 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
Formyltetrahydroformate biosynthesis 11 0/0.05 (-) 9.49E-01 0/0.05 (-) 9.55E-01 0/0.11 (-) 8.98E-01 0/0.1 (-) 9.03E-01
Fructose galactose metabolism 12 0/0.06 (-) 9.45E-01 0/0.05 (-) 9.51E-01 0/0.12 (-) 8.89E-01 0/0.11 (-) 8.95E-01
GABA-B_receptor_II_signaling 40 0/0.19 (-) 8.28E-01 0/0.17 (-) 8.46E-01 0/0.39 (-) 6.76E-01 0/0.37 (-) 6.91E-01
Gamma-aminobutyric acid synthesis 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
General transcription by RNA polymerase I 20 0/0.09 (-) 9.10E-01 0/0.08 (-) 9.20E-01 0/0.2 (-) 8.22E-01 0/0.18 (-) 8.31E-01
General transcription regulation 36 0/0.17 (-) 8.44E-01 0/0.15 (-) 8.61E-01 0/0.35 (-) 7.03E-01 0/0.33 (-) 7.17E-01
Glutamine glutamate conversion 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Glycolysis 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
Hedgehog signaling pathway 25 1/0.12 (+) 1.11E-01 1/0.1 (+) 9.90E-02 1/0.24 (+) 2.17E-01 1/0.23 (+) 2.06E-01
Heme biosynthesis 13 0/0.06 (-) 9.40E-01 0/0.05 (-) 9.47E-01 0/0.13 (-) 8.80E-01 0/0.12 (-) 8.87E-01
Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated pathway 166 0/0.78 (-) 4.55E-01 0/0.69 (-) 4.99E-01 0/1.63 (-) 1.95E-01 0/1.53 (-) 2.14E-01
Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway 134 0/0.63 (-) 5.30E-01 0/0.56 (-) 5.71E-01 1/1.31 (-) 6.22E-01 1/1.24 (-) 6.49E-01
Heterotrimeric G-protein signaling pathway-rod outer segment phototransduction 45 0/0.21 (-) 8.08E-01 0/0.19 (-) 8.29E-01 0/0.44 (-) 6.43E-01 0/0.42 (-) 6.59E-01
Histamine H1 receptor mediated signaling pathway 47 0/0.22 (-) 8.01E-01 0/0.2 (-) 8.22E-01 2/0.46 (+) 7.82E-02 2/0.43 (+) 7.08E-02
Histamine H2 receptor mediated signaling pathway 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
Histamine synthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Huntington disease 167 0/0.79 (-) 4.53E-01 0/0.7 (-) 4.97E-01 0/1.64 (-) 1.94E-01 0/1.54 (-) 2.12E-01
Hypoxia response via HIF activation 32 0/0.15 (-) 8.60E-01 0/0.13 (-) 8.75E-01 0/0.31 (-) 7.31E-01 0/0.3 (-) 7.44E-01
Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade 35 0/0.17 (-) 8.48E-01 0/0.15 (-) 8.64E-01 0/0.34 (-) 7.10E-01 0/0.32 (-) 7.23E-01
Insulin/IGF pathway-protein kinase B signaling cascade 89 0/0.42 (-) 6.56E-01 0/0.37 (-) 6.89E-01 0/0.87 (-) 4.17E-01 0/0.82 (-) 4.39E-01
Integrin signalling pathway 181 0/0.85 (-) 4.24E-01 0/0.75 (-) 4.69E-01 0/1.77 (-) 1.69E-01 0/1.67 (-) 1.86E-01
Interferon-gamma signaling pathway 29 0/0.14 (-) 8.72E-01 0/0.12 (-) 8.86E-01 1/0.28 (+) 2.47E-01 1/0.27 (+) 2.35E-01
Ionotropic glutamate receptor pathway 54 0/0.25 (-) 7.75E-01 0/0.23 (-) 7.98E-01 0/0.53 (-) 5.89E-01 0/0.5 (-) 6.07E-01
Isoleucine biosynthesis 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
Leucine biosynthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Lipoate_biosynthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Lysine biosynthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Mannose metabolism 7 0/0.03 (-) 9.67E-01 0/0.03 (-) 9.71E-01 0/0.07 (-) 9.34E-01 0/0.06 (-) 9.37E-01
Metabotropic glutamate receptor group I pathway 36 0/0.17 (-) 8.44E-01 0/0.15 (-) 8.61E-01 1/0.35 (+) 2.97E-01 1/0.33 (+) 2.83E-01
Metabotropic glutamate receptor group II pathway 51 0/0.24 (-) 7.86E-01 0/0.21 (-) 8.08E-01 0/0.5 (-) 6.06E-01 0/0.47 (-) 6.24E-01
Metabotropic glutamate receptor group III pathway 73 0/0.34 (-) 7.08E-01 0/0.3 (-) 7.37E-01 0/0.71 (-) 4.89E-01 0/0.67 (-) 5.09E-01
Methionine biosynthesis 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Methylcitrate cycle 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Methylmalonyl pathway 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
mRNA splicing 9 0/0.04 (-) 9.58E-01 0/0.04 (-) 9.63E-01 0/0.09 (-) 9.16E-01 0/0.08 (-) 9.20E-01
Muscarinic acetylcholine receptor 1 and 3 signaling pathway 61 0/0.29 (-) 7.49E-01 0/0.25 (-) 7.75E-01 1/0.6 (+) 4.50E-01 1/0.56 (+) 4.31E-01
Muscarinic acetylcholine receptor 2 and 4 signaling pathway 62 0/0.29 (-) 7.46E-01 0/0.26 (-) 7.72E-01 0/0.61 (-) 5.44E-01 0/0.57 (-) 5.63E-01
N-acetylglucosamine metabolism 7 0/0.03 (-) 9.67E-01 0/0.03 (-) 9.71E-01 0/0.07 (-) 9.34E-01 0/0.06 (-) 9.37E-01
Nicotinic acetylcholine receptor signaling pathway 97 0/0.46 (-) 6.32E-01 0/0.4 (-) 6.67E-01 1/0.95 (+) 6.14E-01 1/0.9 (+) 5.93E-01
O-antigen biosynthesis 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Ornithine degradation 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
Oxidative stress response 60 1/0.28 (+) 2.47E-01 1/0.25 (+) 2.22E-01 2/0.59 (+) 1.18E-01 2/0.55 (+) 1.07E-01
Oxytocin receptor mediated signaling pathway 60 0/0.28 (-) 7.53E-01 0/0.25 (-) 7.78E-01 2/0.59 (+) 1.18E-01 2/0.55 (+) 1.07E-01
p53 pathway 113 0/0.53 (-) 5.86E-01 0/0.47 (-) 6.24E-01 1/1.11 (-) 6.96E-01 1/1.04 (-) 7.19E-01
p53 pathway by glucose deprivation 25 0/0.12 (-) 8.89E-01 0/0.1 (-) 9.01E-01 0/0.24 (-) 7.83E-01 0/0.23 (-) 7.94E-01
P53 pathway feedback loops 1 7 0/0.03 (-) 9.67E-01 0/0.03 (-) 9.71E-01 0/0.07 (-) 9.34E-01 0/0.06 (-) 9.37E-01
p53 pathway feedback loops 2 52 0/0.25 (-) 7.82E-01 0/0.22 (-) 8.05E-01 0/0.51 (-) 6.01E-01 0/0.48 (-) 6.18E-01
Parkinson disease 100 1/0.47 (+) 3.77E-01 1/0.42 (+) 3.42E-01 2/0.98 (+) 2.57E-01 2/0.92 (+) 2.36E-01
PDGF signaling pathway 159 2/0.75 (+) 1.73E-01 2/0.66 (+) 1.43E-01 4/1.56 (+) 7.23E-02 4/1.47 (+) 6.11E-02
Pentose phosphate pathway 8 0/0.04 (-) 9.63E-01 0/0.03 (-) 9.67E-01 0/0.08 (-) 9.25E-01 0/0.07 (-) 9.29E-01
Phenylalanine biosynthesis 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
Phenylethylamine degradation 10 0/0.05 (-) 9.54E-01 0/0.04 (-) 9.59E-01 1/0.1 (+) 9.33E-02 1/0.09 (+) 8.83E-02
PI3 kinase pathway 115 0/0.54 (-) 5.80E-01 0/0.48 (-) 6.18E-01 0/1.13 (-) 3.23E-01 0/1.06 (-) 3.44E-01
Plasminogen activating cascade 18 0/0.08 (-) 9.18E-01 0/0.08 (-) 9.28E-01 0/0.18 (-) 8.38E-01 0/0.17 (-) 8.47E-01
PLP biosynthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Proline biosynthesis 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Purine metabolism 8 0/0.04 (-) 9.63E-01 0/0.03 (-) 9.67E-01 0/0.08 (-) 9.25E-01 0/0.07 (-) 9.29E-01
Pyridoxal phosphate salvage pathway 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Pyrimidine Metabolism 15 0/0.07 (-) 9.32E-01 0/0.06 (-) 9.39E-01 0/0.15 (-) 8.63E-01 0/0.14 (-) 8.71E-01
Pyruvate metabolism 14 0/0.07 (-) 9.36E-01 0/0.06 (-) 9.43E-01 0/0.14 (-) 8.72E-01 0/0.13 (-) 8.79E-01
Ras Pathway 79 0/0.37 (-) 6.88E-01 0/0.33 (-) 7.19E-01 1/0.77 (+) 5.39E-01 1/0.73 (+) 5.19E-01
S adenosyl methionine biosynthesis 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
Salvage pyrimidine deoxyribonucleotides 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Salvage pyrimidine ribonucleotides 14 0/0.07 (-) 9.36E-01 0/0.06 (-) 9.43E-01 0/0.14 (-) 8.72E-01 0/0.13 (-) 8.79E-01
Serine glycine biosynthesis 5 0/0.02 (-) 9.77E-01 0/0.02 (-) 9.79E-01 0/0.05 (-) 9.52E-01 0/0.05 (-) 9.55E-01
Succinate to proprionate conversion 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Sulfate assimilation 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Synaptic_vesicle_trafficking 38 0/0.18 (-) 8.36E-01 0/0.16 (-) 8.53E-01 0/0.37 (-) 6.89E-01 0/0.35 (-) 7.04E-01
TCA cycle 17 0/0.08 (-) 9.23E-01 0/0.07 (-) 9.32E-01 0/0.17 (-) 8.47E-01 0/0.16 (-) 8.55E-01
TGF-beta signaling pathway 145 1/0.68 (+) 4.97E-01 1/0.6 (+) 4.55E-01 1/1.42 (-) 5.84E-01 1/1.34 (-) 6.12E-01
Thiamine metabolism 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Threonine biosynthesis 2 0/0.01 (-) 9.91E-01 0/0.01 (-) 9.92E-01 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.82E-01
Thyrotropin-releasing hormone receptor signaling pathway 62 0/0.29 (-) 7.46E-01 0/0.26 (-) 7.72E-01 2/0.61 (+) 1.24E-01 2/0.57 (+) 1.13E-01
Transcription regulation by bZIP transcription factor 53 0/0.25 (-) 7.78E-01 0/0.22 (-) 8.02E-01 0/0.52 (-) 5.95E-01 0/0.49 (-) 6.12E-01
Triacylglycerol metabolism 1 0/0 (-) 9.95E-01 0/0 (-) 9.96E-01 0/0.01 (-) 9.90E-01 0/0.01 (-) 9.91E-01
Tyrosine biosynthesis 4 0/0.02 (-) 9.81E-01 0/0.02 (-) 9.83E-01 0/0.04 (-) 9.62E-01 0/0.04 (-) 9.64E-01
Ubiquitin proteasome pathway 70 0/0.33 (-) 7.18E-01 0/0.29 (-) 7.47E-01 1/0.69 (+) 4.97E-01 1/0.65 (+) 4.77E-01
Valine biosynthesis 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
VEGF signaling pathway 75 0/0.35 (-) 7.01E-01 0/0.31 (-) 7.31E-01 2/0.73 (+) 1.68E-01 2/0.69 (+) 1.53E-01
Vitamin B6 metabolism 3 0/0.01 (-) 9.86E-01 0/0.01 (-) 9.88E-01 0/0.03 (-) 9.71E-01 0/0.03 (-) 9.73E-01
Vitamin D metabolism and pathway 14 0/0.07 (-) 9.36E-01 0/0.06 (-) 9.43E-01 1/0.14 (+) 1.28E-01 1/0.13 (+) 1.21E-01
Wnt signaling pathway 317 0/1.5 (-) 2.21E-01 0/1.32 (-) 2.64E-01 1/3.1 (-) 1.82E-01 1/2.93 (-) 2.08E-01
Xanthine and guanine salvage pathway 6 0/0.03 (-) 9.72E-01 0/0.03 (-) 9.75E-01 0/0.06 (-) 9.43E-01 0/0.06 (-) 9.46E-01
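
The observed/expected counts, the (+)/(-) direction and the p-values in the table above appear consistent with a binomial over/under-representation test. The following is a minimal sketch of that calculation, not the exact procedure used by PANTHER. The helper name binomial_enrichment is hypothetical, and the list size (94) and reference size (19,911) are illustrative assumptions chosen only because they reproduce the expected counts in the first results column; neither figure is stated in this section.

```python
from math import comb


def binomial_enrichment(observed, pathway_size, list_size, reference_size):
    """Return (expected, direction, p_value) for one pathway.

    observed       -- genes from the uploaded list that fall in the pathway
    pathway_size   -- genes annotated to the pathway in the reference set
    list_size      -- mapped genes in the uploaded list (assumed value)
    reference_size -- genes in the reference set (assumed value)
    """
    # Probability that a single gene from the list falls in this pathway.
    p = pathway_size / reference_size
    expected = list_size * p

    def pmf(k):
        # Binomial probability of exactly k pathway genes in the list.
        return comb(list_size, k) * p ** k * (1 - p) ** (list_size - k)

    if observed > expected:
        # Over-representation: probability of seeing at least `observed`.
        direction = "+"
        p_value = sum(pmf(k) for k in range(observed, list_size + 1))
    else:
        # Under-representation: probability of seeing at most `observed`.
        direction = "-"
        p_value = sum(pmf(k) for k in range(observed + 1))
    return expected, direction, p_value


# Illustrative sizes only (assumptions, see the paragraph above).
LIST_SIZE = 94
REFERENCE_SIZE = 19911

# Interleukin signaling pathway: 161 reference genes, 4 observed.
interleukin = binomial_enrichment(4, 161, LIST_SIZE, REFERENCE_SIZE)
# Allantoin degradation: 1 reference gene, 0 observed.
allantoin = binomial_enrichment(0, 1, LIST_SIZE, REFERENCE_SIZE)
print(interleukin)
print(allantoin)
```

Under these assumed sizes the expected count for the Interleukin signaling pathway rounds to 0.76, matching the table, and the p-values fall close to the tabulated ones. Small differences are to be expected because the true list and reference sizes are not known here.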
[Figure: bar chart. Series: H. sapiens reference, Confirmed, Confirmed no HLA, Expanded, Expanded no HLA, each with a lighter variant where p ≤ 0.05. Categories: MHC protein complex; protein complex.]

Supplementary Figure 5 PANTHER cellular component analysis graph full results. For clarity only cellular
components significant (p ≤ 0.05) in one or more gene lists are shown. A lighter colour indicates that the
cellular component is significant for that gene list.

[Figure: bar chart. Series: H. sapiens reference, Confirmed, Confirmed no HLA, Expanded, Expanded no HLA, each with a lighter variant where p ≤ 0.05.]

Supplementary Figure 6 PANTHER protein class analysis graph full results. For clarity only protein classes
significant (p ≤ 0.05) in one or more gene lists are shown. A lighter colour indicates that the protein class is
significant for that gene list.

[Figure: bar chart. Series: H. sapiens reference, Confirmed, Confirmed no HLA, Expanded, Expanded no HLA, each with a lighter variant where p ≤ 0.05. Categories: Angiogenesis; B cell activation; Inflammation mediated by chemokine and cytokine signaling pathway; Interleukin signaling pathway; JAK/STAT signaling pathway; Notch signaling pathway; T cell activation; Toll receptor signaling pathway; Vasopressin synthesis.]

Supplementary Figure 7 PANTHER pathway analysis graph full results. For clarity only pathways significant (p
≤ 0.05) in one or more gene lists are shown. A lighter colour indicates that the pathway is significant for that
gene list.

7.1.1.5 Reactome
Of all the databases used, Reactome left the largest proportion of genes unmapped,
achieving a maximum mapped proportion of only 15.79% (Supplementary Table 22). Full
results from the Reactome analyses are shown in Supplementary Table 23, Supplementary
Table 24, Supplementary Table 25 and Supplementary Table 26. Exclusion of the MHC
region genes resulted in 39 of the pathways identified by the confirmed gene list failing to
reach significance and 1 pathway, ‘Chemokine receptors bind chemokines’, attaining
significance (Supplementary Table 24). The top 5 significant pathways all involve SHC1,
share the same significance (p = 4.60 × 10⁻³) and involve two genes: IL2 and IL2RA.
‘Signalling in the immune system’ remained the pathway with the greatest gene
representation; however, the number of genes associated with this pathway dropped
to 9. With the exception of 3 pathways, exclusion of the MHC region genes resulted in an
increase in significance of the overlapping pathways.

Following exclusion of the MHC region genes from the expanded gene list, 31 pathways
remained significant and 5 pathways attained significance resulting in 42 significant
pathways (Supplementary Table 26). Signalling in the immune system remained the most
significant pathway (p = 2.19 × 10⁻⁶) although the number of genes involved reduced slightly
from 26 to 23. Overall, 86 unique pathways were identified across the four gene lists
(Supplementary Table 27).
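
The comparison described above (pathways that lose, gain or retain significance once the MHC region genes are excluded) can be sketched as a set comparison. This is a minimal illustration only: the function name compare_significant, the pathway names and the p-values are hypothetical placeholders and are not results from this analysis.

```python
def compare_significant(before, after, alpha=0.05):
    """Compare two {pathway: p_value} mappings at a significance threshold."""
    sig_before = {name for name, p in before.items() if p <= alpha}
    sig_after = {name for name, p in after.items() if p <= alpha}
    retained = sig_before & sig_after
    return {
        # Significant with the MHC genes, not significant without them.
        "lost": sig_before - sig_after,
        # Significant only once the MHC genes are excluded.
        "gained": sig_after - sig_before,
        # Significant in both analyses.
        "retained": retained,
        # Retained pathways whose p-value decreased after exclusion.
        "more_significant": {n for n in retained if after[n] < before[n]},
    }


# Placeholder p-values for illustration only (not taken from the thesis).
with_mhc = {"pathway_a": 0.001, "pathway_b": 0.03, "pathway_c": 0.20}
without_mhc = {"pathway_a": 0.0005, "pathway_b": 0.08, "pathway_c": 0.04}

result = compare_significant(with_mhc, without_mhc)
for label, names in result.items():
    print(label, sorted(names))
```

Using sets keeps the three counts (lost, gained, retained) mutually exclusive, so they can be checked against the total number of significant pathways reported for each gene list.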

Supplementary Table 22 Ensembl gene IDs unrecognised by Reactome


Confirmed Confirmed no HLA Expanded Expanded no HLA
ENSG00000026297 ENSG00000026297 ENSG00000026297 ENSG00000026297
ENSG00000056558 ENSG00000056558 ENSG00000056558 ENSG00000056558
ENSG00000064419 ENSG00000064419 ENSG00000064419 ENSG00000064419
ENSG00000081019 ENSG00000081019 ENSG00000073605 ENSG00000073605
ENSG00000081026 ENSG00000081026 ENSG00000079999 ENSG00000079999
ENSG00000111087 ENSG00000111087 ENSG00000081019 ENSG00000081019
ENSG00000115464 ENSG00000115464 ENSG00000081026 ENSG00000081026
ENSG00000116774 ENSG00000116774 ENSG00000089022 ENSG00000089022
ENSG00000116793 ENSG00000116793 ENSG00000089234 ENSG00000089234
ENSG00000118655 ENSG00000118655 ENSG00000089248 ENSG00000089248
ENSG00000119396 ENSG00000119396 ENSG00000100027 ENSG00000100027
ENSG00000119403 ENSG00000119403 ENSG00000100379 ENSG00000100379
ENSG00000120436 ENSG00000120436 ENSG00000105364 ENSG00000105364
ENSG00000122733 ENSG00000122733 ENSG00000105376 ENSG00000105376
ENSG00000124160 ENSG00000124160 ENSG00000105401 ENSG00000105401
ENSG00000130363 ENSG00000130363 ENSG00000111087 ENSG00000111087
ENSG00000134242 ENSG00000134242 ENSG00000111249 ENSG00000111249
ENSG00000134262 ENSG00000134262 ENSG00000111271 ENSG00000111271
ENSG00000134453 ENSG00000134453 ENSG00000111300 ENSG00000111300
ENSG00000135253 ENSG00000135253 ENSG00000115464 ENSG00000115464
ENSG00000135454 ENSG00000135454 ENSG00000116774 ENSG00000116774
ENSG00000135506 ENSG00000135506 ENSG00000116793 ENSG00000116793
ENSG00000136068 ENSG00000136068 ENSG00000118655 ENSG00000118655
ENSG00000136573 ENSG00000136573 ENSG00000119396 ENSG00000119396

ENSG00000138378 ENSG00000138378 ENSG00000119403 ENSG00000119403
ENSG00000138684 ENSG00000138684 ENSG00000120436 ENSG00000120436
ENSG00000138688 ENSG00000138688 ENSG00000122733 ENSG00000122733
ENSG00000139269 ENSG00000139269 ENSG00000124160 ENSG00000124160
ENSG00000143061 ENSG00000143061 ENSG00000128228 ENSG00000128228
ENSG00000144218 ENSG00000144218 ENSG00000128309 ENSG00000128309
ENSG00000145723 ENSG00000145723 ENSG00000128311 ENSG00000128311
ENSG00000145725 ENSG00000145725 ENSG00000130363 ENSG00000130363
ENSG00000145730 ENSG00000145730 ENSG00000131748 ENSG00000131748
ENSG00000154319 ENSG00000154319 ENSG00000133475 ENSG00000133475
ENSG00000158457 ENSG00000158457 ENSG00000134242 ENSG00000134242
ENSG00000162924 ENSG00000162924 ENSG00000134262 ENSG00000134262
ENSG00000162927 ENSG00000162927 ENSG00000134453 ENSG00000134453
ENSG00000162928 ENSG00000162928 ENSG00000135148 ENSG00000135148
ENSG00000162929 ENSG00000162929 ENSG00000135253 ENSG00000135253
ENSG00000163349 ENSG00000163349 ENSG00000135362 ENSG00000135362
ENSG00000163684 ENSG00000163684 ENSG00000135454 ENSG00000135454
ENSG00000163686 ENSG00000163686 ENSG00000135506 ENSG00000135506
ENSG00000163687 ENSG00000163687 ENSG00000136068 ENSG00000136068
ENSG00000164113 ENSG00000164113 ENSG00000136573 ENSG00000136573
ENSG00000164512 ENSG00000164512 ENSG00000138378 ENSG00000138378
ENSG00000166908 ENSG00000166908 ENSG00000138684 ENSG00000138684
ENSG00000166984 ENSG00000166984 ENSG00000138688 ENSG00000138688
ENSG00000166987 ENSG00000166987 ENSG00000139269 ENSG00000139269
ENSG00000168297 ENSG00000168297 ENSG00000141741 ENSG00000141741
ENSG00000168301 ENSG00000168301 ENSG00000143061 ENSG00000143061
ENSG00000168309 ENSG00000168309 ENSG00000144218 ENSG00000144218
ENSG00000168394 ENSG00000170500 ENSG00000145723 ENSG00000145723
ENSG00000170500 ENSG00000170983 ENSG00000145725 ENSG00000145725
ENSG00000170983 ENSG00000173209 ENSG00000145730 ENSG00000145730
ENSG00000173209 ENSG00000175749 ENSG00000154319 ENSG00000154319
ENSG00000175749 ENSG00000178498 ENSG00000154822 ENSG00000154822
ENSG00000178498 ENSG00000178723 ENSG00000156127 ENSG00000156127
ENSG00000178723 ENSG00000181751 ENSG00000158457 ENSG00000158457
ENSG00000179344 ENSG00000184608 ENSG00000160183 ENSG00000160183
ENSG00000181751 ENSG00000187791 ENSG00000160185 ENSG00000160185
ENSG00000184608 ENSG00000188761 ENSG00000160712 ENSG00000160712
ENSG00000187791 ENSG00000197146 ENSG00000160714 ENSG00000160714
ENSG00000188761 ENSG00000198369 ENSG00000161179 ENSG00000161179
ENSG00000196301 ENSG00000198643 ENSG00000161180 ENSG00000161180
ENSG00000196735 ENSG00000201076 ENSG00000161395 ENSG00000161395
ENSG00000197146 ENSG00000201581 ENSG00000161405 ENSG00000161405
ENSG00000198369 ENSG00000203711 ENSG00000161847 ENSG00000161847
ENSG00000198643 ENSG00000203864 ENSG00000162924 ENSG00000162924
ENSG00000201076 ENSG00000204929 ENSG00000162927 ENSG00000162927
ENSG00000201581 ENSG00000205108 ENSG00000162928 ENSG00000162928
ENSG00000203711 ENSG00000206937 ENSG00000162929 ENSG00000162929
ENSG00000203864 ENSG00000206970 ENSG00000163239 ENSG00000163239
ENSG00000204267 ENSG00000206973 ENSG00000163349 ENSG00000163349
ENSG00000204287 ENSG00000207402 ENSG00000163684 ENSG00000163684
ENSG00000204290 ENSG00000208028 ENSG00000163686 ENSG00000163686
ENSG00000204296 ENSG00000211543 ENSG00000163687 ENSG00000163687
ENSG00000204929 ENSG00000211573 ENSG00000164113 ENSG00000164113
ENSG00000205108 ENSG00000212978 ENSG00000164512 ENSG00000164512
ENSG00000206937 ENSG00000213076 ENSG00000166349 ENSG00000166349
ENSG00000206970 ENSG00000213820 ENSG00000166352 ENSG00000166352
ENSG00000206973 ENSG00000213925 ENSG00000166908 ENSG00000166908
ENSG00000207402 ENSG00000214015 ENSG00000166984 ENSG00000166984
ENSG00000208028 ENSG00000215199 ENSG00000166987 ENSG00000166987
ENSG00000211543 ENSG00000215204 ENSG00000167807 ENSG00000167807
ENSG00000211573 ENSG00000219463 ENSG00000167914 ENSG00000167914
ENSG00000212066 ENSG00000221401 ENSG00000168297 ENSG00000168297
ENSG00000212978 ENSG00000222251 ENSG00000168301 ENSG00000168301
ENSG00000213076 ENSG00000223003 ENSG00000168309 ENSG00000168309
ENSG00000213820 ENSG00000223191 ENSG00000168394 ENSG00000168488
ENSG00000213925 ENSG00000223489 ENSG00000168488 ENSG00000169291
ENSG00000214015 ENSG00000223895 ENSG00000169291 ENSG00000169635
ENSG00000214861 ENSG00000224163 ENSG00000169635 ENSG00000169662
ENSG00000215199 ENSG00000224478 ENSG00000169662 ENSG00000169668
ENSG00000215204 ENSG00000224713 ENSG00000169668 ENSG00000169682
ENSG00000219463 ENSG00000224791 ENSG00000169682 ENSG00000170500

ENSG00000221401 ENSG00000224890 ENSG00000170500 ENSG00000170983
ENSG00000222251 ENSG00000226004 ENSG00000170983 ENSG00000171532
ENSG00000223003 ENSG00000226032 ENSG00000171532 ENSG00000172057
ENSG00000223191 ENSG00000226167 ENSG00000172057 ENSG00000173064
ENSG00000223335 ENSG00000226884 ENSG00000173064 ENSG00000173209
ENSG00000223489 ENSG00000227145 ENSG00000173209 ENSG00000175097
ENSG00000223534 ENSG00000227598 ENSG00000175097 ENSG00000175749
ENSG00000223895 ENSG00000227943 ENSG00000175749 ENSG00000176046
ENSG00000224163 ENSG00000228414 ENSG00000176046 ENSG00000176476
ENSG00000224478 ENSG00000229260 ENSG00000176476 ENSG00000176953
ENSG00000224713 ENSG00000229664 ENSG00000176953 ENSG00000177548
ENSG00000224791 ENSG00000229922 ENSG00000177548 ENSG00000178498
ENSG00000224890 ENSG00000230359 ENSG00000178498 ENSG00000178723
ENSG00000225914 ENSG00000230393 ENSG00000178723 ENSG00000178952
ENSG00000226004 ENSG00000230533 ENSG00000178952 ENSG00000181751
ENSG00000226030 ENSG00000230626 ENSG00000179344 ENSG00000183246
ENSG00000226032 ENSG00000231072 ENSG00000181751 ENSG00000183506
ENSG00000226167 ENSG00000231128 ENSG00000183246 ENSG00000183621
ENSG00000226884 ENSG00000231634 ENSG00000183506 ENSG00000184608
ENSG00000227145 ENSG00000231858 ENSG00000183621 ENSG00000184730
ENSG00000227598 ENSG00000232067 ENSG00000184608 ENSG00000185264
ENSG00000227943 ENSG00000232084 ENSG00000184730 ENSG00000185651
ENSG00000228414 ENSG00000232450 ENSG00000185264 ENSG00000186075
ENSG00000228962 ENSG00000232547 ENSG00000185651 ENSG00000187045
ENSG00000229260 ENSG00000232693 ENSG00000186075 ENSG00000187791
ENSG00000229391 ENSG00000232713 ENSG00000187045 ENSG00000188322
ENSG00000229664 ENSG00000233031 ENSG00000187791 ENSG00000188603
ENSG00000229922 ENSG00000233459 ENSG00000188322 ENSG00000188761
ENSG00000230359 ENSG00000234255 ENSG00000188603 ENSG00000196347
ENSG00000230393 ENSG00000234624 ENSG00000188761 ENSG00000196585
ENSG00000230533 ENSG00000235527 ENSG00000196301 ENSG00000196934
ENSG00000230626 ENSG00000235842 ENSG00000196347 ENSG00000196993
ENSG00000231072 ENSG00000236722 ENSG00000196585 ENSG00000197146
ENSG00000231128 ENSG00000237024 ENSG00000196735 ENSG00000197210
ENSG00000231634 ENSG00000237499 ENSG00000196934 ENSG00000197272
ENSG00000231858 ENSG00000237522 ENSG00000196993 ENSG00000198156
ENSG00000232067 ENSG00000237651 ENSG00000197146 ENSG00000198270
ENSG00000232080 ENSG00000237969 ENSG00000197210 ENSG00000198324
ENSG00000232084 ENSG00000238256 ENSG00000197272 ENSG00000198369
ENSG00000232450 ENSG00000238436 ENSG00000198156 ENSG00000198567
ENSG00000232547 ENSG00000238532 ENSG00000198270 ENSG00000198643
ENSG00000232629 ENSG00000238733 ENSG00000198324 ENSG00000200057
ENSG00000232693 ENSG00000240416 ENSG00000198369 ENSG00000200135
ENSG00000232713 ENSG00000240771 ENSG00000198567 ENSG00000200688
ENSG00000233031 ENSG00000240825 ENSG00000198643 ENSG00000201076
ENSG00000233459 ENSG00000241102 ENSG00000200057 ENSG00000201078
ENSG00000234255 ENSG00000241524 ENSG00000200135 ENSG00000201428
ENSG00000234515 ENSG00000241573 ENSG00000200688 ENSG00000201466
ENSG00000234624 ENSG00000242162 ENSG00000201076 ENSG00000201581
ENSG00000235040 ENSG00000242241 ENSG00000201078 ENSG00000203711
ENSG00000235301 ENSG00000242309 ENSG00000201428 ENSG00000203864
ENSG00000235527 ENSG00000242990 ENSG00000201466 ENSG00000204842
ENSG00000235842 ENSG00000243384 ENSG00000201581 ENSG00000204913
ENSG00000236722 ENSG00000243592 ENSG00000203711 ENSG00000204929
ENSG00000237024 ENSG00000243738 ENSG00000203864 ENSG00000205108
ENSG00000237285 ENSG00000244161 ENSG00000204267 ENSG00000205609
ENSG00000237499 ENSG00000244383 ENSG00000204287 ENSG00000206140
ENSG00000237522 ENSG00000244498 ENSG00000204290 ENSG00000206142
ENSG00000237541 ENSG00000244556 ENSG00000204296 ENSG00000206763
ENSG00000237651 ENSG00000245247 ENSG00000204842 ENSG00000206937
ENSG00000237969 ENSG00000247039 ENSG00000204913 ENSG00000206970
ENSG00000238256 ENSG00000248203 ENSG00000204929 ENSG00000206973
ENSG00000238436 ENSG00000248224 ENSG00000205108 ENSG00000207402
ENSG00000238532 ENSG00000248452 ENSG00000205609 ENSG00000207751
ENSG00000238733 ENSG00000248549 ENSG00000206140 ENSG00000207759
ENSG00000240065 ENSG00000249141 ENSG00000206142 ENSG00000207975
ENSG00000240416 ENSG00000251922 ENSG00000206763 ENSG00000208028
ENSG00000240771 ENSG00000252865 ENSG00000206937 ENSG00000211543
ENSG00000240825 ENSG00000253032 ENSG00000206970 ENSG00000211573
ENSG00000241102 ENSG00000254371 ENSG00000206973 ENSG00000212102
ENSG00000241106 ENSG00000254774 ENSG00000207402 ENSG00000212743

ENSG00000241287 ENSG00000255154 ENSG00000207751 ENSG00000212978
ENSG00000241524 ENSG00000255354 ENSG00000207759 ENSG00000213076
ENSG00000241573 ENSG00000255518 ENSG00000207975 ENSG00000213152
ENSG00000242162 ENSG00000208028 ENSG00000213156
ENSG00000242241 ENSG00000211543 ENSG00000213820
ENSG00000242309 ENSG00000211573 ENSG00000213925
ENSG00000242574 ENSG00000212066 ENSG00000214015
ENSG00000242990 ENSG00000212102 ENSG00000214546
ENSG00000243054 ENSG00000212743 ENSG00000215199
ENSG00000243384 ENSG00000212978 ENSG00000215204
ENSG00000243592 ENSG00000213076 ENSG00000215403
ENSG00000243738 ENSG00000213152 ENSG00000215498
ENSG00000244161 ENSG00000213156 ENSG00000219463
ENSG00000244383 ENSG00000213820 ENSG00000220201
ENSG00000244498 ENSG00000213925 ENSG00000221386
ENSG00000244556 ENSG00000214015 ENSG00000221401
ENSG00000245247 ENSG00000214546 ENSG00000221566
ENSG00000247039 ENSG00000214861 ENSG00000222251
ENSG00000247909 ENSG00000215199 ENSG00000222352
ENSG00000248203 ENSG00000215204 ENSG00000223003
ENSG00000248224 ENSG00000215403 ENSG00000223191
ENSG00000248452 ENSG00000215498 ENSG00000223489
ENSG00000248549 ENSG00000219463 ENSG00000223881
ENSG00000248993 ENSG00000220201 ENSG00000223895
ENSG00000249141 ENSG00000221386 ENSG00000224163
ENSG00000250264 ENSG00000221401 ENSG00000224478
ENSG00000251916 ENSG00000221566 ENSG00000224688
ENSG00000251922 ENSG00000222251 ENSG00000224713
ENSG00000252865 ENSG00000222352 ENSG00000224791
ENSG00000253032 ENSG00000223003 ENSG00000224890
ENSG00000254371 ENSG00000223191 ENSG00000225172
ENSG00000254774 ENSG00000223335 ENSG00000226004
ENSG00000255154 ENSG00000223489 ENSG00000226032
ENSG00000255354 ENSG00000223534 ENSG00000226117
ENSG00000255518 ENSG00000223881 ENSG00000226167
ENSG00000223895 ENSG00000226441
ENSG00000224163 ENSG00000226469
ENSG00000224478 ENSG00000226534
ENSG00000224688 ENSG00000226728
ENSG00000224713 ENSG00000226884
ENSG00000224791 ENSG00000226885
ENSG00000224890 ENSG00000227145
ENSG00000225172 ENSG00000227598
ENSG00000225914 ENSG00000227747
ENSG00000226004 ENSG00000227943
ENSG00000226030 ENSG00000228013
ENSG00000226032 ENSG00000228264
ENSG00000226117 ENSG00000228414
ENSG00000226167 ENSG00000228910
ENSG00000226441 ENSG00000229186
ENSG00000226469 ENSG00000229260
ENSG00000226534 ENSG00000229266
ENSG00000226728 ENSG00000229664
ENSG00000226884 ENSG00000229780
ENSG00000226885 ENSG00000229922
ENSG00000227145 ENSG00000229933
ENSG00000227598 ENSG00000229989
ENSG00000227747 ENSG00000230359
ENSG00000227943 ENSG00000230393
ENSG00000228013 ENSG00000230533
ENSG00000228264 ENSG00000230626
ENSG00000228414 ENSG00000231072
ENSG00000228910 ENSG00000231128
ENSG00000228962 ENSG00000231467
ENSG00000229186 ENSG00000231634
ENSG00000229260 ENSG00000231718
ENSG00000229266 ENSG00000231858
ENSG00000229391 ENSG00000232067
ENSG00000229664 ENSG00000232084
ENSG00000229780 ENSG00000232450
ENSG00000229922 ENSG00000232547

ENSG00000229933 ENSG00000232693
ENSG00000229989 ENSG00000232713
ENSG00000230359 ENSG00000232771
ENSG00000230393 ENSG00000233031
ENSG00000230533 ENSG00000233232
ENSG00000230626 ENSG00000233410
ENSG00000231072 ENSG00000233411
ENSG00000231128 ENSG00000233459
ENSG00000231467 ENSG00000233875
ENSG00000231634 ENSG00000234255
ENSG00000231718 ENSG00000234503
ENSG00000231858 ENSG00000234545
ENSG00000232067 ENSG00000234608
ENSG00000232080 ENSG00000234624
ENSG00000232084 ENSG00000235237
ENSG00000232450 ENSG00000235492
ENSG00000232547 ENSG00000235527
ENSG00000232629 ENSG00000235842
ENSG00000232693 ENSG00000236278
ENSG00000232713 ENSG00000236722
ENSG00000232771 ENSG00000237024
ENSG00000233031 ENSG00000237135
ENSG00000233232 ENSG00000237407
ENSG00000233410 ENSG00000237499
ENSG00000233411 ENSG00000237522
ENSG00000233459 ENSG00000237651
ENSG00000233875 ENSG00000237819
ENSG00000234255 ENSG00000237868
ENSG00000234503 ENSG00000237969
ENSG00000234515 ENSG00000238168
ENSG00000234545 ENSG00000238256
ENSG00000234608 ENSG00000238352
ENSG00000234624 ENSG00000238436
ENSG00000235040 ENSG00000238532
ENSG00000235237 ENSG00000238684
ENSG00000235301 ENSG00000238703
ENSG00000235492 ENSG00000238733
ENSG00000235527 ENSG00000239071
ENSG00000235842 ENSG00000239294
ENSG00000236278 ENSG00000239511
ENSG00000236722 ENSG00000240416
ENSG00000237024 ENSG00000240634
ENSG00000237135 ENSG00000240771
ENSG00000237285 ENSG00000240825
ENSG00000237407 ENSG00000240906
ENSG00000237499 ENSG00000241037
ENSG00000237522 ENSG00000241102
ENSG00000237541 ENSG00000241524
ENSG00000237651 ENSG00000241573
ENSG00000237819 ENSG00000242162
ENSG00000237868 ENSG00000242241
ENSG00000237969 ENSG00000242309
ENSG00000238168 ENSG00000242591
ENSG00000238256 ENSG00000242990
ENSG00000238352 ENSG00000243233
ENSG00000238436 ENSG00000243384
ENSG00000238532 ENSG00000243592
ENSG00000238684 ENSG00000243738
ENSG00000238703 ENSG00000244060
ENSG00000238733 ENSG00000244161
ENSG00000239071 ENSG00000244383
ENSG00000239294 ENSG00000244498
ENSG00000239511 ENSG00000244556
ENSG00000240065 ENSG00000244671
ENSG00000240416 ENSG00000245192
ENSG00000240634 ENSG00000245247
ENSG00000240771 ENSG00000245682
ENSG00000240825 ENSG00000246465
ENSG00000240906 ENSG00000247039
ENSG00000241037 ENSG00000248203
ENSG00000241102 ENSG00000248224

ENSG00000241106 ENSG00000248452
ENSG00000241287 ENSG00000248549
ENSG00000241524 ENSG00000249141
ENSG00000241573 ENSG00000249524
ENSG00000242162 ENSG00000249860
ENSG00000242241 ENSG00000251417
ENSG00000242309 ENSG00000251922
ENSG00000242574 ENSG00000252020
ENSG00000242591 ENSG00000252142
ENSG00000242990 ENSG00000252143
ENSG00000243054 ENSG00000252314
ENSG00000243233 ENSG00000252402
ENSG00000243384 ENSG00000252412
ENSG00000243592 ENSG00000252461
ENSG00000243738 ENSG00000252605
ENSG00000244060 ENSG00000252619
ENSG00000244161 ENSG00000252799
ENSG00000244383 ENSG00000252865
ENSG00000244498 ENSG00000252976
ENSG00000244556 ENSG00000253032
ENSG00000244671 ENSG00000253080
ENSG00000245192 ENSG00000254371
ENSG00000245247 ENSG00000254498
ENSG00000245682 ENSG00000254774
ENSG00000246465 ENSG00000255154
ENSG00000247039 ENSG00000255354
ENSG00000247909 ENSG00000255518
ENSG00000248203 ENSG00000255524
ENSG00000248224
ENSG00000248452
ENSG00000248549
ENSG00000248993
ENSG00000249141
ENSG00000249524
ENSG00000249860
ENSG00000250264
ENSG00000251417
ENSG00000251916
ENSG00000251922
ENSG00000252020
ENSG00000252142
ENSG00000252143
ENSG00000252314
ENSG00000252402
ENSG00000252412
ENSG00000252461
ENSG00000252605
ENSG00000252619
ENSG00000252799
ENSG00000252865
ENSG00000252976
ENSG00000253032
ENSG00000253080
ENSG00000254371
ENSG00000254498
ENSG00000254774
ENSG00000255154
ENSG00000255354
ENSG00000255518
ENSG00000255524

Supplementary Table 23 Full significant Reactome events using the confirmed gene list.
FDR Un-adjusted probability Query Genes Total Genes Identifier of this Event Name of this Event Submitted identifiers mapping to this Event
- 0.0001106 5 69 REACT_19344 Costimulation by the CD28 family ENSG00000198502, ENSG00000163600, ENSG00000178562,
ENSG00000163599, ENSG00000196126
- 0.0008785 12 630 REACT_6900 Signaling in Immune system ENSG00000198502, ENSG00000163600, ENSG00000128604,
ENSG00000196126, ENSG00000109471, ENSG00000101017,
ENSG00000106804, ENSG00000118503, ENSG00000134460,
ENSG00000163599, ENSG00000178562, ENSG00000204264
- 0.0026121 2 11 REACT_14814 Formation of CSL-NICD coactivator complex ENSG00000204301, ENSG00000168214
- 0.0026121 2 11 REACT_14835 Notch-HLH transcription pathway ENSG00000204301, ENSG00000168214
- 0.006288 2 17 REACT_23856 Phosphorylated SHC1 recruits GRB2:GAB2 ENSG00000134460, ENSG00000109471
- 0.006288 2 17 REACT_23828 Phosphorylated SHC recruits GRB2:SOS1 ENSG00000134460, ENSG00000109471
- 0.006288 2 17 REACT_23782 SHC1 mediates cytokine-induced phosphorylation of GAB2 ENSG00000134460, ENSG00000109471
- 0.006288 2 17 REACT_23874 Phosphorylated SHC1 recruits SHIP ENSG00000134460, ENSG00000109471
- 0.006288 2 17 REACT_23911 The SHC1:SHIP1 complex is stabilized by GRB2 ENSG00000134460, ENSG00000109471
- 0.0070425 2 18 REACT_23928 SOS1 activates H-Ras ENSG00000134460, ENSG00000109471
- 0.0094277 3 62 REACT_15364 Loss of Nlp from mitotic centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0094277 3 62 REACT_15386 Plk1-mediated phosphorylation of Nlp ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0094277 3 62 REACT_15451 Loss of proteins required for interphase microtubule organization from the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0094277 3 62 REACT_15313 Loss of C-Nap-1 from centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0094277 3 62 REACT_15440 Dissociation of Phospho-Nlp from the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0094277 3 62 REACT_15470 Recruitment of Plk1 to centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0094277 3 62 REACT_15401 Recruitment of CDK11p58 to the centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0113889 2 23 REACT_24024 Gab2 binds the p85 subunit of Class 1A PI3 kinases ENSG00000134460, ENSG00000109471
- 0.0121352 3 68 REACT_15467 Recruitment of additional gamma tubulin/ gamma TuRC to the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0131272 3 70 REACT_15296 Recruitment of mitotic centrosome proteins and complexes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0131272 3 70 REACT_15479 Centrosome maturation ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0177951 2 29 REACT_23891 Interleukin receptor SHC signaling ENSG00000134460, ENSG00000109471
- 0.018114 6 301 REACT_152 Cell Cycle, Mitotic ENSG00000204261, ENSG00000213066, ENSG00000119397,
ENSG00000082898, ENSG00000175203, ENSG00000204264
- 0.0194015 3 81 REACT_2203 G2/M Transition ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0201078 4 148 REACT_25229 Interferon Signaling ENSG00000198502, ENSG00000128604, ENSG00000204264,
ENSG00000196126
- 0.0213559 3 84 REACT_21391 Mitotic G2-G2/M phases ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0214523 2 32 REACT_12633 Phosphorylation of ITAM motifs in CD3 complexes ENSG00000198502, ENSG00000196126
- 0.0214523 2 32 REACT_12467 Activation of Lck ENSG00000198502, ENSG00000196126
- 0.0227173 3 86 REACT_24994 Regulation of mRNA Stability by Proteins that Bind AU-rich Elements ENSG00000204261, ENSG00000082898, ENSG00000204264
- 0.0227353 2 33 REACT_12538 Phosphorylation of ZAP-70 by Lck ENSG00000198502, ENSG00000196126
- 0.0227353 2 33 REACT_12394 Activation of ZAP-70 ENSG00000198502, ENSG00000196126
- 0.0227353 2 33 REACT_12446 Dephosphorylation of Lck-pY505 by CD45 ENSG00000198502, ENSG00000196126
- 0.0227353 2 33 REACT_12566 Change of PKC theta conformation ENSG00000198502, ENSG00000196126
- 0.0227353 2 33 REACT_12642 Recruitment of ZAP-70 to phosphorylated ITAMs ENSG00000198502, ENSG00000196126
- 0.0227353 2 33 REACT_12596 Translocation of ZAP-70 to Immunological synapse ENSG00000198502, ENSG00000196126
- 0.0240495 2 34 REACT_12640 Inactivation of Lck by Csk ENSG00000198502, ENSG00000196126
- 0.0240495 2 34 REACT_12421 Phosphorylation of TBSMs in LAT ENSG00000198502, ENSG00000196126
- 0.0253943 2 35 REACT_12582 Phosphorylation of CD3 and TCR zeta chains ENSG00000198502, ENSG00000196126
- 0.0267693 2 36 REACT_12615 Phosphorylation of SLP-76 ENSG00000198502, ENSG00000196126
- 0.0270817 3 92 REACT_25082 Expression of IFNG-stimulated genes ENSG00000198502, ENSG00000128604, ENSG00000196126
- 0.0296079 2 38 REACT_12498 Phosphorylation of PLC-gamma1 ENSG00000198502, ENSG00000196126
- 0.0296079 2 38 REACT_19146 Dephosphorylation of CD3-zeta by PD-1 bound phosphatases ENSG00000198502, ENSG00000196126
- 0.0296079 2 38 REACT_19324 PD-1 signaling ENSG00000198502, ENSG00000196126
- 0.0404262 2 45 REACT_12623 Generation of second messenger molecules ENSG00000198502, ENSG00000196126
- 0.0404262 2 45 REACT_25343 Destruction of AUF1 and mRNA ENSG00000204261, ENSG00000204264
- 0.0420779 2 46 REACT_23837 Interleukin-3, 5 and GM-CSF signaling ENSG00000134460, ENSG00000109471
- 0.0436615 3 111 REACT_25078 Interferon gamma signaling ENSG00000198502, ENSG00000128604, ENSG00000196126
- 0.0454566 2 48 REACT_13491 26S proteosome degrades ODC holoenzyme complex ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_1471 Ubiquitinated geminin is degraded by the proteasome ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_480 Ubiquitinated Orc1 is degraded by the proteasome ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_1210 Ubiquitinated Cdc6 is degraded by the proteasome ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_873 Proteolytic degradation of ubiquitinated-Cdc25A ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_2142 Proteasome mediated degradation of Cyclin D1 ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_13413 Proteasome mediated degradation of PAK-2p34 ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_13464 Regulation of activated PAK-2p34 by proteasome mediated degradation ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_13505 Proteasome mediated degradation of PAK-2p34 ENSG00000204261, ENSG00000204264
- 0.0471827 2 49 REACT_20637 Proteasome mediated degradation of COP1 ENSG00000204261, ENSG00000204264
- 0.0487093 3 116 REACT_20605 Metabolism of mRNA ENSG00000204261, ENSG00000082898, ENSG00000204264
- 0.0489329 2 50 REACT_1221 CDK-mediated phosphorylation and removal of Cdc6 ENSG00000204261, ENSG00000204264
- 0.0489329 2 50 REACT_4 Ubiquitin-dependent degradation of Cyclin D1 ENSG00000204261, ENSG00000204264
- 0.0489329 2 50 REACT_938 Ubiquitin-dependent degradation of Cyclin D ENSG00000204261, ENSG00000204264
- 0.0489329 2 50 REACT_13565 Regulation of ornithine decarboxylase (ODC) ENSG00000204261, ENSG00000204264

Supplementary Table 24 Full significant Reactome events using the confirmed gene list after exclusion of MHC region genes.
FDR Un-adjusted probability Query Genes Total Genes Identifier of this Event Name of this Event Submitted identifiers mapping to this Event
- 0.0045986 2 17 REACT_23856 Phosphorylated SHC1 recruits GRB2:GAB2 ENSG00000134460, ENSG00000109471
- 0.0045986 2 17 REACT_23828 Phosphorylated SHC recruits GRB2:SOS1 ENSG00000134460, ENSG00000109471
- 0.0045986 2 17 REACT_23782 SHC1 mediates cytokine-induced phosphorylation of GAB2 ENSG00000134460, ENSG00000109471
- 0.0045986 2 17 REACT_23874 Phosphorylated SHC1 recruits SHIP ENSG00000134460, ENSG00000109471
- 0.0045986 2 17 REACT_23911 The SHC1:SHIP1 complex is stabilized by GRB2 ENSG00000134460, ENSG00000109471
- 0.005154 2 18 REACT_23928 SOS1 activates H-Ras ENSG00000134460, ENSG00000109471
- 0.0060282 3 62 REACT_15364 Loss of Nlp from mitotic centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0060282 3 62 REACT_15386 Plk1-mediated phosphorylation of Nlp ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0060282 3 62 REACT_15451 Loss of proteins required for interphase microtubule organization from the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0060282 3 62 REACT_15313 Loss of C-Nap-1 from centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0060282 3 62 REACT_15440 Dissociation of Phospho-Nlp from the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0060282 3 62 REACT_15470 Recruitment of Plk1 to centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0060282 3 62 REACT_15401 Recruitment of CDK11p58 to the centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0077952 3 68 REACT_15467 Recruitment of additional gamma tubulin/ gamma TuRC to the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0081164 3 69 REACT_19344 Costimulation by the CD28 family ENSG00000163600, ENSG00000178562, ENSG00000163599
- 0.0083637 2 23 REACT_24024 Gab2 binds the p85 subunit of Class 1A PI3 kinases ENSG00000134460, ENSG00000109471
- 0.0084453 3 70 REACT_15296 Recruitment of mitotic centrosome proteins and complexes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0084453 3 70 REACT_15479 Centrosome maturation ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0099984 9 630 REACT_6900 Signaling in Immune system ENSG00000163600, ENSG00000128604, ENSG00000109471,
ENSG00000101017, ENSG00000106804, ENSG00000118503,
ENSG00000134460, ENSG00000163599, ENSG00000178562
- 0.0125872 3 81 REACT_2203 G2/M Transition ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0131223 2 29 REACT_23891 Interleukin receptor SHC signaling ENSG00000134460, ENSG00000109471
- 0.0138868 3 84 REACT_21391 Mitotic G2-G2/M phases ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0313896 2 46 REACT_23837 Interleukin-3, 5 and GM-CSF signaling ENSG00000134460, ENSG00000109471
- 0.0421246 2 54 REACT_15344 Chemokine receptors bind chemokines ENSG00000137077, ENSG00000112486
Supplementary Table 25 Full significant Reactome events using the expanded gene list.
FDR Un-adjusted probability Query Genes Total Genes Identifier of this Event Name of this Event Submitted identifiers mapping to this Event
- 1.63E-07 26 630 REACT_6900 Signaling in Immune system ENSG00000177455, ENSG00000163600, ENSG00000090339, ENSG00000100385,
ENSG00000175104, ENSG00000196126, ENSG00000109471, ENSG00000106804,
ENSG00000065675, ENSG00000105371, ENSG00000163599, ENSG00000081237,
ENSG00000105397, ENSG00000198502, ENSG00000198821, ENSG00000175354,
ENSG00000128604, ENSG00000160710, ENSG00000076662, ENSG00000101017,
ENSG00000213658, ENSG00000179295, ENSG00000178562, ENSG00000118503,
ENSG00000134460, ENSG00000204264
- 4.72E-05 7 69 REACT_19344 Costimulation by the CD28 family ENSG00000198502, ENSG00000198821, ENSG00000163600, ENSG00000179295,
ENSG00000178562, ENSG00000163599, ENSG00000196126
- 5.40E-05 3 6 REACT_11173 ICAMs 1-4 bind to Integrin LFA-1 ENSG00000105371, ENSG00000090339, ENSG00000076662
- 8.84E-05 7 76 REACT_12526 TCR signaling ENSG00000081237, ENSG00000065675, ENSG00000198502, ENSG00000198821,
ENSG00000213658, ENSG00000175104, ENSG00000196126
- 0.0002177 9 148 REACT_25229 Interferon Signaling ENSG00000105397, ENSG00000198502, ENSG00000175354, ENSG00000090339,
ENSG00000160710, ENSG00000128604, ENSG00000196126, ENSG00000179295,
ENSG00000204264
- 0.0011376 4 33 REACT_12446 Dephosphorylation of Lck-pY505 by CD45 ENSG00000081237, ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.0011376 4 33 REACT_12566 Change of PKC theta conformation ENSG00000065675, ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.0011826 2 4 REACT_6855 p-nitrophenol + PAPS => p-nitrophenol sulfate + PAP ENSG00000197165, ENSG00000196502
- 0.0011826 2 4 REACT_25338 SH2B proteins bind JAK2 ENSG00000111252, ENSG00000178188
- 0.0012755 4 34 REACT_12421 Phosphorylation of TBSMs in LAT ENSG00000198502, ENSG00000198821, ENSG00000213658, ENSG00000196126
- 0.0014248 4 35 REACT_12582 Phosphorylation of CD3 and TCR zeta chains ENSG00000081237, ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.0015196 5 60 REACT_12555 Downstream TCR signaling ENSG00000065675, ENSG00000198502, ENSG00000198821, ENSG00000175104,
ENSG00000196126
- 0.0015858 4 36 REACT_12615 Phosphorylation of SLP-76 ENSG00000198502, ENSG00000198821, ENSG00000213658, ENSG00000196126
- 0.0016392 3 17 REACT_23856 Phosphorylated SHC1 recruits GRB2:GAB2 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0016392 3 17 REACT_23828 Phosphorylated SHC recruits GRB2:SOS1 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0016392 3 17 REACT_23782 SHC1 mediates cytokine-induced phosphorylation of GAB2 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0016392 3 17 REACT_23874 Phosphorylated SHC1 recruits SHIP ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0016392 3 17 REACT_23911 The SHC1:SHIP1 complex is stabilized by GRB2 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0019451 4 38 REACT_12498 Phosphorylation of PLC-gamma1 ENSG00000198502, ENSG00000198821, ENSG00000213658, ENSG00000196126
- 0.0019451 4 38 REACT_19146 Dephosphorylation of CD3-zeta by PD-1 bound phosphatases ENSG00000198502, ENSG00000198821, ENSG00000179295, ENSG00000196126
- 0.0019451 4 38 REACT_19324 PD-1 signaling ENSG00000198502, ENSG00000198821, ENSG00000179295, ENSG00000196126
- 0.0019471 3 18 REACT_23928 SOS1 activates H-Ras ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0019528 2 5 REACT_18416 S1P-binding receptors bind S1P ENSG00000180739, ENSG00000175898
- 0.003643 4 45 REACT_12623 Generation of second messenger molecules ENSG00000198502, ENSG00000198821, ENSG00000213658, ENSG00000196126
- 0.0039476 4 46 REACT_23837 Interleukin-3, 5 and GM-CSF signaling ENSG00000100385, ENSG00000179295, ENSG00000134460, ENSG00000109471
- 0.0040157 3 23 REACT_24024 Gab2 binds the p85 subunit of Class 1A PI3 kinases ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0047301 6 111 REACT_25078 Interferon gamma signaling ENSG00000198502, ENSG00000175354, ENSG00000090339, ENSG00000179295,
ENSG00000128604, ENSG00000196126
- 0.0053182 2 8 REACT_18365 Lysosphingolipid and LPA receptors ENSG00000180739, ENSG00000175898
- 0.0067749 2 9 REACT_19203 SHP2 phosphatase binds CTLA-4 ENSG00000179295, ENSG00000163599
- 0.0076473 5 87 REACT_22232 Signaling by Interleukins ENSG00000100385, ENSG00000179295, ENSG00000175104, ENSG00000134460,
ENSG00000109471
- 0.0077952 3 29 REACT_23891 Interleukin receptor SHC signaling ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0101615 2 11 REACT_14814 Formation of CSL-NICD coactivator complex ENSG00000204301, ENSG00000168214
- 0.0101615 2 11 REACT_14835 Notch-HLH transcription pathway ENSG00000204301, ENSG00000168214
- 0.0101615 2 11 REACT_19405 CTLA4 inhibitory signaling ENSG00000179295, ENSG00000163599
- 0.0102643 3 32 REACT_12633 Phosphorylation of ITAM motifs in CD3 complexes ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.0102643 3 32 REACT_12467 Activation of Lck ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.011177 3 33 REACT_12538 Phosphorylation of ZAP-70 by Lck ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.011177 3 33 REACT_12394 Activation of ZAP-70 ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.011177 3 33 REACT_12642 Recruitment of ZAP-70 to phosphorylated ITAMs ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.011177 3 33 REACT_12596 Translocation of ZAP-70 to Immunological synapse ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.0121352 3 34 REACT_12640 Inactivation of Lck by Csk ENSG00000198502, ENSG00000198821, ENSG00000196126
- 0.0141484 2 13 REACT_6913 Cytosolic sulfonation of small molecules ENSG00000197165, ENSG00000196502
- 0.0152161 5 103 REACT_25162 Interferon alpha/beta signaling ENSG00000105397, ENSG00000179295, ENSG00000160710, ENSG00000128604,
ENSG00000204264
- 0.0163556 2 14 REACT_24980 Regulation of IFNG signaling ENSG00000175354, ENSG00000179295
- 0.0214748 3 42 REACT_22237 Netrin-1 signaling ENSG00000065675, ENSG00000135439, ENSG00000179295
- 0.0265104 2 18 REACT_12621 Tie2 Signaling ENSG00000141738, ENSG00000179295
- 0.02936 2 19 REACT_15334 DARPP-32 events ENSG00000131771, ENSG00000065989
- 0.0323261 2 20 REACT_24939 Dephosphorylation of STAT1 by SHP2 ENSG00000105397, ENSG00000179295
- 0.041537 4 92 REACT_12051 Cell surface interactions at the vascular wall ENSG00000141738, ENSG00000100023, ENSG00000179295, ENSG00000116824
- 0.041537 4 92 REACT_25082 Expression of IFNG-stimulated genes ENSG00000198502, ENSG00000090339, ENSG00000128604, ENSG00000196126
- 0.0487735 2 25 REACT_25216 Regulation of IFNA signaling ENSG00000105397, ENSG00000179295
Supplementary Table 26 Full significant Reactome events using the expanded gene list after exclusion of MHC region genes.
FDR Un-adjusted probability Query Genes Total Genes Identifier of this Event Name of this Event Submitted identifiers mapping to this Event
- 2.19E-06 23 630 REACT_6900 Signaling in Immune system ENSG00000177455, ENSG00000163600, ENSG00000090339,
ENSG00000100385, ENSG00000175104, ENSG00000109471,
ENSG00000106804, ENSG00000065675, ENSG00000105371,
ENSG00000163599, ENSG00000081237, ENSG00000105397,
ENSG00000198821, ENSG00000175354, ENSG00000128604,
ENSG00000160710, ENSG00000076662, ENSG00000101017,
ENSG00000213658, ENSG00000179295, ENSG00000178562,
ENSG00000118503, ENSG00000134460
- 4.29E-05 3 6 REACT_11173 ICAMs 1-4 bind to Integrin LFA-1 ENSG00000105371, ENSG00000090339, ENSG00000076662
- 0.0010153 2 4 REACT_6855 p-nitrophenol + PAPS => p-nitrophenol sulfate + PAP ENSG00000197165, ENSG00000196502
- 0.0010153 2 4 REACT_25338 SH2B proteins bind JAK2 ENSG00000111252, ENSG00000178188
- 0.0013133 3 17 REACT_23856 Phosphorylated SHC1 recruits GRB2:GAB2 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0013133 3 17 REACT_23828 Phosphorylated SHC recruits GRB2:SOS1 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0013133 3 17 REACT_23782 SHC1 mediates cytokine-induced phosphorylation of GAB2 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0013133 3 17 REACT_23874 Phosphorylated SHC1 recruits SHIP ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0013133 3 17 REACT_23911 The SHC1:SHIP1 complex is stabilized by GRB2 ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0015612 3 18 REACT_23928 SOS1 activates H-Ras ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0016777 2 5 REACT_18416 S1P-binding receptors bind S1P ENSG00000180739, ENSG00000175898
- 0.0020196 5 69 REACT_19344 Costimulation by the CD28 family ENSG00000198821, ENSG00000163600, ENSG00000179295,
ENSG00000178562, ENSG00000163599
- 0.0029905 4 46 REACT_23837 Interleukin-3, 5 and GM-CSF signaling ENSG00000100385, ENSG00000179295, ENSG00000134460,
ENSG00000109471
- 0.0030932 5 76 REACT_12526 TCR signaling ENSG00000081237, ENSG00000065675, ENSG00000198821,
ENSG00000213658, ENSG00000175104
- 0.0032324 3 23 REACT_24024 Gab2 binds the p85 subunit of Class 1A PI3 kinases ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0045787 2 8 REACT_18365 Lysosphingolipid and LPA receptors ENSG00000180739, ENSG00000175898
- 0.0055362 5 87 REACT_22232 Signaling by Interleukins ENSG00000100385, ENSG00000179295, ENSG00000175104,
ENSG00000134460, ENSG00000109471
- 0.0058369 2 9 REACT_19203 SHP2 phosphatase binds CTLA-4 ENSG00000179295, ENSG00000163599
- 0.0063038 3 29 REACT_23891 Interleukin receptor SHC signaling ENSG00000100385, ENSG00000134460, ENSG00000109471
- 0.0087669 2 11 REACT_19405 CTLA4 inhibitory signaling ENSG00000179295, ENSG00000163599
- 0.0122234 2 13 REACT_6913 Cytosolic sulfonation of small molecules ENSG00000197165, ENSG00000196502
- 0.0128512 6 148 REACT_25229 Interferon Signaling ENSG00000105397, ENSG00000175354, ENSG00000090339,
ENSG00000179295, ENSG00000160710, ENSG00000128604
- 0.0141401 2 14 REACT_24980 Regulation of IFNG signaling ENSG00000175354, ENSG00000179295
- 0.0175396 3 42 REACT_22237 Netrin-1 signaling ENSG00000065675, ENSG00000135439, ENSG00000179295
- 0.0229823 2 18 REACT_12621 Tie2 Signaling ENSG00000141738, ENSG00000179295
- 0.0254701 2 19 REACT_15334 DARPP-32 events ENSG00000131771, ENSG00000065989
- 0.0280623 2 20 REACT_24939 Dephosphorylation of STAT1 by SHP2 ENSG00000105397, ENSG00000179295
- 0.0326519 4 92 REACT_12051 Cell surface interactions at the vascular wall ENSG00000141738, ENSG00000100023, ENSG00000179295,
ENSG00000116824
- 0.0424835 2 25 REACT_25216 Regulation of IFNA signaling ENSG00000105397, ENSG00000179295
- 0.0442521 3 60 REACT_12555 Downstream TCR signaling ENSG00000065675, ENSG00000198821, ENSG00000175104
- 0.0464453 4 103 REACT_25162 Interferon alpha/beta signaling ENSG00000105397, ENSG00000179295, ENSG00000160710,
ENSG00000128604
- 0.0471597 5 150 REACT_15538 Liganded Gi-activating GPCR acts as a GEF for Gi ENSG00000106804, ENSG00000180739, ENSG00000137077,
ENSG00000175898, ENSG00000112486
- 0.0471597 5 150 REACT_22239 The Ligand:GPCR:Gi complex dissociates ENSG00000106804, ENSG00000180739, ENSG00000137077,
ENSG00000175898, ENSG00000112486
- 0.0471597 5 150 REACT_22289 Liganded Gi-activating GPCRs bind inactive heterotrimeric G-protein Gi ENSG00000106804, ENSG00000180739, ENSG00000137077,
ENSG00000175898, ENSG00000112486
- 0.0480147 3 62 REACT_15364 Loss of Nlp from mitotic centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0480147 3 62 REACT_15386 Plk1-mediated phosphorylation of Nlp ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0480147 3 62 REACT_15451 Loss of proteins required for interphase microtubule organization from the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0480147 3 62 REACT_15313 Loss of C-Nap-1 from centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0480147 3 62 REACT_15440 Dissociation of Phospho-Nlp from the centrosome ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0480147 3 62 REACT_15470 Recruitment of Plk1 to centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0480147 3 62 REACT_15401 Recruitment of CDK11p58 to the centrosomes ENSG00000213066, ENSG00000119397, ENSG00000175203
- 0.0488791 2 27 REACT_19259 Sema4D in semaphorin signaling ENSG00000081237, ENSG00000141736
Supplementary Table 27 Full overlap of Reactome events.
Event ID Name of this Event Confirmed Confirmed no HLA Expanded Expanded no HLA
REACT_19344 Costimulation by the CD28 family 0.000110599 0.008116374 4.72E-05 0.002019622
REACT_23782 SHC1 mediates cytokine-induced phosphorylation of GAB2 0.006288019 0.004598618 0.001639207 0.00131331
REACT_23828 Phosphorylated SHC recruits GRB2:SOS1 0.006288019 0.004598618 0.001639207 0.00131331
REACT_23837 Interleukin-3, 5 and GM-CSF signaling 0.042077909 0.031389581 0.003947564 0.002990545
REACT_23856 Phosphorylated SHC1 recruits GRB2:GAB2 0.006288019 0.004598618 0.001639207 0.00131331
REACT_23874 Phosphorylated SHC1 recruits SHIP 0.006288019 0.004598618 0.001639207 0.00131331
REACT_23891 Interleukin receptor SHC signaling 0.017795098 0.013122275 0.007795166 0.006303848
REACT_23911 The SHC1:SHIP1 complex is stabilized by GRB2 0.006288019 0.004598618 0.001639207 0.00131331
REACT_23928 SOS1 activates H-Ras 0.007042525 0.005153984 0.001947058 0.001561177
REACT_24024 Gab2 binds the p85 subunit of Class 1A PI3 kinases 0.01138886 0.008363662 0.004015724 0.003232428
REACT_6900 Signaling in Immune system 0.00087851 0.009998391 1.63E-07 2.19E-06
REACT_11173 ICAMs 1-4 bind to Integrin LFA-1 - - 5.40E-05 4.29E-05
REACT_12051 Cell surface interactions at the vascular wall - - 0.041536984 0.032651916
REACT_1210 Ubiquitinated Cdc6 is degraded by the proteasome 0.047182747 - - -
REACT_1221 CDK-mediated phosphorylation and removal of Cdc6 0.048932916 - - -
REACT_12394 Activation of ZAP-70 0.022735336 - 0.011176961 -
REACT_12421 Phosphorylation of TBSMs in LAT 0.02404948 - 0.001275535 -
REACT_12446 Dephosphorylation of Lck-pY505 by CD45 0.022735336 - 0.001137602 -
REACT_12467 Activation of Lck 0.021452336 - 0.010264292 -
REACT_12498 Phosphorylation of PLC-gamma1 0.029607937 - 0.001945071 -
REACT_12526 TCR signaling - - 8.84E-05 0.003093178
REACT_12538 Phosphorylation of ZAP-70 by Lck 0.022735336 - 0.011176961 -
REACT_12555 Downstream TCR signaling - - 0.001519642 0.044252071
REACT_12566 Change of PKC theta conformation 0.022735336 - 0.001137602 -
REACT_12582 Phosphorylation of CD3 and TCR zeta chains 0.025394287 - 0.001424769 -
REACT_12596 Translocation of ZAP-70 to Immunological synapse 0.022735336 - 0.011176961 -
REACT_12615 Phosphorylation of SLP-76 0.026769279 - 0.001585784 -
REACT_12621 Tie2 Signaling - - 0.02651042 0.022982326
REACT_12623 Generation of second messenger molecules 0.040426241 - 0.003642989 -
REACT_12633 Phosphorylation of ITAM motifs in CD3 complexes 0.021452336 - 0.010264292 -
REACT_12640 Inactivation of Lck by Csk 0.02404948 - 0.012135163 -
REACT_12642 Recruitment of ZAP-70 to phosphorylated ITAMs 0.022735336 - 0.011176961 -
REACT_13413 Proteasome mediated degradation of PAK-2p34 0.047182747 - - -
REACT_13464 Regulation of activated PAK-2p34 by proteasome mediated degradation 0.047182747 - - -
REACT_13491 26S proteosome degrades ODC holoenzyme complex 0.045456581 - - -
REACT_13505 Proteasome mediated degradation of PAK-2p34 0.047182747 - - -
REACT_13565 Regulation of ornithine decarboxylase (ODC) 0.048932916 - - -
REACT_1471 Ubiquitinated geminin is degraded by the proteasome 0.047182747 - - -
REACT_14814 Formation of CSL-NICD coactivator complex 0.002612054 - 0.01016153 -
REACT_14835 Notch-HLH transcription pathway 0.002612054 - 0.01016153 -
REACT_152 Cell Cycle, Mitotic 0.018113994 - - -
REACT_15296 Recruitment of mitotic centrosome proteins and complexes 0.013127188 0.008445338 - -
REACT_15313 Loss of C-Nap-1 from centrosomes 0.009427742 0.006028182 - 0.048014749
REACT_15334 DARPP-32 events - - 0.029360041 0.025470071
REACT_15344 Chemokine receptors bind chemokines - 0.042124557 - -
REACT_15364 Loss of Nlp from mitotic centrosomes 0.009427742 0.006028182 - 0.048014749
REACT_15386 Plk1-mediated phosphorylation of Nlp 0.009427742 0.006028182 - 0.048014749
REACT_15401 Recruitment of CDK11p58 to the centrosomes 0.009427742 0.006028182 - 0.048014749
REACT_15440 Dissociation of Phospho-Nlp from the centrosome 0.009427742 0.006028182 - 0.048014749
REACT_15451 Loss of proteins required for interphase microtubule organization from the centrosome 0.009427742 0.006028182 - 0.048014749
REACT_15467 Recruitment of additional gamma tubulin/ gamma TuRC to the centrosome 0.012135163 0.007795166 - -
REACT_15470 Recruitment of Plk1 to centrosomes 0.009427742 0.006028182 - 0.048014749
REACT_15479 Centrosome maturation 0.013127188 0.008445338 - -
REACT_15538 Liganded Gi-activating GPCR acts as a GEF for Gi - - - 0.047159704
REACT_18365 Lysosphingolipid and LPA receptors - - 0.00531817 0.004578691
REACT_18416 S1P-binding receptors bind S1P - - 0.001952762 0.001677707
REACT_19146 Dephosphorylation of CD3-zeta by PD-1 bound phosphatases 0.029607937 - 0.001945071 -
REACT_19203 SHP2 phosphatase binds CTLA-4 - - 0.006774854 0.005836895
REACT_19259 Sema4D in semaphorin signaling - - - 0.048879096
REACT_19324 PD-1 signaling 0.029607937 - 0.001945071 -
REACT_19405 CTLA4 inhibitory signaling - - 0.01016153 0.008766873
REACT_20605 Metabolism of mRNA 0.048709346 - - -
REACT_20637 Proteasome mediated degradation of COP1 0.047182747 - - -
REACT_21391 Mitotic G2-G2/M phases 0.021355946 0.013886817 - -
REACT_2142 Proteasome mediated degradation of Cyclin D1 0.047182747 - - -
REACT_2203 G2/M Transition 0.019401516 0.012587187 - -
REACT_22232 Signaling by Interleukins - - 0.007647252 0.005536234
REACT_22237 Netrin-1 signaling - - 0.021474792 0.017539603
REACT_22239 The Ligand:GPCR:Gi complex dissociates - - - 0.047159704
REACT_22289 Liganded Gi-activating GPCRs bind inactive heterotrimeric G-protein Gi - - - 0.047159704
REACT_24939 Dephosphorylation of STAT1 by SHP2 - - 0.032326135 0.028062265
REACT_24980 Regulation of IFNG signaling - - 0.016355645 0.014140142
REACT_24994 Regulation of mRNA Stability by Proteins that Bind AU-rich Elements 0.022717251 - - -
REACT_25078 Interferon gamma signaling 0.043661539 - 0.004730097 -
REACT_25082 Expression of IFNG-stimulated genes 0.027081716 - 0.041536984 -
REACT_25162 Interferon alpha/beta signaling - - 0.015216104 0.046445338
REACT_25216 Regulation of IFNA signaling - - 0.048773528 0.042483469
REACT_25229 Interferon Signaling 0.020107808 - 0.000217711 0.012851219
REACT_25338 SH2B proteins bind JAK2 - - 0.001182562 0.001015278
REACT_25343 Destruction of AUF1 and mRNA 0.040426241 - - -
REACT_4 Ubiquitin-dependent degradation of Cyclin D1 0.048932916 - - -
REACT_480 Ubiquitinated Orc1 is degraded by the proteasome 0.047182747 - - -
REACT_6855 p-nitrophenol + PAPS => p-nitrophenol sulfate + PAP - - 0.001182562 0.001015278
REACT_6913 Cytosolic sulfonation of small molecules - - 0.01414838 0.012223437
REACT_873 Proteolytic degradation of ubiquitinated-Cdc25A 0.047182747 - - -
REACT_938 Ubiquitin-dependent degradation of Cyclin D 0.048932916 - - -
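
An overlap table of this kind can be assembled by merging the per-list Reactome results on the event identifier. The following is a minimal sketch and not the procedure used in this work: the function name `overlap_table` and the dictionary layout are assumptions made for illustration, and only three events are included, with their un-adjusted probabilities copied from Supplementary Tables 23 to 26.

```python
# Hypothetical helper: merge per-list results into one row per event.
def overlap_table(results):
    """results maps a gene-list name to {event_id: un-adjusted probability}."""
    names = list(results)
    # Plain string sort of the event identifiers.
    events = sorted({event for r in results.values() for event in r})
    # None marks a gene list in which the event was not reported.
    return [[event] + [results[name].get(event) for name in names]
            for event in events]

# Values taken from Supplementary Tables 23-26 for three example events.
results = {
    "Confirmed": {"REACT_19344": 0.0001106, "REACT_152": 0.018114},
    "Confirmed no HLA": {"REACT_19344": 0.0081164},
    "Expanded": {"REACT_19344": 4.72e-05, "REACT_11173": 5.40e-05},
    "Expanded no HLA": {"REACT_19344": 0.0020196, "REACT_11173": 4.29e-05},
}

for row in overlap_table(results):
    print(row)
```

Events missing from a list are returned as `None`, so a reader can see at a glance which gene lists support a given event.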

7.1.2 Protein-protein Interaction
After genes from the MHC region were removed from the confirmed gene list, only 4 interactions
involving 7 genes remained, because the RBPJ interaction involved the MHC region gene NOTCH4
(Supplementary Figure 8). As expected, a similar effect was seen with the expanded gene list:
excluding MHC region genes reduced the number of interactions from 36 to 27 and the number of
genes from 43 to 31 (Supplementary Figure 9).
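
The effect of excluding MHC region genes on network size can be sketched in code. The following
Python is a minimal illustration, not the workflow used in this thesis: the helper names
`exclude_genes` and `network_size` and the placeholder genes `GENE_A` to `GENE_C` are assumptions
made for the example; only the RBPJ-NOTCH4 interaction is taken from the text above.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def exclude_genes(edges: Iterable[Edge], excluded: Set[str]) -> List[Edge]:
    """Keep only interactions in which neither partner is an excluded gene."""
    return [(a, b) for a, b in edges if a not in excluded and b not in excluded]


def network_size(edges: Iterable[Edge]) -> Tuple[int, int]:
    """Return (number of interactions, number of distinct genes)."""
    edge_list = list(edges)
    genes = {gene for edge in edge_list for gene in edge}
    return len(edge_list), len(genes)


# Toy network: GENE_A..GENE_C are hypothetical placeholders, not thesis data.
toy_edges = [("RBPJ", "NOTCH4"), ("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
toy_mhc = {"NOTCH4"}  # MHC region genes to exclude

filtered = exclude_genes(toy_edges, toy_mhc)
print(network_size(toy_edges))  # (3, 5)
print(network_size(filtered))   # (2, 3)
```

In this toy case, removing NOTCH4 also removes RBPJ from the network, because RBPJ has no other
interaction partner.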

Supplementary Figure 8 Protein-protein interaction network produced from the confirmed list
after exclusion of MHC region genes.

Supplementary Figure 9 Protein-protein interaction network produced from the expanded list after
exclusion of MHC region genes. Genes from the confirmed gene list are shown as squares and
hexagons symbolise genes from the expanded gene list.

Extended protein-protein interaction networks are shown in Supplementary Figure 10,
Supplementary Figure 11, Supplementary Figure 12 and Supplementary Figure 13. For the confirmed
gene list, analysis of the extended interaction network after exclusion of MHC region genes
resulted in a reduced network of 80 interactions involving 57 genes, 29 of which were
interacting genes not in the list (Supplementary Figure 11). Interactions between non-MHC
region genes were unaffected; however, loss of the MHC region gene NOTCH4 caused the
FLNB-SMURF2-TNPO3/RAB14 interaction to separate from the main network. The network produced
using the expanded gene list was much larger and more complex (Supplementary Figure 13).
Compared with the expanded gene list network that included the MHC region genes, the number of
interactions dropped from 599 to 553, involving 273 genes, 179 of which were not present in the
gene list. Apart from the 13 MHC region genes and one gene from the confirmed gene list, all
other list genes remained the same as before. One of the three separate interactions from the
expanded gene list was missing due to the involvement of the MHC region gene PSMB9. Full hub
analysis results are shown in Supplementary Table 28, Supplementary Table 29, Supplementary
Table 30 and Supplementary Table 31.
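
The hub analysis summarised in the supplementary tables counts, for each gene, its interactions
and splits them by the type of the interacting partner. A minimal Python sketch of such a count
is given below, assuming an undirected edge list and a lookup of gene types. The function name
`hub_table`, the placeholder genes `CONF_1`, `EXP_1` and `SUB_1`, and all toy interactions other
than RBPJ-NOTCH4 are hypothetical and do not reproduce the thesis workflow.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]
HubRow = Tuple[str, str, int, Dict[str, int]]


def hub_table(edges: Iterable[Edge], gene_type: Dict[str, str],
              default_type: str = "Sub-interaction") -> List[HubRow]:
    """Count interactions per gene, split by the type of the partner gene.

    Genes missing from gene_type are treated as sub-interaction genes
    (an assumption of this sketch).
    """
    by_partner_type: Dict[str, Counter] = defaultdict(Counter)
    for a, b in edges:
        by_partner_type[a][gene_type.get(b, default_type)] += 1
        by_partner_type[b][gene_type.get(a, default_type)] += 1

    rows = [
        (gene, gene_type.get(gene, default_type), sum(split.values()), dict(split))
        for gene, split in by_partner_type.items()
    ]
    # Most connected genes (hubs) first; ties broken alphabetically.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Toy input: only RBPJ-NOTCH4 comes from the text; the rest is invented.
toy_types = {"RBPJ": "Confirmed", "NOTCH4": "HLA",
             "CONF_1": "Confirmed", "EXP_1": "Expanded"}
toy_edges = [("RBPJ", "NOTCH4"), ("RBPJ", "SUB_1"),
             ("CONF_1", "SUB_1"), ("CONF_1", "EXP_1")]

for row in hub_table(toy_edges, toy_types):
    print(row)
```

Each output row holds the gene, its type, its total number of interactions and the split by
partner type, which mirrors the column layout of the hub analysis tables.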

Supplementary Figure 10 Extended protein-protein interaction network produced from the confirmed gene
list. Genes from the confirmed gene list are shown as squares, genes in the MHC region are shown as
diamonds and hexagons symbolise genes from the expanded gene list.

Supplementary Figure 11 Extended protein-protein interaction network produced from the confirmed gene
list after exclusion of genes in the MHC region. Genes from the confirmed gene list are shown as squares and
hexagons symbolise genes from the expanded gene list.

Supplementary Figure 12 Extended protein-protein interaction network produced from the expanded gene
list. Genes from the confirmed gene list are shown as squares, genes in the MHC region are shown as
diamonds and hexagons symbolise genes from the expanded gene list.

Supplementary Figure 13 Extended protein-protein interaction network produced from the expanded gene
list after exclusion of genes in the MHC region. Genes from the confirmed gene list are shown as squares and
hexagons symbolise genes from the expanded gene list.

Supplementary Table 28 Full hub analysis of the extended protein-protein interaction network produced
from the confirmed gene list. Interactions are also split by the type of interaction (e.g. direct interaction with
confirmed gene).
Gene | Gene Type | Interactions | Direct Interaction With Confirmed Gene | Direct Interaction With Expanded Gene | Direct Interaction With HLA Gene | Interaction With Sub-Interaction Gene
CD28 Confirmed 10 0 2 0 8
TRAF1 Confirmed 7 2 0 0 5
CD40 Confirmed 7 1 1 0 5
CD2 Confirmed 7 0 2 0 5
REL Confirmed 6 0 0 0 6
CTLA4 Confirmed 6 0 0 0 6
IL2RA Confirmed 5 1 0 0 4
PTPN22 Confirmed 5 0 1 0 4
TNFAIP3 Confirmed 4 1 1 0 2
RBPJ Confirmed 3 0 0 1 2
DDIT3 Confirmed 3 0 0 0 3
FAM107A Confirmed 3 0 0 0 3
GSN Confirmed 3 0 0 0 3
STAT4 Confirmed 3 0 0 0 3
FLNB Confirmed 3 0 0 0 3
IRF5 Confirmed 2 1 1 0 0
IL2 Confirmed 2 1 0 0 1
AGAP2 Confirmed 2 0 0 0 2
XPO1 Confirmed 1 1 0 0 0
BLK Confirmed 1 0 0 0 1
GLI1 Confirmed 1 0 0 0 1
KIF5A Confirmed 1 0 0 0 1
CEP110 Confirmed 1 0 0 0 1
OS9 Confirmed 1 0 0 0 1
RAB14 Confirmed 1 0 0 0 1
MARS Confirmed 1 0 0 0 1
TNPO3 Confirmed 1 0 0 0 1
CD247 Expanded 3 3 0 0 0
TRAF6 Expanded 3 3 0 0 0
PTPRC Expanded 2 2 0 0 0
NOTCH4 HLA 4 1 0 0 3
HLA-DRA HLA 4 0 0 3 1
HLA-DMB HLA 4 0 0 2 2
TAP1 HLA 3 0 0 2 1
HLA-DQB1 HLA 3 0 0 1 2
HLA-DRB1 HLA 3 0 0 1 2
PSMB8 HLA 2 0 0 2 0
TAP2 HLA 2 0 0 2 0
HLA-DQA2 HLA 2 0 0 1 1
HLA-DRB5 HLA 2 0 0 1 1
HLA-DOB HLA 1 0 0 1 0
HLA-DQA1 HLA 1 0 0 0 1
PSMB9 HLA 1 0 0 0 1
CD4 Sub-interaction 5 3 0 2 0
TRAF2 Sub-interaction 4 4 0 0 0
PIK3R1 Sub-interaction 4 4 0 0 0
LCK Sub-interaction 4 4 0 0 0
CD74 Sub-interaction 4 0 0 4 0
MAPK14 Sub-interaction 3 3 0 0 0
CBL Sub-interaction 3 3 0 0 0
SMURF2 Sub-interaction 3 3 0 0 0
CD86 Sub-interaction 2 2 0 0 0
TNIP2 Sub-interaction 2 2 0 0 0
NOTCH1 Sub-interaction 2 2 0 0 0
CD53 Sub-interaction 2 2 0 0 0
SUFU Sub-interaction 2 2 0 0 0
STAT3 Sub-interaction 2 2 0 0 0
CD80 Sub-interaction 2 2 0 0 0
HSPA8 Sub-interaction 2 2 0 0 0
STAT5B Sub-interaction 2 2 0 0 0
CSK Sub-interaction 2 2 0 0 0
PIK3CG Sub-interaction 2 2 0 0 0
MAP3K8 Sub-interaction 2 2 0 0 0
PLCG1 Sub-interaction 2 2 0 0 0
SPOP Sub-interaction 2 2 0 0 0
NFKB1 Sub-interaction 2 2 0 0 0
TRADD Sub-interaction 2 2 0 0 0
TRIM37 Sub-interaction 2 2 0 0 0
SP1 Sub-interaction 2 2 0 0 0
FYN Sub-interaction 2 2 0 0 0
JUN Sub-interaction 2 2 0 0 0
CASP3 Sub-interaction 2 2 0 0 0
GRB2 Sub-interaction 2 2 0 0 0
NCOA2 Sub-interaction 2 1 0 1 0
KRTAP4-12 Sub-interaction 2 1 0 1 0
PSEN1 Sub-interaction 2 1 0 1 0
SMAD2 Sub-interaction 2 1 0 1 0
PSEN2 Sub-interaction 2 1 0 1 0
HLA-DMA Sub-interaction 2 0 0 2 0
CTAG1B Sub-interaction 2 0 0 2 0
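
Because each row splits the total number of interactions by partner type, the Interactions
column appears to equal the sum of the four columns that follow it. The short Python check below
illustrates this on two rows copied from Supplementary Table 28; the function names
`parse_hub_row` and `row_is_consistent` and the dictionary keys are assumptions made for the
example.

```python
from typing import Dict, Union

Row = Dict[str, Union[str, int]]


def parse_hub_row(line: str) -> Row:
    """Parse one whitespace-separated hub analysis row into named fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    total, confirmed, expanded, hla, sub = (int(value) for value in fields[2:])
    return {
        "gene": fields[0],
        "type": fields[1],
        "interactions": total,
        "confirmed": confirmed,
        "expanded": expanded,
        "hla": hla,
        "sub_interaction": sub,
    }


def row_is_consistent(row: Row) -> bool:
    """Check that the total equals the sum of the per-type counts."""
    parts = ("confirmed", "expanded", "hla", "sub_interaction")
    return row["interactions"] == sum(int(row[key]) for key in parts)


rows = [
    parse_hub_row("CD28 Confirmed 10 0 2 0 8"),
    parse_hub_row("CD4 Sub-interaction 5 3 0 2 0"),
]
print(all(row_is_consistent(row) for row in rows))  # True
```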

Supplementary Table 29 Full hub analysis of the extended protein-protein interaction network produced
from the confirmed gene list after exclusion of MHC region genes. Interactions are also split by the type of
interaction (e.g. direct interaction with confirmed gene).
Gene | Gene Type | Interactions | Direct Interaction With Confirmed Gene | Direct Interaction With Expanded Gene | Direct Interaction With HLA Gene | Interaction With Sub-Interaction Gene
CD28 Confirmed 10 0 2 0 8
TRAF1 Confirmed 7 2 0 0 5
CD40 Confirmed 7 1 1 0 5
CD2 Confirmed 7 0 2 0 5
CTLA4 Confirmed 6 0 0 0 6
REL Confirmed 6 0 0 0 6
IL2RA Confirmed 5 1 0 0 4
PTPN22 Confirmed 5 0 1 0 4
TNFAIP3 Confirmed 4 1 1 0 2
DDIT3 Confirmed 3 0 0 0 3
GSN Confirmed 3 0 0 0 3
STAT4 Confirmed 3 0 0 0 3
IRF5 Confirmed 2 1 1 0 0
IL2 Confirmed 2 1 0 0 1
RBPJ Confirmed 2 0 0 0 2
AGAP2 Confirmed 2 0 0 0 2
FAM107A Confirmed 2 0 0 0 2
XPO1 Confirmed 1 1 0 0 0
CEP110 Confirmed 1 0 0 0 1
RAB14 Confirmed 1 0 0 0 1
MARS Confirmed 1 0 0 0 1
BLK Confirmed 1 0 0 0 1
GLI1 Confirmed 1 0 0 0 1
FLNB Confirmed 1 0 0 0 1
TNPO3 Confirmed 1 0 0 0 1
TRAF6 Expanded 3 3 0 0 0
CD247 Expanded 3 3 0 0 0
PTPRC Expanded 2 2 0 0 0
LCK Sub-interaction 4 4 0 0 0
TRAF2 Sub-interaction 4 4 0 0 0
PIK3R1 Sub-interaction 4 4 0 0 0
CBL Sub-interaction 3 3 0 0 0
CD4 Sub-interaction 3 3 0 0 0
SMURF2 Sub-interaction 3 3 0 0 0
MAPK14 Sub-interaction 3 3 0 0 0
PIK3CG Sub-interaction 2 2 0 0 0
MAP3K8 Sub-interaction 2 2 0 0 0
PLCG1 Sub-interaction 2 2 0 0 0
SPOP Sub-interaction 2 2 0 0 0
CD86 Sub-interaction 2 2 0 0 0
NFKB1 Sub-interaction 2 2 0 0 0
TRADD Sub-interaction 2 2 0 0 0
TRIM37 Sub-interaction 2 2 0 0 0
TNIP2 Sub-interaction 2 2 0 0 0
NOTCH1 Sub-interaction 2 2 0 0 0
SP1 Sub-interaction 2 2 0 0 0
FYN Sub-interaction 2 2 0 0 0
CD53 Sub-interaction 2 2 0 0 0
SUFU Sub-interaction 2 2 0 0 0
STAT3 Sub-interaction 2 2 0 0 0
JUN Sub-interaction 2 2 0 0 0
CD80 Sub-interaction 2 2 0 0 0
CASP3 Sub-interaction 2 2 0 0 0
HSPA8 Sub-interaction 2 2 0 0 0
STAT5B Sub-interaction 2 2 0 0 0
CSK Sub-interaction 2 2 0 0 0
GRB2 Sub-interaction 2 2 0 0 0

Supplementary Table 30 Full hub analysis of the extended protein-protein interaction network produced
from the expanded gene list. Interactions are also split by the type of interaction (e.g. direct interaction with
confirmed gene).
Gene | Gene Type | Interactions | Direct Interaction With Confirmed Gene | Direct Interaction With Expanded Gene | Direct Interaction With HLA Gene | Interaction With Sub-Interaction Gene
TRAF1 Confirmed 18 2 0 0 16
CD28 Confirmed 13 0 2 0 11
CD40 Confirmed 12 1 1 0 10
REL Confirmed 12 0 0 0 12
CTLA4 Confirmed 10 0 1 0 9
TNFAIP3 Confirmed 9 1 1 0 7
GSN Confirmed 9 0 0 0 9
BLK Confirmed 8 0 0 0 8
IL2RA Confirmed 7 1 2 0 4
XPO1 Confirmed 7 1 1 0 5
CD2 Confirmed 7 0 2 0 5
PTPN22 Confirmed 7 0 1 0 6
IL2 Confirmed 6 1 1 0 4
DDIT3 Confirmed 5 0 1 0 4
RBPJ Confirmed 5 0 0 1 4
FAM107A Confirmed 5 0 0 0 5
FLNB Confirmed 5 0 0 0 5
IRF5 Confirmed 4 1 1 0 2
MAGI3 Confirmed 4 0 0 0 4
STAT4 Confirmed 4 0 0 0 4
HIPK1 Confirmed 3 0 0 0 3
OS9 Confirmed 2 0 0 0 2
AGAP2 Confirmed 2 0 0 0 2
CCL21 Confirmed 1 0 0 0 1
PAM Confirmed 1 0 0 0 1
AHSA2 Confirmed 1 0 0 0 1
ARHGAP9 Confirmed 1 0 0 0 1
GLI1 Confirmed 1 0 0 0 1
KIF5A Confirmed 1 0 0 0 1
MARS Confirmed 1 0 0 0 1
TNPO3 Confirmed 1 0 0 0 1
SPRED2 Confirmed 1 0 0 0 1
DCTN2 Confirmed 1 0 0 0 1
CEP110 Confirmed 1 0 0 0 1
RAB14 Confirmed 1 0 0 0 1
RSBN1 Confirmed 1 0 0 0 1
PIP4K2C Confirmed 1 0 0 0 1
RPP14 Confirmed 1 0 0 0 1
PTPN11 Expanded 50 1 1 0 48
TRAF6 Expanded 33 3 1 0 29
PTPRC Expanded 28 2 2 0 24
CD247 Expanded 22 3 2 0 17
ERBB2 Expanded 22 0 2 0 20
IL2RB Expanded 21 2 0 0 19
TYK2 Expanded 19 0 1 0 18
LAT Expanded 19 0 0 0 19
CD19 Expanded 16 0 0 0 16
PRKCQ Expanded 14 0 1 0 13
CDC37 Expanded 14 0 1 0 13
PTPN2 Expanded 14 0 0 0 14
GRB7 Expanded 13 0 1 0 12
POU2F1 Expanded 13 0 0 0 13
ICAM1 Expanded 7 1 0 0 6
MAPKAPK5 Expanded 7 0 0 0 7
CDK6 Expanded 6 0 1 0 5
SH2B3 Expanded 6 0 1 0 5
ICAM3 Expanded 5 0 1 0 4
PPP1R1B Expanded 5 0 0 0 5
SH2B1 Expanded 5 0 0 0 5
ATXN2 Expanded 5 0 0 0 5
UBE2L3 Expanded 5 0 0 0 5
ICAM5 Expanded 4 0 0 0 4
RAG1 Expanded 3 0 1 0 2
CCDC116 Expanded 3 0 0 0 3
ICAM4 Expanded 3 0 0 0 3
ALDH2 Expanded 3 0 0 0 3
RPL6 Expanded 3 0 0 0 3
ATXN2L Expanded 3 0 0 0 3
RAVER1 Expanded 3 0 0 0 3
IL6R Expanded 3 0 0 0 3
BATF Expanded 2 1 0 0 1
ATP2A1 Expanded 2 0 0 0 2
BRAP Expanded 2 0 0 0 2
S1PR5 Expanded 2 0 0 0 2
TMPRSS3 Expanded 2 0 0 0 2
PLCL2 Expanded 2 0 0 0 2
IKZF3 Expanded 2 0 0 0 2
SPNS1 Expanded 2 0 0 0 2
EIF3C Expanded 2 0 0 0 2
ADAR Expanded 1 1 0 0 0
RAG2 Expanded 1 0 1 0 0
TRAFD1 Expanded 1 0 1 0 0
MRPL4 Expanded 1 0 0 0 1
SHE Expanded 1 0 0 0 1
ERP29 Expanded 1 0 0 0 1
S1PR2 Expanded 1 0 0 0 1
CCDC101 Expanded 1 0 0 0 1
PPIL2 Expanded 1 0 0 0 1
TCAP Expanded 1 0 0 0 1
CUX2 Expanded 1 0 0 0 1
UBE2Q1 Expanded 1 0 0 0 1
ORMDL3 Expanded 1 0 0 0 1
TMPRSS6 Expanded 1 0 0 0 1
PDE4A Expanded 1 0 0 0 1
UBASH3A Expanded 1 0 0 0 1
HLA-DRA HLA 6 0 0 3 3
TAP1 HLA 6 0 0 2 4
NOTCH4 HLA 5 1 0 0 4
HLA-DMB HLA 5 0 0 2 3
HLA-DRB1 HLA 5 0 0 1 4
TAP2 HLA 3 0 0 2 1
PSMB8 HLA 3 0 0 2 1
HLA-DQB1 HLA 3 0 0 1 2
HLA-DQA1 HLA 3 0 0 0 3
HLA-DRB5 HLA 2 0 0 1 1
HLA-DQA2 HLA 2 0 0 1 1
HLA-DOB HLA 1 0 0 1 0
PSMB9 HLA 1 0 0 0 1
GRB2 Sub-interaction 12 2 10 0 0
LCK Sub-interaction 11 4 7 0 0
PIK3R1 Sub-interaction 9 4 5 0 0
CBL Sub-interaction 9 3 6 0 0
FYN Sub-interaction 9 2 7 0 0
CD4 Sub-interaction 7 3 2 2 0
STAT5B Sub-interaction 7 2 5 0 0
SHC1 Sub-interaction 7 1 6 0 0
LYN Sub-interaction 7 1 6 0 0
STAT3 Sub-interaction 6 2 4 0 0
STAT5A Sub-interaction 6 1 5 0 0
SRC Sub-interaction 6 1 5 0 0
PTPN6 Sub-interaction 6 0 6 0 0
TRAF2 Sub-interaction 5 4 1 0 0
MAPK14 Sub-interaction 5 3 2 0 0
PLCG1 Sub-interaction 5 2 3 0 0
ZAP70 Sub-interaction 5 1 4 0 0
JAK3 Sub-interaction 5 1 4 0 0
JAK2 Sub-interaction 5 1 4 0 0
MAP3K14 Sub-interaction 5 1 4 0 0
CAV1 Sub-interaction 5 1 4 0 0
CSNK2A1 Sub-interaction 5 1 4 0 0
STAT1 Sub-interaction 5 1 4 0 0
VAV1 Sub-interaction 5 1 4 0 0
INSR Sub-interaction 5 0 5 0 0
JAK1 Sub-interaction 5 0 5 0 0
EGFR Sub-interaction 5 0 5 0 0
IRS1 Sub-interaction 5 0 5 0 0
JUN Sub-interaction 4 2 2 0 0
KIT Sub-interaction 4 1 3 0 0
CHUK Sub-interaction 4 1 3 0 0
AR Sub-interaction 4 1 3 0 0
HSP90AA1 Sub-interaction 4 1 3 0 0
KRTAP4-12 Sub-interaction 4 1 2 1 0
PDGFRB Sub-interaction 4 0 4 0 0
ATXN1 Sub-interaction 4 0 4 0 0
SYK Sub-interaction 4 0 4 0 0
ITGB2 Sub-interaction 4 0 4 0 0
IL6ST Sub-interaction 4 0 4 0 0
CD82 Sub-interaction 4 0 2 2 0
CD74 Sub-interaction 4 0 0 4 0
SMURF2 Sub-interaction 3 3 0 0 0
CSK Sub-interaction 3 2 1 0 0
MAP3K8 Sub-interaction 3 2 1 0 0
PIK3CG Sub-interaction 3 2 1 0 0
SP1 Sub-interaction 3 2 1 0 0
TRIM37 Sub-interaction 3 2 1 0 0
CASP3 Sub-interaction 3 2 1 0 0
EIF6 Sub-interaction 3 1 2 0 0
IKBKG Sub-interaction 3 1 2 0 0
PTK2B Sub-interaction 3 1 2 0 0
PLCG2 Sub-interaction 3 1 2 0 0
IKBKB Sub-interaction 3 1 2 0 0
IL2RG Sub-interaction 3 1 2 0 0
BTK Sub-interaction 3 1 2 0 0
PSEN1 Sub-interaction 3 1 1 1 0
PSEN2 Sub-interaction 3 1 1 1 0
SOS1 Sub-interaction 3 0 3 0 0
UBB Sub-interaction 3 0 3 0 0
CDK5 Sub-interaction 3 0 3 0 0
MSN Sub-interaction 3 0 3 0 0
BRCA1 Sub-interaction 3 0 3 0 0
NTRK2 Sub-interaction 3 0 3 0 0
IFNAR1 Sub-interaction 3 0 3 0 0
GHR Sub-interaction 3 0 3 0 0
IRS2 Sub-interaction 3 0 3 0 0
CSNK2A2 Sub-interaction 3 0 3 0 0
ABL1 Sub-interaction 3 0 3 0 0
EZR Sub-interaction 3 0 3 0 0
GAB2 Sub-interaction 3 0 3 0 0
ITGAL Sub-interaction 3 0 3 0 0
MBP Sub-interaction 3 0 2 1 0
UNC119 Sub-interaction 3 0 2 1 0
ESR1 Sub-interaction 3 0 2 1 0
PSMB5 Sub-interaction 3 0 1 2 0
SUFU Sub-interaction 2 2 0 0 0
CD80 Sub-interaction 2 2 0 0 0
SPOP Sub-interaction 2 2 0 0 0
NFKB1 Sub-interaction 2 2 0 0 0
CD86 Sub-interaction 2 2 0 0 0
NOTCH1 Sub-interaction 2 2 0 0 0
TNIP2 Sub-interaction 2 2 0 0 0
CD53 Sub-interaction 2 2 0 0 0
HSPA8 Sub-interaction 2 2 0 0 0
TRADD Sub-interaction 2 2 0 0 0
ACTN4 Sub-interaction 2 1 1 0 0
GRAP2 Sub-interaction 2 1 1 0 0
VCL Sub-interaction 2 1 1 0 0
IL4R Sub-interaction 2 1 1 0 0
TRAF3IP2 Sub-interaction 2 1 1 0 0
TP53 Sub-interaction 2 1 1 0 0
PTEN Sub-interaction 2 1 1 0 0
FCGR2A Sub-interaction 2 1 1 0 0
TICAM1 Sub-interaction 2 1 1 0 0
PXN Sub-interaction 2 1 1 0 0
NGFR Sub-interaction 2 1 1 0 0
NGFRAP1 Sub-interaction 2 1 1 0 0
YWHAG Sub-interaction 2 1 1 0 0
ITGB1 Sub-interaction 2 1 1 0 0
GRIN2B Sub-interaction 2 1 1 0 0
MAPK6 Sub-interaction 2 1 1 0 0
TBK1 Sub-interaction 2 1 1 0 0
SNIP1 Sub-interaction 2 1 1 0 0
TNFRSF1A Sub-interaction 2 1 1 0 0
PRKACA Sub-interaction 2 1 1 0 0
DSCAML1 Sub-interaction 2 1 1 0 0
XRCC5 Sub-interaction 2 1 1 0 0
NCOR2 Sub-interaction 2 1 1 0 0
CD79A Sub-interaction 2 1 1 0 0
BCR Sub-interaction 2 1 1 0 0
UBQLN4 Sub-interaction 2 1 1 0 0
ANP32A Sub-interaction 2 1 1 0 0
UBE3A Sub-interaction 2 1 1 0 0
CCDC85B Sub-interaction 2 1 1 0 0
CD79B Sub-interaction 2 1 1 0 0
TNFRSF19 Sub-interaction 2 1 1 0 0
RNF11 Sub-interaction 2 1 1 0 0
USP7 Sub-interaction 2 1 1 0 0
NFKB2 Sub-interaction 2 1 1 0 0
BCL2 Sub-interaction 2 1 1 0 0
ITK Sub-interaction 2 1 1 0 0
ATF2 Sub-interaction 2 1 1 0 0
XRCC6 Sub-interaction 2 1 1 0 0
TBP Sub-interaction 2 1 1 0 0
MAPK8 Sub-interaction 2 1 1 0 0
PIK3R2 Sub-interaction 2 1 1 0 0
NF2 Sub-interaction 2 1 1 0 0
SMAD9 Sub-interaction 2 1 1 0 0
CALM1 Sub-interaction 2 1 1 0 0
FLNA Sub-interaction 2 1 1 0 0
IKBKE Sub-interaction 2 1 1 0 0
RELA Sub-interaction 2 1 1 0 0
PRKCA Sub-interaction 2 1 1 0 0
GFI1B Sub-interaction 2 1 1 0 0
TGFA Sub-interaction 2 1 1 0 0
TNFRSF11A Sub-interaction 2 1 1 0 0
RUNX1 Sub-interaction 2 1 1 0 0
PPP4C Sub-interaction 2 1 1 0 0
RIPK2 Sub-interaction 2 1 1 0 0
KRT15 Sub-interaction 2 1 1 0 0
FN1 Sub-interaction 2 1 1 0 0
HMGB1 Sub-interaction 2 1 1 0 0
IGSF21 Sub-interaction 2 1 1 0 0
MAP2K6 Sub-interaction 2 1 1 0 0
GNAO1 Sub-interaction 2 1 1 0 0
SMAD2 Sub-interaction 2 1 0 1 0
NCOA2 Sub-interaction 2 1 0 1 0
ITGAM Sub-interaction 2 0 2 0 0
LCP2 Sub-interaction 2 0 2 0 0
STUB1 Sub-interaction 2 0 2 0 0
CXCR4 Sub-interaction 2 0 2 0 0
MPL Sub-interaction 2 0 2 0 0
NTRK1 Sub-interaction 2 0 2 0 0
PTPN1 Sub-interaction 2 0 2 0 0
SLA Sub-interaction 2 0 2 0 0
MAP3K3 Sub-interaction 2 0 2 0 0
CD244 Sub-interaction 2 0 2 0 0
TIRAP Sub-interaction 2 0 2 0 0
RXRA Sub-interaction 2 0 2 0 0
CD8A Sub-interaction 2 0 2 0 0
MAPK1 Sub-interaction 2 0 2 0 0
CD22 Sub-interaction 2 0 2 0 0
TEK Sub-interaction 2 0 2 0 0
CTNNB1 Sub-interaction 2 0 2 0 0
GNAI1 Sub-interaction 2 0 2 0 0
LIFR Sub-interaction 2 0 2 0 0
TGFBR1 Sub-interaction 2 0 2 0 0
PTK2 Sub-interaction 2 0 2 0 0
HRAS Sub-interaction 2 0 2 0 0
ACTN1 Sub-interaction 2 0 2 0 0
GNB2L1 Sub-interaction 2 0 2 0 0
BCL2L1 Sub-interaction 2 0 2 0 0
SOCS3 Sub-interaction 2 0 2 0 0
CRKL Sub-interaction 2 0 2 0 0
SHB Sub-interaction 2 0 2 0 0
CISH Sub-interaction 2 0 2 0 0
ACTN2 Sub-interaction 2 0 2 0 0
RET Sub-interaction 2 0 2 0 0
HMGB2 Sub-interaction 2 0 2 0 0
EEF1A1 Sub-interaction 2 0 2 0 0
EPOR Sub-interaction 2 0 2 0 0
CDK2 Sub-interaction 2 0 2 0 0
PPP1CA Sub-interaction 2 0 2 0 0
TRB@ Sub-interaction 2 0 1 1 0
LSM1 Sub-interaction 2 0 1 1 0
TRA@ Sub-interaction 2 0 1 1 0
SH3GL2 Sub-interaction 2 0 1 1 0
SMAD4 Sub-interaction 2 0 1 1 0
MDFI Sub-interaction 2 0 1 1 0
HLA-DMA Sub-interaction 2 0 0 2 0
CTAG1B Sub-interaction 2 0 0 2 0

Supplementary Table 31 Full hub analysis of the extended protein-protein interaction network produced
from the expanded gene list after exclusion of MHC region genes. Interactions are also split by the type of
interaction (e.g. direct interaction with confirmed gene).
Gene | Gene Type | Interactions | Direct Interaction With Confirmed Gene | Direct Interaction With Expanded Gene | Direct Interaction With HLA Gene | Interaction With Sub-Interaction Gene
TRAF1 Confirmed 18 2 0 0 16
CD28 Confirmed 13 0 2 0 11
CD40 Confirmed 12 1 1 0 10
REL Confirmed 12 0 0 0 12
CTLA4 Confirmed 10 0 1 0 9
TNFAIP3 Confirmed 9 1 1 0 7
GSN Confirmed 9 0 0 0 9
BLK Confirmed 8 0 0 0 8
IL2RA Confirmed 7 1 2 0 4
XPO1 Confirmed 7 1 1 0 5
CD2 Confirmed 7 0 2 0 5
PTPN22 Confirmed 7 0 1 0 6
IL2 Confirmed 6 1 1 0 4
DDIT3 Confirmed 5 0 1 0 4
FAM107A Confirmed 5 0 0 0 5
FLNB Confirmed 5 0 0 0 5
IRF5 Confirmed 4 1 1 0 2
RBPJ Confirmed 4 0 0 0 4
MAGI3 Confirmed 4 0 0 0 4
STAT4 Confirmed 4 0 0 0 4
HIPK1 Confirmed 3 0 0 0 3
AGAP2 Confirmed 2 0 0 0 2
CCL21 Confirmed 1 0 0 0 1
PAM Confirmed 1 0 0 0 1
AHSA2 Confirmed 1 0 0 0 1
ARHGAP9 Confirmed 1 0 0 0 1
GLI1 Confirmed 1 0 0 0 1
MARS Confirmed 1 0 0 0 1
TNPO3 Confirmed 1 0 0 0 1
SPRED2 Confirmed 1 0 0 0 1
DCTN2 Confirmed 1 0 0 0 1
CEP110 Confirmed 1 0 0 0 1
OS9 Confirmed 1 0 0 0 1
RAB14 Confirmed 1 0 0 0 1
RSBN1 Confirmed 1 0 0 0 1
PIP4K2C Confirmed 1 0 0 0 1
RPP14 Confirmed 1 0 0 0 1
PTPN11 Expanded 50 1 1 0 48
TRAF6 Expanded 32 3 1 0 28
PTPRC Expanded 27 2 2 0 23
ERBB2 Expanded 22 0 2 0 20
IL2RB Expanded 21 2 0 0 19
CD247 Expanded 20 3 2 0 15
TYK2 Expanded 19 0 1 0 18
LAT Expanded 19 0 0 0 19
CD19 Expanded 16 0 0 0 16
PRKCQ Expanded 14 0 1 0 13
CDC37 Expanded 14 0 1 0 13
PTPN2 Expanded 14 0 0 0 14
GRB7 Expanded 13 0 1 0 12
POU2F1 Expanded 13 0 0 0 13
ICAM1 Expanded 7 1 0 0 6
MAPKAPK5 Expanded 7 0 0 0 7
CDK6 Expanded 6 0 1 0 5
SH2B3 Expanded 6 0 1 0 5
ICAM3 Expanded 5 0 1 0 4
PPP1R1B Expanded 5 0 0 0 5
SH2B1 Expanded 5 0 0 0 5
ATXN2 Expanded 4 0 0 0 4
UBE2L3 Expanded 4 0 0 0 4
ICAM5 Expanded 4 0 0 0 4
RAG1 Expanded 3 0 1 0 2
ICAM4 Expanded 3 0 0 0 3
ALDH2 Expanded 3 0 0 0 3
RPL6 Expanded 3 0 0 0 3
ATXN2L Expanded 3 0 0 0 3
RAVER1 Expanded 3 0 0 0 3
IL6R Expanded 3 0 0 0 3
BATF Expanded 2 1 0 0 1
ATP2A1 Expanded 2 0 0 0 2
CCDC116 Expanded 2 0 0 0 2
BRAP Expanded 2 0 0 0 2
S1PR5 Expanded 2 0 0 0 2
TMPRSS3 Expanded 2 0 0 0 2
PLCL2 Expanded 2 0 0 0 2
IKZF3 Expanded 2 0 0 0 2
SPNS1 Expanded 2 0 0 0 2
EIF3C Expanded 2 0 0 0 2
ADAR Expanded 1 1 0 0 0
RAG2 Expanded 1 0 1 0 0
TRAFD1 Expanded 1 0 1 0 0
MRPL4 Expanded 1 0 0 0 1
SHE Expanded 1 0 0 0 1
ERP29 Expanded 1 0 0 0 1
S1PR2 Expanded 1 0 0 0 1
CCDC101 Expanded 1 0 0 0 1
PPIL2 Expanded 1 0 0 0 1
TCAP Expanded 1 0 0 0 1
CUX2 Expanded 1 0 0 0 1
UBE2Q1 Expanded 1 0 0 0 1
ORMDL3 Expanded 1 0 0 0 1
TMPRSS6 Expanded 1 0 0 0 1
PDE4A Expanded 1 0 0 0 1
UBASH3A Expanded 1 0 0 0 1
GRB2 Sub-interaction 12 2 10 0 0
LCK Sub-interaction 11 4 7 0 0
PIK3R1 Sub-interaction 9 4 5 0 0
CBL Sub-interaction 9 3 6 0 0
FYN Sub-interaction 9 2 7 0 0
STAT5B Sub-interaction 7 2 5 0 0
SHC1 Sub-interaction 7 1 6 0 0
LYN Sub-interaction 7 1 6 0 0
STAT3 Sub-interaction 6 2 4 0 0
STAT5A Sub-interaction 6 1 5 0 0
SRC Sub-interaction 6 1 5 0 0
PTPN6 Sub-interaction 6 0 6 0 0
TRAF2 Sub-interaction 5 4 1 0 0
MAPK14 Sub-interaction 5 3 2 0 0
CD4 Sub-interaction 5 3 2 0 0
PLCG1 Sub-interaction 5 2 3 0 0
ZAP70 Sub-interaction 5 1 4 0 0
JAK3 Sub-interaction 5 1 4 0 0
JAK2 Sub-interaction 5 1 4 0 0
MAP3K14 Sub-interaction 5 1 4 0 0
CAV1 Sub-interaction 5 1 4 0 0
CSNK2A1 Sub-interaction 5 1 4 0 0
STAT1 Sub-interaction 5 1 4 0 0
VAV1 Sub-interaction 5 1 4 0 0
INSR Sub-interaction 5 0 5 0 0
JAK1 Sub-interaction 5 0 5 0 0
EGFR Sub-interaction 5 0 5 0 0
IRS1 Sub-interaction 5 0 5 0 0
JUN Sub-interaction 4 2 2 0 0
KIT Sub-interaction 4 1 3 0 0
CHUK Sub-interaction 4 1 3 0 0
AR Sub-interaction 4 1 3 0 0
HSP90AA1 Sub-interaction 4 1 3 0 0
PDGFRB Sub-interaction 4 0 4 0 0
ATXN1 Sub-interaction 4 0 4 0 0
SYK Sub-interaction 4 0 4 0 0
ITGB2 Sub-interaction 4 0 4 0 0
IL6ST Sub-interaction 4 0 4 0 0
SMURF2 Sub-interaction 3 3 0 0 0
CSK Sub-interaction 3 2 1 0 0
MAP3K8 Sub-interaction 3 2 1 0 0
PIK3CG Sub-interaction 3 2 1 0 0
SP1 Sub-interaction 3 2 1 0 0
TRIM37 Sub-interaction 3 2 1 0 0
CASP3 Sub-interaction 3 2 1 0 0
KRTAP4-12 Sub-interaction 3 1 2 0 0
EIF6 Sub-interaction 3 1 2 0 0
IKBKG Sub-interaction 3 1 2 0 0
PTK2B Sub-interaction 3 1 2 0 0
PLCG2 Sub-interaction 3 1 2 0 0
IKBKB Sub-interaction 3 1 2 0 0
IL2RG Sub-interaction 3 1 2 0 0
BTK Sub-interaction 3 1 2 0 0
SOS1 Sub-interaction 3 0 3 0 0
UBB Sub-interaction 3 0 3 0 0
CDK5 Sub-interaction 3 0 3 0 0
MSN Sub-interaction 3 0 3 0 0
BRCA1 Sub-interaction 3 0 3 0 0
NTRK2 Sub-interaction 3 0 3 0 0
IFNAR1 Sub-interaction 3 0 3 0 0
GHR Sub-interaction 3 0 3 0 0
IRS2 Sub-interaction 3 0 3 0 0
CSNK2A2 Sub-interaction 3 0 3 0 0
ABL1 Sub-interaction 3 0 3 0 0
EZR Sub-interaction 3 0 3 0 0
GAB2 Sub-interaction 3 0 3 0 0
ITGAL Sub-interaction 3 0 3 0 0
SUFU Sub-interaction 2 2 0 0 0
CD80 Sub-interaction 2 2 0 0 0
SPOP Sub-interaction 2 2 0 0 0
NFKB1 Sub-interaction 2 2 0 0 0
CD86 Sub-interaction 2 2 0 0 0
NOTCH1 Sub-interaction 2 2 0 0 0
TNIP2 Sub-interaction 2 2 0 0 0
CD53 Sub-interaction 2 2 0 0 0
HSPA8 Sub-interaction 2 2 0 0 0
TRADD Sub-interaction 2 2 0 0 0
ACTN4 Sub-interaction 2 1 1 0 0
GRAP2 Sub-interaction 2 1 1 0 0
VCL Sub-interaction 2 1 1 0 0
IL4R Sub-interaction 2 1 1 0 0
TRAF3IP2 Sub-interaction 2 1 1 0 0
TP53 Sub-interaction 2 1 1 0 0
PTEN Sub-interaction 2 1 1 0 0
FCGR2A Sub-interaction 2 1 1 0 0
TICAM1 Sub-interaction 2 1 1 0 0
PXN Sub-interaction 2 1 1 0 0
NGFR Sub-interaction 2 1 1 0 0
NGFRAP1 Sub-interaction 2 1 1 0 0
YWHAG Sub-interaction 2 1 1 0 0
ITGB1 Sub-interaction 2 1 1 0 0
GRIN2B Sub-interaction 2 1 1 0 0
MAPK6 Sub-interaction 2 1 1 0 0
TBK1 Sub-interaction 2 1 1 0 0
SNIP1 Sub-interaction 2 1 1 0 0
TNFRSF1A Sub-interaction 2 1 1 0 0
PRKACA Sub-interaction 2 1 1 0 0
DSCAML1 Sub-interaction 2 1 1 0 0
XRCC5 Sub-interaction 2 1 1 0 0
NCOR2 Sub-interaction 2 1 1 0 0
CD79A Sub-interaction 2 1 1 0 0
BCR Sub-interaction 2 1 1 0 0
UBQLN4 Sub-interaction 2 1 1 0 0
ANP32A Sub-interaction 2 1 1 0 0
UBE3A Sub-interaction 2 1 1 0 0
CCDC85B Sub-interaction 2 1 1 0 0
CD79B Sub-interaction 2 1 1 0 0
TNFRSF19 Sub-interaction 2 1 1 0 0
USP7 Sub-interaction 2 1 1 0 0
RNF11 Sub-interaction 2 1 1 0 0
BCL2 Sub-interaction 2 1 1 0 0
NFKB2 Sub-interaction 2 1 1 0 0
ITK Sub-interaction 2 1 1 0 0
ATF2 Sub-interaction 2 1 1 0 0
PSEN1 Sub-interaction 2 1 1 0 0
XRCC6 Sub-interaction 2 1 1 0 0
TBP Sub-interaction 2 1 1 0 0
MAPK8 Sub-interaction 2 1 1 0 0
PSEN2 Sub-interaction 2 1 1 0 0
PIK3R2 Sub-interaction 2 1 1 0 0
NF2 Sub-interaction 2 1 1 0 0
SMAD9 Sub-interaction 2 1 1 0 0
CALM1 Sub-interaction 2 1 1 0 0
FLNA Sub-interaction 2 1 1 0 0
IKBKE Sub-interaction 2 1 1 0 0
RELA Sub-interaction 2 1 1 0 0
PRKCA Sub-interaction 2 1 1 0 0
GFI1B Sub-interaction 2 1 1 0 0
TGFA Sub-interaction 2 1 1 0 0
TNFRSF11A Sub-interaction 2 1 1 0 0
RUNX1 Sub-interaction 2 1 1 0 0
PPP4C Sub-interaction 2 1 1 0 0
RIPK2 Sub-interaction 2 1 1 0 0
KRT15 Sub-interaction 2 1 1 0 0
FN1 Sub-interaction 2 1 1 0 0
HMGB1 Sub-interaction 2 1 1 0 0
IGSF21 Sub-interaction 2 1 1 0 0
MAP2K6 Sub-interaction 2 1 1 0 0
GNAO1 Sub-interaction 2 1 1 0 0
ITGAM Sub-interaction 2 0 2 0 0
LCP2 Sub-interaction 2 0 2 0 0
STUB1 Sub-interaction 2 0 2 0 0
CXCR4 Sub-interaction 2 0 2 0 0
MPL Sub-interaction 2 0 2 0 0
NTRK1 Sub-interaction 2 0 2 0 0
PTPN1 Sub-interaction 2 0 2 0 0
SLA Sub-interaction 2 0 2 0 0
MAP3K3 Sub-interaction 2 0 2 0 0
CD244 Sub-interaction 2 0 2 0 0
TIRAP Sub-interaction 2 0 2 0 0
MBP Sub-interaction 2 0 2 0 0
RXRA Sub-interaction 2 0 2 0 0
CD8A Sub-interaction 2 0 2 0 0
MAPK1 Sub-interaction 2 0 2 0 0
CD22 Sub-interaction 2 0 2 0 0
TEK Sub-interaction 2 0 2 0 0
UNC119 Sub-interaction 2 0 2 0 0
CTNNB1 Sub-interaction 2 0 2 0 0
GNAI1 Sub-interaction 2 0 2 0 0
LIFR Sub-interaction 2 0 2 0 0
TGFBR1 Sub-interaction 2 0 2 0 0
ESR1 Sub-interaction 2 0 2 0 0
PTK2 Sub-interaction 2 0 2 0 0
CD82 Sub-interaction 2 0 2 0 0
HRAS Sub-interaction 2 0 2 0 0
ACTN1 Sub-interaction 2 0 2 0 0
GNB2L1 Sub-interaction 2 0 2 0 0
BCL2L1 Sub-interaction 2 0 2 0 0
SOCS3 Sub-interaction 2 0 2 0 0
CRKL Sub-interaction 2 0 2 0 0
SHB Sub-interaction 2 0 2 0 0
CISH Sub-interaction 2 0 2 0 0
ACTN2 Sub-interaction 2 0 2 0 0
RET Sub-interaction 2 0 2 0 0
HMGB2 Sub-interaction 2 0 2 0 0
EEF1A1 Sub-interaction 2 0 2 0 0
EPOR Sub-interaction 2 0 2 0 0
CDK2 Sub-interaction 2 0 2 0 0
PPP1CA Sub-interaction 2 0 2 0 0

8 Supplementary CD
The supplementary CD can be found in the inside back cover and contains additional
electronic materials.
