GWAS Data Analysis of Type 1 Diabetes Using Pathway and Network Approaches

Investigation of GWAS data on Type 1 Diabetes Mellitus using pathway and network based approaches
T.J.E. Hooimeijer
I6049728
Biomedical Sciences
Direction Biological Health Sciences
Bachelor thesis
Supervised by: Martina Summer-Kutmon & Elisa Cirillo
Internship period: 20-04-2015 – 03-07-2015
Maastricht University
Date: 02-07-2015
Contents
Abstract
1. Introduction
1.1 Type 1 Diabetes Mellitus
1.2 Disease causes
1.3 Genome wide association studies
1.4 Research aim
2. Materials and methods
2.1 Data resources
2.1.1 Diabetes type 1 GWAS datasets
2.1.2 WikiPathways
2.1.3. DisGeNET-analysis
2.2 Tools
2.2.1 GWAS3D
2.2.2 Ensembl BioMart
2.2.3 Cytoscape
2.2.4 PathVisio
2.2.5 GO-analysis
2.3 Workflow
3. Results
3.1 GWAS3D-analysis
3.2 GO-Analysis
3.3 Network analysis
3.4 PathVisio visualization of SNPs
3.4.1. Visualization of SNPs in PathVisio
3.4.2. Pathways linked to T1DM
4. Discussion
4.1 GWAS analysis
4.2 GO-Analysis
4.3 Network analysis
4.4 PathVisio visualization of SNPs
4.5 Comparing results with other studies
4.6 Shortcomings in this study
4.7 Conclusion
5. References
6. Appendix
Add-on page: Description of work by student Bachelor BMW

Abstract
Type 1 Diabetes Mellitus (T1DM) is an autoimmune disorder in which the body attacks its
own beta-cells. This leads to insufficient insulin production, causing defective glucose
homeostasis. The exact mechanism causing T1DM is unknown, but environmental as well as
genetic factors play a crucial role. With the advent of genome-wide association studies
(GWAS), more and more genes determining the genetic susceptibility to T1DM are becoming
known. The MHC-genes are well known to be involved, but many other genes have already been
discovered and more will probably be discovered by future research. In this research project
we tried to find the disease related processes by integrating and visualizing the genetic
variants with existing pathway knowledge.
Data from two different GWAS were reanalyzed to examine the involvement of SNPs in the
susceptibility to T1DM. SNPs were visualized and reprioritized by GWAS3D and
subsequently linked to their genes using Ensembl BioMart. A network was created in which the
gene-SNP list was merged with a list of all the pathways and genes from
WikiPathways. Not all genes could be linked to a pathway from WikiPathways and not all
pathways had genes with T1DM related SNPs. These genes and pathways were removed to
create a final network. This network was analyzed and a GO-analysis was performed to
search for enriched biological processes. The genes not linked to a pathway were further
investigated using DisGeNET. An effort was made to visualize SNPs using PathVisio.
The three pathways with the most genes with SNPs were the ‘Allograft Rejection’,
‘Proteasome degradation’ and ‘Type II interferon signaling’ pathways. The MHC/HLA-genes
were involved multiple times in these pathways, explaining their importance in T1DM.
GO-analysis also showed that immunological processes represent most of the enriched biological
processes. The visualization of SNPs in PathVisio may improve the pathway analysis
of GWAS data by giving information about the SNPs after selection of a gene and
providing the possibility to visualize SNPs based on SIFT-scores (from Ensembl BioMart).
The method applied is able to confirm existing knowledge and offers possibilities to search
for new loci and pathways involved in the disease process.
1. Introduction
1.1 Type 1 Diabetes Mellitus
Type 1 Diabetes Mellitus (T1DM) is a chronic disease in which the beta-cells in the islets of
Langerhans are damaged or destroyed. T1DM belongs to the class of autoimmune disorders,
which are characterized by the breakdown of the host’s own tissue leading to damage in
different organs and tissues, in this case the beta-cells (1). In the Netherlands alone
approximately 750,000 people were diagnosed with diabetes in 2013, of whom 125,000 had
T1DM (2).
T1DM is also called ‘juvenile onset diabetes’ or ‘insulin-dependent diabetes mellitus’ because
the typical age of onset is between 6 and 25 years. T1DM is a disease in which the human body
does not produce (sufficient) insulin because of the damaged or missing beta-cells, causing
blood glucose concentrations (BGC) to exceed the accepted range. BGC can range between
300 mg/100 ml and 1200 mg/100 ml, which has multiple physiological effects. The excess
glucose will partly be excreted via the kidneys. The lack of insulin causes inefficient use of
the available carbohydrates. The excess excretion together with the increased osmotic
pressure in the blood stream causes dehydration of the cells and extracellular fluids. Long
periods of high BGC can even lead to endothelial dysfunction, leading to insufficient blood
supply to various tissues, which in turn increases the risk of a stroke, heart attack, blindness
and numerous other serious conditions. Patients are treated with insulin, which helps to control
blood sugar levels and thereby counteracts the effects of hyperglycemia (3, 4).
The negative effects associated with long periods of high BGC indicate that more research
into the causes and possible alternative treatment options for T1DM patients is needed. Better
treatment options would be of benefit to numerous people (1, 3).
T1DM must not be confused with Type 2 Diabetes Mellitus (T2DM), see Figure 1. Both types
of diabetes mellitus are recognized by frequent urination, hunger and intense thirst when
untreated. Despite this similarity there are a lot of differences between T1DM and T2DM.
T2DM is also called ‘maturity onset diabetes’ and is caused by insulin resistance. T2DM
often develops at ages over 40 years and is strongly associated with obesity (3, 5).
Figure 1: The differences between T1DM and T2DM.
In both types of diabetes, cells do not absorb glucose. In T1DM patients there is not enough insulin produced
due to the lack of pancreatic beta-cells, which are destroyed by an autoimmune reaction. The insulin is needed to
absorb the glucose. In T2DM patients the cells do not respond to the produced insulin, since the cells have
become insulin resistant. Both diseases lead to a high blood sugar level also called hyperglycemia.
1.2 Disease causes

The autoimmune reaction in T1DM causes the breakdown of the beta-cells, but it may not be
the primary cause of the disease. The exact mechanisms causing T1DM are still unknown.
Immune disorders and viral infection can play a role, but genetic predisposition and
environmental factors are also important (6, 7). This has been demonstrated by studies of the
differing rates of T1DM development between monozygotic twins and by the 15-fold higher
risk of developing the disease for people with a first-degree relative who has T1DM (8).
Research into the causes of T1DM has shown the importance of the autoimmune reaction
against the pancreatic beta-cells, especially the role of CD4+ and CD8+ T-cells in the breakdown
of beta-cells in the pancreas. Genetic variations in genes involved in T-cell signaling,
activation and proliferation, or in genes involved in cytokine production, may
cause malfunctioning proteins, favouring an autoimmune response (8).
Currently, a lot of research is being conducted into the genes involved in the processes around
T-cell signaling.
1.3 Genome wide association studies

In a genome wide association study (GWAS) the genome is screened for genetic variants in
multiple individuals. A common approach is to select subjects with and without a specific
trait or disease and then compare the genomes of the two groups. The aim of genome-wide
association studies is to find genetic variants that occur more frequently in the disease
population than in the (healthy) control group. With the increasing amount of GWAS data available, more
and more loci are found that may be involved in disease development. Several studies found
genes and loci related to T1DM, including genes involved in immune response, like the major
histocompatibility complex (MHC) genes, the protein tyrosine phosphatase-22 (PTPN22)
gene and the cytotoxic T-lymphocyte-associated protein 4 (CTLA4) gene, which encodes an
inhibitory T-cell signaling molecule (9). The identification of the genes and loci involved in
T1DM may help to better understand the disease mechanisms, and might be useful for early
diagnostics and the development of better treatment options.
1.4 Research aim

In this project we combine GWAS data on T1DM (alone or in combination with other
diseases) to investigate the effects of single nucleotide polymorphisms (SNPs) on the genetic
predisposition to T1DM using pathway- and network-based approaches. A SNP is a variation
in the DNA sequence found in at least 1% of the population (10). These SNPs will be
analyzed by linking them to genes and subsequently combining the genes with pathway
knowledge to form a pathway-gene-SNP network.
Our research question is: “Can we find disease related processes by integrating and
visualizing the significant genetic variants with existing pathway knowledge?”
The visualization of the relationship between SNPs, genes and diseases in pathway diagrams
and networks is an important and intuitive method to confirm existing knowledge as well as
gain new insights into disease mechanisms. Since T1DM is an autoimmune disease, we
expect to find variations in genes that are involved in immune related pathways.
The GWAS data were obtained from an open access database provided by Johnson AD and
O'Donnell CJ. The SNPs were visualized and reprioritized using GWAS3D, a tool that adds
SNPs based on linkage disequilibrium. Linkage disequilibrium is the non-random association
of alleles at different loci: the alleles occur together more or less often than would be
expected if they were independent (11). Significant SNPs were linked to their genes using
Ensembl BioMart and these genes were subsequently linked to the biological pathways in which
they are involved. A network of SNPs, genes and pathways was made using Cytoscape and
relevant pathways were visualized using PathVisio.
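To make the notion of linkage disequilibrium concrete, the sketch below computes the disequilibrium coefficient D and the squared correlation r² for two biallelic loci from haplotype and allele frequencies. The function name and the example frequencies are illustrative assumptions; this is the textbook definition, not the internal implementation of GWAS3D.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the deviation of the observed haplotype frequency from the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies: independent alleles give D = 0 and r2 = 0.
d_indep, r2_indep = linkage_disequilibrium(0.12, 0.3, 0.4)

# Hypothetical frequencies: alleles that nearly always occur together.
d_linked, r2_linked = linkage_disequilibrium(0.28, 0.3, 0.3)
print(f"independent: r2 = {r2_indep:.3f}; linked: r2 = {r2_linked:.3f}")
```

With a cutoff of r² ≥ 0.8, the second hypothetical pair would count as being in high linkage disequilibrium, whereas the first would not.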
2. Materials and methods
Section 2.1 describes the data resources used in the analysis, section 2.2 describes the
tools and section 2.3 gives a detailed workflow of the analysis.
2.1 Data resources

2.1.1 Diabetes type 1 GWAS datasets
Johnson and O'Donnell analysed and integrated the results of more than a hundred GWA
studies (12). Two of those studies included data about T1DM. The first study was performed
by the WTCCC group, who conducted a GWAS based on 2,000 cases each for 7 diseases with
a combined control group of 3,000 people. The 7 diseases were bipolar disorder, coronary
artery disease, Crohn's disease, hypertension, rheumatoid arthritis and type I and II diabetes
mellitus. The participants in this study were residents of Great Britain and were classified as
white Europeans with Caucasian ancestry. Of the 3,000 control individuals, 1,500 were
recruited from the 1958 British Birth Cohort and 1,500 from blood donors recruited by the
WTCCC. Their age varied between 18 and 69 years. Participants of the T1DM group were
required to have been diagnosed with T1DM before the age of 17 and to have been insulin
dependent from that time on for a minimum period of 6 months. After quality control, a total of 4,806
samples were analyzed from 1,868 cases and 2,938 controls. A total of 469,557 SNPs were
analyzed (13).
The second study was performed by Hakonarson et al. In this dataset a total of 1,028 cases
and 1,143 controls were used. Cases and family trios with T1DM were selected from multiple
paediatric diabetes clinics in the United States. Control individuals were selected from the
Children’s Hospital of Philadelphia Health Care Network. With the Illumina HumanHap550
Genotyping BeadChip 550,000 SNPs were genotyped from a total of 563 cases, 1,146
controls and 483 T1DM family trios. They all had European ancestry. After quality control a
total of 534,071 SNPs were analyzed (14).
2.1.2 WikiPathways
WikiPathways (http://www.wikipathways.org) is an open online database for biological
pathways that is used for sharing and publishing pathway knowledge (15). Hyperlinks to
literature references and/or information pages on a gene or protein can be added to the
pathway diagram to include as much information as possible. WikiPathways is not only a
convenient tool to study human biology; it also includes pathway data for 20 other species.
Each pathway has its own wiki page where the pathway diagram, the description, the
references and a list of pathway elements, including gene products, metabolites and proteins,
are available. Pathways can be edited by using the pathway editor. A version history for each
change in the pathway is also kept (15). Pathways can also be downloaded and used in
advanced analysis programs like PathVisio. PathVisio is described in section 2.2 (Tools).
A list of all the pathways and linked genes from WikiPathways was provided by Bram van
Steen.
2.1.3. DisGeNET-analysis
DisGeNET (http://disgenet.org/web/DisGeNET/menu) is an open access database that
combines different sources to link genes to diseases (16). DisGeNET was used to check for
gene-disease associations of the genes in our study that were not present in any of the
pathways.
A script was written to convert the gene-disease associations from Entrez gene ids to
Ensembl gene ids. In Cytoscape a network was made in which genes were linked to
the disease(s) with which they are associated. This network was analyzed manually.
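The conversion step can be sketched as follows. The original script is not shown in this thesis, so the function name, the data layout and all identifiers below are hypothetical placeholders that only illustrate the idea of re-keying gene-disease associations from one identifier system to another.

```python
def convert_associations(entrez_disease_pairs, entrez_to_ensembl):
    """Re-key (Entrez id, disease) pairs to (Ensembl id, disease) pairs.

    entrez_to_ensembl maps one Entrez id to a list of Ensembl ids, because
    the mapping between the two identifier systems is not always one-to-one.
    Returns the converted pairs and the Entrez ids that could not be mapped.
    """
    converted = set()
    unmapped = set()
    for entrez_id, disease in entrez_disease_pairs:
        ensembl_ids = entrez_to_ensembl.get(entrez_id)
        if not ensembl_ids:
            unmapped.add(entrez_id)
            continue
        for ensembl_id in ensembl_ids:
            converted.add((ensembl_id, disease))
    return sorted(converted), sorted(unmapped)


# Placeholder identifiers, not real database entries.
associations = [
    ("ENTREZ_1", "Disease X"),
    ("ENTREZ_2", "Disease X"),
    ("ENTREZ_3", "Disease Y"),
]
mapping = {
    "ENTREZ_1": ["ENSG_A"],
    "ENTREZ_2": ["ENSG_B", "ENSG_C"],
}
converted, unmapped = convert_associations(associations, mapping)
print(converted)
print("unmapped:", unmapped)
```

Keeping track of unmapped identifiers makes it visible how many associations are lost in the conversion.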
2.2 Tools
2.2.1 GWAS3D
GWAS3D (http://jjwanglab.org/gwas3d) is a tool that is helpful in presenting and interpreting
the large amount of data available from GWAS. One way in which GWAS3D helps to present
results is with the Circos-style GWAS visualization, see Figure 4. The most significant
variants are displayed in the outer circle. These variants are linked to genes or genomic
locations shown in the inner circle. Red lines in the center of the circle indicate an epistatic
regulatory interaction with another locus. The thickness of the line represents the intensity of
the interaction (11).
The GWAS3D tool combines different genomic features with comprehensive annotations,
giving an overview of the significance of different genomic regions. The tool also provides a
prioritization algorithm that adds SNPs based on linkage disequilibrium.
In this experiment the analysis was performed for a European American (CEPH) population,
using a p-value cutoff of 1E-07, a linkage disequilibrium cutoff (r²) of 0.8, a binding
site p-value of 1E-02 and a promoter range (up/down) of 500/100. No cell type
restrictions were applied.
2.2.2 Ensembl BioMart
Ensembl BioMart (http://www.ensembl.org/biomart/) is a data management system that can
be used for advanced data querying. Ensembl BioMart can find genes, sequences etc. in the
Ensembl database based on multiple attributes such as chromosome position or gene list. It
provides a table with a clear overview of all the results. In the filters panel, multiple filters can
be added to define restrictions, like specific genes or chromosomes. The attributes panel allows
you to obtain specific information about the given restrictions, like the Ensembl Gene ID, the
Ensembl Transcript ID and many other options. One way to use this tool is to look
for genes linked to a certain SNP or list of SNPs. In this project we used the SNPs (after
GWAS3D analysis) as a filter. We selected the ‘Ensembl Gene ID’, ‘HGNC-symbol’ and
‘SIFT-score’ as attributes. The SIFT-score estimates whether a SNP will be deleterious. The score
ranges between 0 and 1, where a low score (by convention below 0.05) indicates a possibly
deleterious SNP.
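As an illustration of how such an export can be processed, the sketch below reads a tab-separated table with one row per SNP-gene link and labels each SNP by its SIFT-score. The column names, the example rows and the 0.05 threshold (the conventional SIFT cutoff) are assumptions for this example and do not reproduce the exact BioMart export used in this project.

```python
import csv
import io

SIFT_CUTOFF = 0.05  # conventional SIFT threshold; an assumption for this sketch

# Hypothetical export: placeholder SNP ids, gene ids and scores.
EXAMPLE_EXPORT = (
    "snp_id\tensembl_gene_id\thgnc_symbol\tsift_score\n"
    "rs_example_1\tENSG_EXAMPLE_A\tGENE_A\t0.01\n"
    "rs_example_2\tENSG_EXAMPLE_A\tGENE_A\t0.40\n"
    "rs_example_3\tENSG_EXAMPLE_B\tGENE_B\t\n"
)


def classify_sift(score_text):
    """Label a SIFT-score; an empty field means no prediction is available."""
    if score_text.strip() == "":
        return "no information"
    return "deleterious" if float(score_text) < SIFT_CUTOFF else "tolerated"


def read_export(handle):
    """Read the tab-separated export and add a 'sift_class' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{**row, "sift_class": classify_sift(row["sift_score"])} for row in reader]


rows = read_export(io.StringIO(EXAMPLE_EXPORT))
for row in rows:
    print(row["snp_id"], row["hgnc_symbol"], row["sift_class"])
```

SNPs outside coding regions have no SIFT prediction, which is why the empty field is handled as its own category.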
Ensembl BioMart was also used to investigate the association between genes (filters) and
their phenotype (attributes: phenotype description), to see whether genes
could be linked to type 1 diabetes. For both these investigations the ‘Ensembl Variation 80,
Homo sapiens Short Variants (SNPs and Indels) GRCh38.p2’ database was used.
2.2.3 Cytoscape
Cytoscape (http://www.cytoscape.org/) is a freely available software tool that can be used to
analyse, visualize and integrate networks (17). One of the functions of Cytoscape is to create
networks by importing files and/or tables. Nodes and edges will then have to be set, after
which Cytoscape will create the network. To illustrate the meaning of nodes and
edges: ‘T1DM’ could be a node that is linked, via edges, to nodes representing the genes
associated with T1DM. Networks can also be merged with other networks, which can make
interpretation easier through more advanced visualization. The merging of
networks is illustrated in Figure 2.
Figure 2: Example of the merging function in Cytoscape.

In network 1 a gene (gene 1 in this example) is associated with a SNP (SNP 1). Gene 1 is involved in a
pathway (pathway 1), shown in network 2. The merging function combines the two networks so that the pathway,
gene and SNP are linked. ‘Pathway 1’, ‘Gene 1’ and ‘SNP 1’ are examples used to demonstrate the merging function
in Cytoscape.
In this project two networks were made initially. The first network consisted of a list
of SNPs linked to their genes. The second network consisted of the pathways and linked genes from
WikiPathways. These two networks were then merged to form a network with
pathways, genes and SNPs. This is further explained in the workflow (section 2.3) and the
results section.
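The principle behind the merge can be sketched with plain edge sets: nodes that carry the same identifier in both networks are treated as one node, so the merged network is the union of the edges. This is a simplified illustration using the example names from Figure 2, not the implementation of the Cytoscape merge function.

```python
def merge_networks(*networks):
    """Merge networks given as sets of (node, node) edges.

    Nodes with the same identifier are treated as the same node, so the
    merged network is simply the union of all edges.
    """
    merged_edges = set()
    for edges in networks:
        merged_edges.update(edges)
    nodes = {node for edge in merged_edges for node in edge}
    return nodes, merged_edges


# Network 1 links a gene to a SNP, network 2 links a pathway to the same gene.
gene_snp_network = {("Gene 1", "SNP 1")}
pathway_gene_network = {("Pathway 1", "Gene 1")}

nodes, edges = merge_networks(gene_snp_network, pathway_gene_network)
print(sorted(nodes))
print(sorted(edges))
```

Because ‘Gene 1’ occurs in both networks, the merged network connects ‘Pathway 1’ to ‘SNP 1’ through that gene. The merge only works when both networks use the same gene identifiers.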
2.2.4 PathVisio
PathVisio (http://www.pathvisio.org/) is a program that gives researchers the ability to
create, edit, analyse and visualize biological pathways (18). Pathways have been used in
biology for a long time to describe the biological interactions between molecules like genes,
proteins and metabolites. PathVisio gives the possibility to edit pathways or even to create
new ones that can be directly uploaded to WikiPathways. Within a pathway references and
comments can be added, giving the possibility to show a lot of information in a single
overview.
PathVisio can also be used for the visualization of pathways. The visualization option gives
the users the possibility to show experimental data on the pathway diagrams. The data can be
loaded into PathVisio which then can visualize the results on the data nodes and interactions
in the pathway. Different rules for visualization can be used, for example a color gradient can
be used to visualize an activity measurement, whereas a color rule can be used to show
significance levels. Data nodes will split into multiple columns, so all data is visible at the
same time.
Pathway statistics is a tool to find pathways which are (significantly) altered in a dataset. The
user defines a criterion on which the selection should take place (value for activity
measurement, p-value etc). Based on the selected gene list PathVisio performs an over-
representation analysis calculating a Z-score for each pathway. A positive Z-score indicates
that more genes in that pathway meet the criteria than expected based on the complete dataset.
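The Z-score can be illustrated with the formula commonly used for pathway over-representation analysis, which compares the observed number of selected genes in a pathway with the number expected from the complete dataset. The text above does not state the exact formula, so the sketch below is an assumption based on that standard definition; the variable names and example counts are hypothetical.

```python
import math


def pathway_z_score(N, R, n, r):
    """Z-score for over-representation of selected genes in one pathway.

    N: genes measured in the complete dataset
    R: genes in the dataset that meet the criterion
    n: genes in the pathway that were measured
    r: genes in the pathway that meet the criterion
    """
    if not (0 < R < N and 0 < n < N and 0 <= r <= n):
        raise ValueError("counts are inconsistent")
    expected = n * R / N
    # Variance of the hypergeometric distribution (sampling without replacement).
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 100 of 1000 genes meet the criterion, so 5 of the
# 50 pathway genes are expected; 15 are observed.
z_enriched = pathway_z_score(N=1000, R=100, n=50, r=15)
z_neutral = pathway_z_score(N=1000, R=100, n=50, r=5)
print(f"enriched pathway: Z = {z_enriched:.2f}; neutral pathway: Z = {z_neutral:.2f}")
```

A pathway in which the observed count equals the expected count gets a Z-score of zero, and larger positive values indicate stronger over-representation.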
PathVisio can also be extended with plugins. In this project the new RI-plugin for PathVisio
was tested. This plugin links the genes in the pathway to related elements, for example the
known variants for the selected gene. Those variants are then visualized in a side-panel. By
clicking on a gene, information about the linked SNPs and their SIFT-scores could be
obtained. These SIFT-scores were obtained from Ensembl BioMart. Visualizations could then
be performed based on SIFT-scores and p values, making it easier to find genes (with SNPs)
with a potential deleterious effect.
2.2.5 GO-analysis
GOrilla (http://cbl-gorilla.cs.technion.ac.il/) is a tool that can be used to identify enriched GO-
terms in a list of genes (19). There are two input options. In the first option a ‘Single ranked
list of genes’ has to be provided. In the second option ‘two unranked lists of genes (target and
background lists)’ have to be provided. Four different ontologies can be chosen: Process,
function, component and ‘all’. The tool then provides researchers with a table of the most
enriched processes/functions/components and provides a schematic overview.
In this project we used all the genes in the WikiPathways collection as the background list and
all the genes with T1DM associated SNPs as the target list (the second input option), so we could
investigate which processes are overrepresented.
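Enrichment of a GO-term in a target list relative to a background list is commonly scored with a hypergeometric tail probability. The sketch below shows that calculation. It is a generic illustration with hypothetical counts; whether GOrilla applied exactly this calculation in this analysis is an assumption.

```python
from math import comb


def enrichment_p_value(N, B, n, b):
    """Probability of seeing at least b annotated genes in the target list.

    N: genes in the background list
    B: background genes annotated with the GO-term
    n: genes in the target list
    b: target genes annotated with the GO-term
    """
    upper = min(n, B)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for b, b + 1, ..., upper.
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / total


# Hypothetical counts: 5 of 10 background genes carry the term and all
# 5 target genes carry it as well.
p_strong = enrichment_p_value(N=10, B=5, n=5, b=5)
p_none = enrichment_p_value(N=10, B=5, n=5, b=0)
print(f"p = {p_strong:.4f}")
```

Because many GO-terms are tested at once, such p-values are usually corrected for multiple testing before terms are reported as enriched.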
2.3 Workflow
In Figure 3 a detailed workflow of the analysis is shown to provide a clear overview of the
steps taken. SNPs associated with T1DM were extracted from the database (step 1). SNPs
were visualized and reprioritized by GWAS3D (step 2) and consequently linked to their genes
using Ensembl BioMart (step 3). A GO-analysis was performed to search for enriched
biological processes (step 4). Genes together with their T1DM associated SNPs were
visualized in a network using Cytoscape (step 5). This network was merged together with a
network of all the pathways from WikiPathways and the genes involved in these pathways.
The merged network (‘Merged pathway-gene-SNP network’) consisted of pathways linked to
genes, which were in turn linked to the SNPs (step 6). Not all genes could be linked to
a pathway from WikiPathways and not all pathways had genes with T1DM associated SNPs.
These genes and pathways were removed (step 7) to create a new network, called ‘Filtered
pathway-gene-SNP network’ (step 8). This network was then analyzed using Excel (step 9).
The genes not present in any pathways were further investigated using DisGeNET (step 10).
Additionally, an effort was made to visualize and analyse the T1DM related SNPs using
PathVisio (step 11).
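The filtering in steps 7 and 8 can be sketched as follows. The data layout and the example names are hypothetical, and the sketch assumes that only genes carrying T1DM associated SNPs are kept in the filtered network.

```python
def filter_network(pathway_to_genes, gene_to_snps):
    """Keep only pathways that contain genes with SNPs, and only those genes.

    Returns the filtered pathway-gene links, the gene-SNP links of the
    remaining genes, and the genes with SNPs that are in no pathway.
    """
    genes_with_snps = {gene for gene, snps in gene_to_snps.items() if snps}
    filtered_pathways = {}
    for pathway, genes in pathway_to_genes.items():
        kept = set(genes) & genes_with_snps
        if kept:  # pathways without any gene with SNPs are dropped
            filtered_pathways[pathway] = kept
    genes_in_pathways = set()
    for genes in filtered_pathways.values():
        genes_in_pathways |= genes
    filtered_snps = {gene: set(gene_to_snps[gene]) for gene in genes_in_pathways}
    unlinked_genes = genes_with_snps - genes_in_pathways
    return filtered_pathways, filtered_snps, unlinked_genes


# Placeholder names used for illustration only.
pathway_to_genes = {"Pathway 1": {"Gene 1", "Gene 2"}, "Pathway 2": {"Gene 3"}}
gene_to_snps = {"Gene 1": {"SNP 1", "SNP 2"}, "Gene 4": {"SNP 3"}}

pathways, snps, unlinked = filter_network(pathway_to_genes, gene_to_snps)
print(pathways, snps, unlinked)
```

The genes returned as unlinked correspond to the genes that are investigated separately with DisGeNET in step 10.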
Figure 3: The 11 analysis steps performed in this project to integrate, visualize and analyse genetic
variation data together with biological pathway knowledge.
3. Results
3.1 GWAS3D-analysis
A total of 5,352 different SNPs related to T1DM were extracted from the database provided by
Johnson and O'Donnell. The p-values indicating the association of the SNPs with T1DM varied
from 1.12E-307 for rs9273363 to 8.11E-04 for rs12124983. These SNPs were reanalyzed
using the GWAS3D SNP analysis program. GWAS3D detected 190 variants having
regulatory signals, 190 variants causing transcription factor binding site affinity changes, 159
variants affecting long range interactions, 62 variants having a direct effect by GWAS leading
SNPs and 128 variants having an indirect effect through high linkage disequilibrium (LD) with
GWAS leading SNPs. The Circos-style visualization is shown in Figure 4.
Figure 4: Circos-style visualization (GWAS3D) of significant SNPs.
SNP ids or loci are displayed in the outer circle, linked to the corresponding genes or genomic location in the
inner circle. Red lines indicate interaction between the linked genes or loci, where the thickness of the line
represents the intensity of the interaction.
Ensembl BioMart
After removing SNPs with a p-value > 1E-07 and after GWAS3D reprioritization, the
remaining SNPs were linked to the affected genes using Ensembl BioMart. These included a
total of 2,183 different SNPs. Eventually 1,521 of these 2,183 SNPs were matched to
340 different genes. There were 2,496 different links between the 1,521 SNPs and 340 genes.
This is possible since a SNP can be linked to multiple genes.
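The relation between the number of SNPs, genes and links can be illustrated with a small sketch that applies the p-value cutoff and then counts unique SNPs, unique genes and SNP-gene links. The identifiers and p-values are hypothetical placeholders, not values from the dataset.

```python
P_VALUE_CUTOFF = 1e-7  # SNPs with a larger p-value are removed


def summarize_links(snp_p_values, snp_gene_links):
    """Count significant SNPs, linked SNPs, genes and SNP-gene links."""
    significant = {snp for snp, p in snp_p_values.items() if p <= P_VALUE_CUTOFF}
    links = {(snp, gene) for snp, gene in snp_gene_links if snp in significant}
    return {
        "significant_snps": len(significant),
        "linked_snps": len({snp for snp, _ in links}),
        "genes": len({gene for _, gene in links}),
        "links": len(links),
    }


# Placeholder data: snp_1 lies in two overlapping genes, snp_4 in none.
snp_p_values = {"snp_1": 1e-10, "snp_2": 1e-8, "snp_3": 1e-3, "snp_4": 1e-9}
snp_gene_links = [
    ("snp_1", "GENE_A"),
    ("snp_1", "GENE_B"),
    ("snp_2", "GENE_A"),
    ("snp_3", "GENE_C"),
]
summary = summarize_links(snp_p_values, snp_gene_links)
print(summary)
```

In this toy example three significant SNPs yield two linked SNPs, two genes and three links, which shows how the number of links can exceed the number of SNPs.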
SNPs not linked to genes

The 662 SNPs that could not be linked to a gene were analyzed again to be sure that Ensembl
BioMart could not link them to a gene. For 145 SNPs a matching gene was found and these
were included in the further analysis.
Most significant SNP
The SNP most significantly associated with T1DM is rs9273363 (association
with T1DM, p-value: 1.12E-307). This SNP is linked to the HLA-DQB1 gene.
3.2 GO-Analysis
A GO-analysis was performed to search for overrepresented biological processes. All the
genes with T1DM associated SNPs were checked against all genes involved in the pathway
collection from WikiPathways. The most significantly enriched processes are related to antigen
presentation (of exogenous antigens). Other enriched processes involve
leukocytes, lymphocytes, T-cells and B-cells, for instance the ‘positive regulation of T-cell
activation’ process.
Some processes not related to immune function were detected too, like nucleosome
organization and assembly. All the processes are shown in Appendix 6.1: Table 1.
3.3 Network analysis

Merged pathway-gene-SNP network
The 2,496 SNP-gene links between the SNPs and the 340 genes were used to create a network in
Cytoscape. In this network the SNPs were placed around the genes to which they are linked.
This network was merged with a network in which all the pathways and connected genes from
WikiPathways were shown. This latter network consisted of 284 pathways and 5,703 different
genes. Merging the two networks resulted in a network where the genes had SNPs associated
with T1DM linked to them and the genes were in turn linked to the pathways in which they are
involved (Figure 5). This resulted in a very complex network with 7,502 nodes and 17,017
edges. For an explanation of the merging function see section 2.2.3 of the Materials and
methods and Figure 2.
Figure 5: Overview of all the pathways, genes and SNPs linked in a single network.
The larger network was made by merging the gene-SNP and pathway-gene networks.
Genes not linked to any pathway

The 306 genes that have known T1DM associated SNPs, but which could not be linked to any
pathway, were examined with DisGeNET to check for any known involvement in T1DM. For
two genes DisGeNET found a known link to T1DM: the PTPN22 and ZFP57 genes. These
genes were included in the further analysis.
Filtered pathway-gene-SNP network

To facilitate the interpretation of the network, two filters were applied: (i) genes (and their
associated SNPs) not connected to any of the pathways were removed and (ii) pathways
without genes that are linked to T1DM associated SNPs were removed. A total of 306 different
genes and their SNPs were removed from the network. A total of 210 pathways were removed, so 74
pathways remained, to which 57 different genes were linked. These genes were connected to
364 different SNPs. In the remaining network, the ‘Filtered pathway-gene-SNP network’, it
was found that the CDK2 gene is involved in 16 different pathways, followed by the DAXX
and ITPR1 genes, which are involved in 7 and 6 pathways respectively. The genes with the
most SNPs were the HLA-DQA2 gene with 43 SNPs, followed by the HLA-DRA gene with 30
SNPs.
The highest percentage of ‘genes (with SNPs) linked to pathway/total genes in pathway’ was
found in the ‘Arylamine metabolism’ pathway, where 1 out of 7 genes (14.3%) has T1DM related
SNPs. The second highest ratio is 7.8%, found in the ‘Allograft Rejection’ pathway. The focal
adhesion pathway is the pathway containing the most genes, a total of 211. An overview of all
the pathways, with the linked genes and SNPs, is given in appendix 6.2, Table 2.
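The percentages can be reproduced from the gene counts reported in this chapter. The sketch below computes the ratio for four pathways and ranks them. The counts are taken from the text; the function and variable names are illustrative.

```python
# (genes with T1DM associated SNPs, total genes in pathway), as reported in the text
PATHWAY_COUNTS = {
    "Arylamine metabolism": (1, 7),
    "Allograft Rejection": (16, 204),
    "Proteasome degradation": (7, 135),
    "Type II interferon signaling": (4, 74),
}


def snp_gene_percentage(genes_with_snps, total_genes):
    """Percentage of genes in a pathway that have T1DM associated SNPs."""
    return 100.0 * genes_with_snps / total_genes


percentages = {
    name: round(snp_gene_percentage(with_snps, total), 1)
    for name, (with_snps, total) in PATHWAY_COUNTS.items()
}
ranked = sorted(percentages, key=percentages.get, reverse=True)
for name in ranked:
    print(f"{name}: {percentages[name]}%")
```

The example also shows why the ratio should be read together with the absolute counts: a small pathway such as ‘Arylamine metabolism’ reaches the highest percentage with a single gene.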
Figure 6: Overview of all the pathways, genes and SNPs linked in a single network.
Network of pathways containing genes with T1DM associated SNPs. The 3 pathways with the most genes with T1DM
associated SNPs and the HLA-DQB1 gene, which is linked to the SNP with the lowest p-value, are indicated.
3.4.1. Visualization of SNPs in PathVisio
The new RI-plugin for PathVisio was used to visualize SNPs in a pathway. Information about
the number of SNPs (and the SNP ids) becomes available in a side panel when selecting a
gene (Figure 7). The SNPs are coloured based on their p-value (left side of the box). All SNPs
of the selected HLA-C gene have a p-value lower than the cut-off value of 1E-07. The right
side of the box shows the SIFT-score. These SIFT-scores were obtained from BioMart. Blue
means tolerated, whereas white means that no information is available. No deleterious SNPs
were found for this gene.
Figure 7: Visualization of SNPs of the Allograft Rejection pathway in PathVisio.

In the pathway image the HLA-C gene is selected and the back page on the right shows the list of SNPs linked to
this gene. The SNPs are coloured based on the p-value and SIFT-score. Below the list, a link to the NCBI page of
the SNP and details of the given data are provided. http://wikipathways.org/index.php/Pathway:WP2328
Legend of boxes in right upper corner: the left half of the box indicates the p-value. Green means a p-value < 1E-07, whereas white
indicates a value above that. The right half of the box indicates the SIFT-score. Blue means tolerated, white no
information available, red (not shown) means a deleterious SNP.
3.4.2. Pathways linked to T1DM

The ‘Allograft Rejection’, ‘Proteasome degradation’ and ‘Type II interferon signaling’
pathways are discussed in this section. These pathways were chosen since they contained
the most genes with T1DM associated SNPs.
Allograft Rejection
The Allograft Rejection pathway has a total of 204 genes, of which 16 have T1DM associated
SNPs. The pathway shows interactions involved in the adaptive immune response that are
responsible for the degradation and destruction of an allograft.
The HLA-genes involved in the Allograft Rejection pathway include HLA-DRB4, -DRB1,
-DQB1, -DQA1, -DPA1, -DRB3, -DPB1, -DOA, -DMA, -DMB, -DOB, -DRA, -DRB5 and
-DQA2. For all of these genes except HLA-DRB4, -DRB3, -DOA and -DRB5, SNPs were found that
are linked to T1DM. Besides the HLA-genes, the C4A and MICA genes also have SNPs related
to T1DM (Figure 7).
Proteasome degradation pathway

Seven of the 135 genes in the ‘Proteasome degradation’ pathway are linked to T1DM related
SNPs. HLA-genes also play an important role in the ‘Proteasome degradation’ pathway.
The last step of the pathway is the presentation of antigens on MHC-class I proteins.
These include the HLA-A, -B, -C, -F and -J genes. PSMB8 and -9 are the two other genes with
T1DM associated SNPs in the ‘Proteasome degradation’ pathway. These two genes encode
beta-subunits of the proteasome 20S catalytic core (P45). The pathway is shown in Figure
8.
Figure 8: The Proteasome degradation pathway from WikiPathways. The red circles are placed around the
genes with T1DM related SNPs. Image adapted from WikiPathways.
http://wikipathways.org/index.php/Pathway:WP183
Type II interferon signaling

In the ‘Type II interferon signaling’ pathway, 4 out of 74 genes have T1DM associated SNPs
related to them (Figure 9). These four genes are HIST2H4, HLA-B, TAP1 and PSMB9.
HIST2H4 encodes the histone H4 protein, one of the proteins involved in the formation
of chromatin structure. The TAP1 gene encodes an ABC transporter, which is able to
transport a variety of molecules across membranes. The PSMB9 protein is one of the 17
subunits of the 20S proteasome complex, and HLA-B is involved in the recognition of foreign
antigens.
Figure 9: The Type II Interferon Signaling pathway from WikiPathways. The genes with T1DM related
SNPs are marked with a red circle. Image adapted from WikiPathways.
http://wikipathways.org/index.php/Pathway:WP619
4. Discussion
4.1 GWAS analysis
A total of 2183 different T1DM associated SNPs were extracted from Johnson and
O'Donnell’s database. After GWAS3D analysis, 1521 SNPs were eventually matched 2496 times
to 340 different genes using Ensembl BioMart. This means that 662 SNPs could not be
linked to any gene, which can cause important data to be lost. These 662 SNPs were
entered again in Ensembl BioMart and it was found that 145 of the 662 SNPs could be linked
to Ensembl gene identifiers but not to HGNC symbols.
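The bookkeeping behind these numbers can be sketched as follows. The counts are those reported above; the function name, the lookup dictionaries and the example identifiers (apart from rs9273363 and HLA-DQB1, which are mentioned in this thesis) are hypothetical and only illustrate how SNPs can be split into those with an HGNC symbol, those with only an Ensembl gene identifier, and those without any gene link.

```python
def classify_snps(snps, hgnc_by_snp, ensembl_by_snp):
    """Split SNP ids into (with HGNC symbol, Ensembl id only, unmapped)."""
    with_symbol, ensembl_only, unmapped = set(), set(), set()
    for snp in snps:
        if hgnc_by_snp.get(snp):
            with_symbol.add(snp)
        elif ensembl_by_snp.get(snp):
            ensembl_only.add(snp)
        else:
            unmapped.add(snp)
    return with_symbol, ensembl_only, unmapped


# Counts reported in the text.
TOTAL_SNPS = 2183
MATCHED_TO_HGNC = 1521
ENSEMBL_ID_ONLY = 145

not_matched = TOTAL_SNPS - MATCHED_TO_HGNC            # SNPs without an HGNC symbol
without_any_gene = not_matched - ENSEMBL_ID_ONLY      # remaining SNPs, by subtraction

# Toy lookups (hypothetical except for rs9273363 / HLA-DQB1).
toy_snps = ["rs9273363", "rs_example_a", "rs_example_b"]
toy_hgnc = {"rs9273363": ["HLA-DQB1"]}
toy_ensembl = {"rs_example_a": ["ENSG_EXAMPLE"]}

groups = classify_snps(toy_snps, toy_hgnc, toy_ensembl)
print(not_matched, without_any_gene, groups)
```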
The SNP with the lowest p-value (rs9273363) can be linked to the HLA-DQB1 gene.
Depending on the genotype at this SNP, the risk for T1DM can increase or decrease. The A/A
genotype increases the risk, while people with the A/C and C/C genotypes have a 0.87 and
0.15 times lower risk, respectively (20).
The HLA-genes have already been classified as important loci in T1DM, as described by Nejentsev et
al. In their association analysis they identified 1475 SNPs linked to the genes of the
MHC-region (21).
4.2 GO-Analysis
The GO-analysis revealed that most of the genes with SNPs linked to T1DM are related to the
immune system and its function. Torkamani et al. performed a pathway analysis of the seven
diseases assessed by the WTCCC (22). Most of the processes they found for T1DM are also
related to the immune system, like the ‘IL2 activation and signaling pathway’. They also
found two pathways involved in signal transduction (‘Calcium signaling’ and ‘IP3 signaling’)
and several pathways involving angiotensin, AKT and ERK. Differences in results can be
explained by the use of MetaCore by Torkamani et al., while other approaches were used in
this thesis. Although some differences are found, both studies found immune-related
processes. Since GO-classes are much more generic than pathways, this might also explain
some of the differences. Nucleosome organization and assembly may show up because
chromatin modification is critical for efficient transcription of the MHC-class II proteins (23).
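For readers unfamiliar with over-representation testing, the idea can be illustrated with a plain hypergeometric test: given N background genes of which K carry a GO term, how likely is it to see at least k annotated genes in a list of n genes? This is a generic sketch with a hypothetical function name and toy numbers; it is not claimed to be the exact statistic used to produce the results in this thesis.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy example: 10 background genes, 3 annotated, 3 drawn, all 3 annotated.
p_all_three = hypergeom_enrichment_p(10, 3, 3, 3)
print(p_all_three)
```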
4.3 Network analysis

WikiPathways comprises a total of 284 curated pathways for Homo sapiens. These pathways
contain a total of 5916 different genes. It is clear that a large amount of knowledge is not (yet)
incorporated in these pathways, causing relevant genes to be missing in the pathway analysis.
The size of this problem might have been reduced by integrating multiple pathway sources,
for example KEGG or Reactome, although the overlap in gene count is quite substantial. The
‘Merged Pathway-gene-SNP network’ had multiple genes that could not be connected to any
pathway. Ensembl BioMart was used to investigate the phenotype descriptions associated
with these genes. Numerous genes lacked phenotype descriptions and therefore a DisGeNET-
analysis was performed. DisGeNET and Ensembl BioMart indicated the association of the
ZFP57 gene with T1DM. The ZFP57 gene is more strongly associated with another form of diabetes,
called ‘Diabetes Mellitus Transient Neonatal 1’ (24).
The pathway- and network analysis approach was found to be useful for investigating
SNPs and for seeing which pathways are involved in the genetic susceptibility to
T1DM. Numerous loci have been found to be associated with T1DM and a part of these were
validated by this novel approach.
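The step from the merged to the filtered network, in which genes that cannot be connected to any pathway are dropped, can be sketched with plain dictionaries. This is a minimal illustration and not the Cytoscape workflow itself; the function name and the toy input are hypothetical, and it is assumed here that only genes with SNPs are kept as pathway members.

```python
def filter_network(pathway_genes, gene_snps):
    """Return (edges of the filtered network, genes with SNPs but no pathway)."""
    genes_in_pathways = set()
    for genes in pathway_genes.values():
        genes_in_pathways.update(genes)

    # Genes with SNPs that occur in at least one pathway are kept.
    kept = {g: set(s) for g, s in gene_snps.items() if g in genes_in_pathways}
    dropped = sorted(set(gene_snps) - genes_in_pathways)

    edges = set()
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            if gene in kept:
                edges.add((pathway, gene))      # pathway-gene edge
                for snp in kept[gene]:
                    edges.add((gene, snp))      # gene-SNP edge
    return edges, dropped


# Toy input: TAP1 is in a pathway, PTPN22 is not (SNP id for PTPN22 is made up).
toy_pathways = {"Type II interferon signaling": {"TAP1", "IFNG"}}
toy_gene_snps = {"TAP1": {"rs4148882"}, "PTPN22": {"rs_example_ptpn22"}}

edges, dropped = filter_network(toy_pathways, toy_gene_snps)
print(sorted(edges), dropped)
```

Reporting the dropped genes separately keeps them available for follow-up, for instance with phenotype or disease annotations.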

4.4 PathVisio visualization of SNPs
The visualization of SNPs in PathVisio was not meant to generate new data but to test a
method to interactively visualize SNPs directly in the pathway. The Allograft Rejection
pathway was visualized based on p-values and SIFT-scores. These SIFT-scores indicated
no deleterious SNPs. One of the problems encountered was that we did not succeed in
linking more than one transcript per gene. A method to take these transcripts into account is
necessary, since SIFT-scores can vary significantly between transcripts. This
visualization shows that it is possible to visualize SNPs in PathVisio, which facilitates
the interpretation of SNPs in pathways.
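One possible way to take multiple transcripts into account is to summarise the SIFT-scores of a SNP over all its transcripts and report the lowest (most damaging) one. The sketch below is an assumption about how this could be done and not part of the plugin; the function name and transcript identifiers are hypothetical, and the conventional SIFT threshold of 0.05 is assumed.

```python
SIFT_THRESHOLD = 0.05  # assumed conventional threshold: lower scores are deleterious


def summarise_sift(scores_by_transcript):
    """Summarise per-transcript SIFT scores of one SNP by the lowest score."""
    if not scores_by_transcript:
        return None  # no SIFT information available for this SNP
    worst_transcript = min(scores_by_transcript, key=scores_by_transcript.get)
    worst_score = scores_by_transcript[worst_transcript]
    return {
        "transcript": worst_transcript,
        "min_score": worst_score,
        "prediction": "deleterious" if worst_score < SIFT_THRESHOLD else "tolerated",
        "n_transcripts": len(scores_by_transcript),
    }


# Toy example: the same SNP is tolerated in one transcript and deleterious in another.
summary = summarise_sift({"ENST_EXAMPLE_1": 0.42, "ENST_EXAMPLE_2": 0.01})
print(summary)
```

Taking the minimum is a conservative choice: a SNP is flagged as soon as one transcript predicts a deleterious effect.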
Allograft rejection and diabetes

The ‘Allograft Rejection’ pathway is the pathway with the highest number of genes with
T1DM related SNPs. These genes mainly belong to the MHC/HLA-class genes. Carbonetto
and Stephens analyzed the data from the WTCCC, which is also
included in this study. They found the ‘Allograft Rejection’ and ‘Asthma’
pathways as the top pathways in rheumatoid arthritis and T1DM (25).
In the ‘Allograft Rejection’ pathway, antigen presenting cells from the donor or the recipient
(direct and indirect pathway, resp.) activate naive T-cells, leading to CD8+ and CD4+ T-cell
activation. The MHC/HLA-proteins help in the process of presenting the antigens to the
T-cells. The activation of T-cells ultimately leads to destruction and apoptosis of the allograft
cells (Figure 7). Cytotoxic T-lymphocyte-associated protein 4 (CTLA4) acts as an inhibitor of
the inflammatory response, but no SNPs linked to this gene were found in the analysis of the
GWAS data (26). In both the Asthma and Allograft Rejection pathways, multiple MHC-genes
are involved, explaining the involvement in T1DM. Unfortunately, the ‘Asthma’ pathway was
not included in the WikiPathways collection.
Articles on the association of the HLA-genes with T1DM go as far back as 1970 (27). In 1996
Noble et al. described the role of the HLA-class II genes, especially the HLA-DRB1, -DQA1,
-DQB1 and -DPB1 genotypes (28). The role of these genes in T1DM is confirmed in this
study. The similarity between ‘Allograft Rejection’ and T1DM is the immune reaction
against tissue cells. In the rejection of an allograft, the allograft is seen as a foreign object and
attacked by the immune system, whereas in T1DM the host’s own beta-cells are attacked. It
was therefore expected that this pathway would show up, since similar processes are involved
in T1DM.
Proteolysis and diabetes

We found several models of the involvement of proteolysis in T1DM. An interesting
model of the involvement of proteolysis in the development of T1DM is proposed by
Fierabracci (29). Fierabracci’s model starts with an environmental influence, like a viral
infection, causing breakdown of self-proteins in the beta-cells. Pieces of the protein fragments
that remain are shown to CD8+ T-cells by the antigen presenting capacities of the MHC-class
I molecules. This T-cell activation will lead to lysis of the beta-cells. By presentation of
these antigens to B-cells, autoantibodies are produced to attack the beta-cells. An extensive
description of this hypothesis is given in ‘The putative role of proteolytic pathways in the
pathogenesis of Type 1 diabetes mellitus: The ‘autophagy’ hypothesis’ (29).
Balasubramanyam et al. described a different aspect of proteolysis and its involvement
in T1DM (30). The ubiquitin-proteasome system is one of the systems involved in the
degradation of cellular proteins and is involved in numerous biological functions. One of
these biological functions is insulin signaling (30). Insulin promotes cellular growth not
only by increasing the synthesis of some proteins, but mainly by the inhibition of overall
proteolysis (31, 32). Experimental evidence shows that this inhibition of overall proteolysis
comes from the inhibition of the ATP- and ubiquitin-dependent degradation by the
proteasomes in vitro as well as in cultured cells (33). Contrasting evidence shows that
prolonged exposure of cells to insulin causes IRS-1 ubiquitin conjugation and consequently
degradation of IRS-1 (34, 35). In diabetes, inflammatory mediators could upregulate suppressor
of cytokine signaling (SOCS) proteins, especially SOCS3.
IRS-1 and -2 could then be degraded by proteasomes, causing insulin resistance. These SOCS
proteins and the processes involved could play a role in T1DM, although the exact
mechanisms are still unclear (33). In addition to the ubiquitin/proteasome system, the
proteolytic autophagic pathway may also play a role in the autoimmune processes. Autophagy
could be involved in the autoimmune process within the MHC class I & II self-antigen
presentation. Components of the ubiquitin/proteasome system may even be shared with the
autophagy process (29). This information suggests that we can confirm the role of the
‘Proteasome Degradation’ pathway in T1DM.
Type II interferon signaling

Inflammation is an important process involved in T1DM. Type II interferon signaling is a
pathway involved in the inflammatory response. The four genes with T1DM associated SNPs
in the ‘Type II interferon signaling’ pathway are the HIST2H4, HLA-B, TAP1 and PSMB9
genes. TAP1 has been reported to be associated with T1DM (36). Together with the HLA-B
gene, among others, the TAP and MHC-complex work together on the antigen presentation to
T-cells (21, 36). The GO-Analysis performed also showed an overrepresentation of
interferon signaling and the involvement of TAP proteins.
4.5 Comparing results with other studies
We compared our results with multiple more recent publications to investigate the role of
several genes associated with T1DM. Hakonarson et al. found in 2007 a significant
association with variations in the gene region that contains the KIAA0350 gene (9). This
region has not been found in the GWA studies used in this analysis.
The PTPN22 gene is frequently associated with T1DM. This gene encodes the lymphoid
protein tyrosine phosphatase (37, 38). A SNP in the gene could contribute to the development
of T1DM through the negative regulation of T-cell activation, demonstrating the important role of
T-cells in T1DM (39). Unfortunately, PTPN22 is not present in any of the pathways in
the WikiPathways collection, and so it was ‘lost’ in the ‘Filtered pathway-gene-SNP
network’. However, SNPs related to PTPN22 were found in the GWA studies and the gene
should therefore nonetheless be considered an important factor in T1DM.
Cytotoxic T-lymphocyte-associated protein 4 (CTLA4) is a protein on the surface of activated
T-cells that produces an inhibitory signal for T-cell activation. Changes in the
expression of the CTLA4 gene, caused by possible variations in one of the regulators, may
increase T-cell self-reactivity, indicating its possible link to T1DM (26, 40). Either no SNP related
to the CTLA4 gene was found in the database, or the SNP could not be matched to the CTLA4
gene using Ensembl BioMart, explaining the absence of the gene in the network.
In the analysis conducted by Carbonetto and Stephens, described above in the discussion section
‘Allograft rejection and diabetes’, the pathway with the highest Bayes factor was the IL-2
signaling pathway. This pathway is missing in the final network, since none of its genes
had T1DM associated SNPs linked to them. The IL-2 cytokine is involved in the activation,
development and maintenance of T-regulatory cells. Defects in these pathways are involved
in autoimmune disorders (25).
Evangelou et al. performed a pathway analysis in which they started with SNPs and looked for the
pathways involved in T1DM (41). A major difference between their approach and this thesis
is that Evangelou et al. removed the MHC-loci from the analysis, so that these loci could not bias
the results towards pathways in which multiple MHC-loci are involved. The most important pathways
they found were also involved in the immune response. Since they used a different approach
and different databases for the pathways (Reactome and BioCarta), their pathways differed
from the pathways found in this thesis.
4.6 Shortcomings in this study
It must be taken into account that the database from Johnson and O'Donnell
was established in 2009 and that the articles from Hakonarson and the WTCCC were both
published in 2007. SNPs found more recently by other researchers may therefore not be
included. In addition, only 50% of the protein coding genes are present in the pathways, since the
detailed mechanisms and functions are often not known. The inability of Ensembl BioMart to
link some SNPs to genes may also explain why the networks are incomplete.
4.7 Conclusion
We performed a complex interactive network approach to investigate the following research
question: “Can we find the disease related processes by integrating and visualizing the
genetic variants with existing pathway knowledge?”
We found several pathways and biological processes to be involved in T1DM. The MHC-
region was already known to be important in T1DM, and our pathway- and network based
approaches confirmed this finding, showing several relevant pathways containing multiple
HLA-genes. This result shows the importance of immune related pathways in T1DM.
The method applied is able to confirm existing knowledge and offers possibilities to search for
new loci and pathways involved in the disease process. One of the difficulties encountered
was that the ‘Merged Pathway-gene-SNP network’ was too big to interpret visually. In the
‘Filtered Pathway-gene-SNP network’ this problem was resolved. In this smaller network the
pathways, genes, SNPs and their connections could be distinguished based on the
visualization used. Future research could focus on improvements in linking SNPs to genes and
genes to pathways, thereby possibly expanding the knowledge on the disease process. This
could eventually lead to improvements in the diagnosis or treatment of T1DM patients.
5. References
1. Van Belle TL, Coppieters KT, Von Herrath MG. Type 1 diabetes: etiology,
immunology, and therapeutic strategies. Physiol Rev. 2011;91(1):79-118.
2. Statistiek CB. Steeds meer mensen met diabetes [More and more people with diabetes]. 2014. Available from:
http://www.cbs.nl/nl-NL/menu/themas/gezondheid-welzijn/publicaties/artikelen/archief/2014/2014-4173-wm.htm.
3. Frayn KN. Metabolic Regulation. A Human Perspective. Oxford (UK): Wiley-Blackwell; 2010.
4. Guyton AC, Hall JE. Textbook of Medical Physiology. Philadelphia: Elsevier
Saunders; 2010.
5. Goossens GH. The role of adipose tissue dysfunction in the pathogenesis of obesity-related
insulin resistance. Physiol Behav. 2008;94(2):206-18.
6. Fierabracci A. Unravelling the role of infectious agents in the pathogenesis of human
autoimmunity: the hypothesis of the retroviral involvement revisited. Curr Mol Med.
2009;9:1024-33.
7. Fierabracci A, Ayroldi E. Experimental strategies in autoimmunity: antagonists of
cytokines and their receptors, nanocarriers, inhibitors of immunoproteasome, leukocyte
migration and protein kinases. Curr Pharm Des. 2011;17:3094-107.
8. Bottini N, Vang T, Cucca F, Mustelin T. Role of PTPN22 in type 1 diabetes and
other autoimmune diseases. Seminars in Immunology. 2006;18(4):207-13.
9. Hakonarson H. A genome-wide association study identifies KIAA0350 as a type 1
diabetes gene. Nature. 2007;448:591-4.
10. Barillot E, et al. Computational Systems Biology of Cancer: Taylor & Francis Group;
2012.
11. Li MJ, Wang LY, Xia Z, Sham PC, Wang J. GWAS3D: detecting human regulatory
variants by integrative analysis of genome-wide associations, chromosome interactions and
histone modifications. Nucleic Acids Res. 2013.
12. Johnson AD, O'Donnell CJ. An Open Access Database of Genome-wide Association
Results. BMC Medical Genetics. 2009;10(6).
13. The Wellcome Trust Case Control Consortium. Genome-wide association study of
14,000 cases of seven common diseases and 3,000 shared controls. Nature.
2007;447(7145):661-78.
14. Hakonarson H. A genome-wide association study identifies KIAA0350 as a type 1
diabetes gene. Nature. 2007;448:591-4.
15. Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo C, et al.
WikiPathways: building research communities on biological pathways. Nucleic Acids Res.
2011:1301-7.
16. Piñero J, et al. DisGeNET: a discovery platform for the dynamical exploration of
human diseases and their genes. Database (Oxford). 2015.
17. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A
Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome
Res. 2003;13(11):2498–504.
18. Kutmon M, Van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3:
An Extendable Pathway Analysis Toolbox. PLoS Comput Biol. 2015;11(2).
19. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: A Tool For Discovery
And Visualization of Enriched GO Terms in Ranked Gene Lists. BMC Bioinformatics.
2009;10(48).
20. Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation,
interpretation and analysis. Nucleic Acids Res. 2012:1308-12.
21. Nejentsev S, et al. Localization of type 1 diabetes susceptibility to the MHC class I
genes HLA-B and HLA-A. Nature. 2007;450(7171):887-92.
22. Torkamani A, Topol EJ, Schork NJ. Pathway analysis of seven common diseases
assessed by genome-wide association. Genomics. 2008;92:265-72.
23. Choi NM, et al. Regulation of major histocompatibility complex class II genes. Curr
Opin Immunol. 2011;23(1):81-7.
24. Baglivo L, et al. Genetic and epigenetic mutations affect the DNA binding capability
of human ZFP57 in transient neonatal diabetes type 1. FEBS Lett. 2013;587(10):1474-81.
25. Carbonetto P, Stephens M. Integrated Enrichment Analysis of Variants and Pathways
in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in
Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease. PLoS Genet. 2013.
26. Ueda H, et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to
autoimmune disease. Nature. 2003;423:506-11.
27. Noble JA. Genetics of Type 1 Diabetes. Cold Spring Harb Perspect Med. 2012;2.
28. Noble JA, et al. The role of HLA class II genes in insulin-dependent diabetes mellitus:
molecular analysis of 180 Caucasian, multiplex families. Am J Hum Genet. 1996;59:1134-48.
29. Fierabracci A. The putative role of proteolytic pathways in the pathogenesis of Type 1
diabetes mellitus: The ‘autophagy’ hypothesis. Medical Hypotheses. 2014;82(5):553-7.
30. Balasubramanyam M, Sampathkumar R, Mohan V. Is insulin signaling molecules
misguided in diabetes for ubiquitin–proteasome mediated degradation? Molecular and
Cellular Biochemistry. 2005;275:117-25.
31. Fryburg DA, Jahn LA, Hill SA, Oliveras DM, Barrett EJ. Insulin and insulin-like
growth factor-I enhance human skeletal muscle protein anabolism during
hyperaminoacidemia by different mechanisms. Journal of Clinical Investigation.
1995;96:1722-9.
32. Russell-Jones DL, Umpleby M. Protein anabolic action of insulin,
growth hormone and insulin-like growth factor I. Eur J Endocrinol. 1996;135:631-42.
33. Bennett RG, Hamel FG, Duckworth WC. Insulin inhibits the ubiquitin dependent
degrading activity of the 26S proteasome. Endocrinology. 2000;141:2508-17.
34. Sun XJ, Goldberg JL, Qiao LY, Mitchell JJ. Insulin-induced insulin receptor substrate-
1 degradation is mediated by the proteasome degradation pathway. Diabetes. 1999;48:1359-
64.
35. Zhande R, Mitchell JJ, Wu J, Sun XJ. Molecular mechanism of insulin-induced
degradation of insulin receptor substrate 1. Mol Cell Biol. 2002;22:1016-26.
36. Sia C, Weinem M. Genetic susceptibility to type 1 diabetes in the intracellular
pathway of antigen processing - a subject review and cross-study comparison. Rev Diabet
Stud. 2005;2(1):40-52.
37. Hirschhorn JN. Genetic epidemiology of type 1 diabetes. Pediatr Diabetes.
2003;4:87-100.
38. Maier LM, Wicker LS. Genetic susceptibility to type 1 diabetes. Curr Opin Immunol.
2005;17:601-8.
39. Mehers KL, Gillespie KM. The genetic basis for type 1 diabetes. Br Med Bull.
2008;88(1):115-29.
40. Kristiansen OP, Larsen ZM, Pociot F. CTLA-4 in autoimmune diseases—a general
susceptibility gene to autoimmunity? Genes Immun. 2000;1:170-84.
41. Evangelou M, et al. A Method for Gene-Based Pathway Analysis Using Genomewide Association
Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations. Genet
Epidemiol. 2014;38(8):661-70.
6. Appendix
6.1 Results of GO-analysis
Table 1: The enriched processes found by GO-Analysis. A description of the process, together with the P-value,
FDR q-value and enrichment values, is given.
6.2 Overview of pathways – genes – SNPs in ‘Filtered pathway-gene-SNP network’
Table 2: Overview of all the pathways which have genes with T1DM associated SNPs linked.
The left column shows the pathway, the middle one the HGNC-symbols of the genes and the right column
indicates which SNPs are linked to the gene. If no SNP is indicated, the gene is also
mentioned in another pathway and the SNPs are indicated there.
Pathway | HGNC symbol | SNPs linked to gene
AGE/RAGE pathway | AGER | rs3134945, rs3134943, rs3134947, rs3134946, rs1800684, rs3132965, rs3096689
Allograft Rejection | C4A | rs391165, rs389512
 | HLA-A | rs3823343
 | HLA-B | rs9378249, rs3819294, rs2523612
 | HLA-C | rs9468933, rs9468929, rs7382307, rs3819294, rs9378249, rs7751729, rs2844613, rs2524078, rs3134792, rs3134745, rs2523612, rs2248902, rs2524069, rs2524067, rs13208617, rs13207315, rs2248880, rs13216197, rs13200073, rs13191519, rs13203895, rs13200571, rs10484554, rs10456057, rs13191343, rs12191877
 | HLA-DMA | rs209473
 | HLA-DMB | rs3132131
 | HLA-DOB | rs2857124, rs2857118, rs2857135, rs2071474, rs2071472, rs2621338, rs2621321, rs2071470
 | HLA-DPA1 | rs9277380
 | HLA-DPB1 | rs9277380, rs3117224, rs9277477, rs2179920
 | HLA-DQA1 | rs9272346, rs9271775, rs9272775, rs17843593, rs6927022, rs3129768
 | HLA-DQA2 | rs9276395, rs9276394, rs9276401, rs9276364, rs9276357, rs9276375, rs9276370, rs9276319, rs9276313, rs9276351, rs9276348, rs7773694, rs7773441, rs9276311, rs7773955, rs7773068, rs7755597, rs7773407, rs7773149, rs6457644, rs6457643, rs7755596, rs7744593, rs2051600, rs13214143, rs5021448, rs2894287, rs13199553, rs12183007, rs13214069, rs13199787, rs10947336
 | HLA-DQB1 | rs9273349, rs3134996, rs9273363, rs1063355
 | HLA-DRA | rs983561, rs9268657, rs9268645, rs9268659, rs9268658, rs9268615, rs8084, rs9268641, rs9268634, rs3135394, rs3135393, rs5000563, rs4612206, rs3129887, rs3129881, rs3135392, rs3135391, rs3129876, rs3129875, rs3129878, rs3129877, rs2395181, rs2395180, rs3129872, rs3129868, rs2395176, rs2395174, rs2395179, rs2395177, rs14004
 | HLA-DRB1 | rs35464393, rs35366052, rs701831, rs28490179
 | HLA-F |
 | MICA | rs3132472, rs3094584, rs9501387, rs2596542, rs2596541, rs2857282, rs2844521, rs2523459, rs2523453, rs2596540, rs2523467, rs2428474, rs2256328, rs2523452, rs2428475, rs2251731, rs1063635, rs2256175, rs2256028, rs1051790
Alzheimers Disease | ITPR1 | rs3805006
Amyotrophic lateral sclerosis (ALS) | DAXX | rs3130099, rs3130018, rs3130100, rs2073525, rs2073524, rs3106189, rs2395379, rs1061783
Androgen receptor signaling pathway | DAXX |
Apoptosis Modulation and Signaling | DAXX |
 | HSPA1B | rs2763979, rs2471980, rs3115674
Apoptosis Modulation by HSP70 | HSPA1B |
Apoptosis-related network due to altered Notch3 in ovarian cancer | ERBB3 | rs7971751, rs4759229, rs877636
 | IER3 | rs3130660, rs2284174, rs1059612
Arrhythmogenic Right Ventricular Cardiomyopathy | CACNG1 | rs3785579, rs1799938, rs7210865, rs16960487, rs16960501, rs16960497
 | TCF7L2 | rs7904519, rs7900150, rs7924080, rs7077247, rs7077039, rs7895340, rs7100927, rs6585201, rs6585200, rs7071302, rs6585202, rs6585197, rs4074720, rs6585199, rs6585198, rs12359102, rs12265291, rs4074718, rs12718338, rs11196205, rs11196200, rs12258200, rs11196208, rs10885405, rs10885402, rs10885409, rs10885406, rs10787472
Aryl Hydrocarbon Receptor | CDK2 | rs773107
Arylamine metabolism | SULT1A2 | rs11401
ATM Signaling Pathway | CDK2 |
 | MDC1 | rs3132584
Calcium Regulation in the Cardiac Cell | ITPR1 |
Cardiac Progenitor Differentiation | POU5F1 | rs887468, rs887465, rs3130457, rs1265159, rs887464, rs6929434, rs1150765
Cell Cycle | CDK2 |
Complement and Coagulation Cascades | C2 | rs638383, rs558702, rs644045, rs497309, rs3130683, rs519417, rs512559, rs3128761
 | CFB | rs512559, rs2072633, rs1270942
Cytokines and Inflammatory Response | HLA-DRA |
 | HLA-DRB1 |
Cytoplasmic Ribosomal Proteins | RPS26 | rs1131017, rs705704
Diurnally Regulated Genes with Circadian Orthologs | HIST1H2BN | rs201005, rs200995, rs200989, rs200981, rs200991, rs200990, rs200948, rs17763089, rs200953, rs200949, rs13199772, rs13194781, rs17695758, rs13199906
 | HLA-DMA |
DNA Damage Response | CDK2 |
DNA Damage Response (only ATM dependent) | TCF7L2 |
DNA Replication | CDK2 |
ErbB Signaling Pathway | ERBB3 |
 | NRG3 | rs11818231, rs11816685, rs17101073, rs17095600, rs11815363
Eukaryotic Transcription Initiation | GTF2H4 | rs886420, rs1264308, rs1264304, rs1264312, rs1264310
FAS pathway and Stress induction of HSP regulation | DAXX |
Fatty Acid Beta Oxidation | ACSL1 | rs13112568, rs13126272, rs13120078
Fatty Acid Biosynthesis | ACSL1 |
Focal Adhesion | TNXB | rs1150752, rs393544, rs3134954, rs433061, rs3130285, rs3117189, rs3130342, rs3130287, rs3096695, rs204895, rs3117182, rs3117181, rs204885, rs204879, rs204890, rs204889, rs1269852, rs1150753, rs204878, rs1269854
G Protein Signaling Pathways | ITPR1 |
G1 to S cell cycle control | ATF6B | rs393544, rs3134954, rs3117182, rs3117181, rs3130342, rs3130288, rs204894, rs204892, rs3096695, rs204895, rs1269854, rs1269852, rs204890, rs204889, rs1269851, rs1150752
 | CDK2 |
Ganglio Sphingolipid Metabolism | B3GALT4 | rs464865, rs463260, rs469064, rs446735, rs462093, rs455567
Gastric Cancer Network 1 | HIST1H4J | rs200502
Heart Development | ERBB3 |
Histone Modifications | EHMT2 | rs558702
 | HIST1H3J | rs200977
 | HIST1H4J |
ID signaling pathway | CDK2 |
IL-5 Signaling Pathway | SPRED1 |
Insulin Signaling | FLOT1 | rs8233, rs3095329, rs3094127, rs3130660, rs3095330, rs1059612, rs2284174, rs1064627
Integrated Breast Cancer Pathway | CDK2 |
Integrated Cancer pathway | CDK2 |
Integrated Pancreatic Cancer Pathway | CDK2 |
 | DAXX |
Iron uptake and transport | ATP6V1G2 | rs2857607
 | SLC17A3 | rs555460, rs548987, rs726836, rs629444, rs1324088, rs1324087, rs523383, rs501220, rs1165168, rs1184498, rs1177441
MAPK Signaling Pathway | DAXX |
 | HSPA1B |
Mesodermal Commitment Pathway | POU5F1 |
Metapathway biotransformation | SULT1A2 |
miR-targeted genes in lymphocytes - TarBase | ATAT1 | rs9262135, rs9262130
 | EHMT2 |
miR-targeted genes in muscle cell - TarBase | EHMT2 |
 | ERBB3 |
miRNA Regulation of DNA Damage Response | CDK2 |
miRNA targets in ECM and membrane receptors | TNXB |
Mitochondrial LC-Fatty Acid Beta-Oxidation | ACSL1 |
mRNA Processing | DHX16 | rs9262135, rs9262141
Myometrial Relaxation and Contraction Pathways | ATF6B |
 | ITPR1 |
Neural Crest Differentiation | NOTCH4 | rs3132946, rs3132940, rs3134942, rs3132956, rs3096690, rs2071278, rs3131296, rs3131294, rs204989, rs204987, rs204991, rs204990, rs1044506
 | NOTCH4 |
 | NOTCH4 |
NRF2 pathway | AGER |
Oncostatin M Signaling Pathway | CDK2 |
Ovarian Infertility Genes | MSH5 | rs707915, rs3132445, rs707938, rs3131378, rs3130484, rs3131383, rs3131379, rs3117574, rs3115672, rs3117577, rs3117575, rs1150793, rs1144708, rs3115671, rs3101018
p38 MAPK Signaling Pathway | DAXX |
Parkin-Ubiquitin Proteasomal System pathway | HSPA1B |
 | TUBB | rs3095330, rs3095329, rs8233, rs3132584, rs3094127
Pathogenic Escherichia coli infection | TUBB |
Proteasome Degradation | HLA-A |
 | HLA-B |
 | HLA-C |
 | HLA-F | rs929158, rs885942, rs929160, rs1633106, rs1632957, rs1736915, rs1736913, rs1610608, rs1610602, rs1628578, rs1611350, rs1610601, rs1419696
 | HLA-J | rs356969, rs356968
 | PSMB8 | rs4148882, rs241427
 | PSMB9 | rs4148882
RB in Cancer | CDK2 |
Regulation of Actin Cytoskeleton | FGF14 | rs12708382
Regulation of Microtubule Cytoskeleton | SPRED1 |
Serotonin Receptor 2 and ELK-SRF/GATA4 signaling | ITPR1 |
SIDS Susceptibility Pathways | C4A |
 | POU5F1 |
Signaling Pathways in Glioblastoma | CDK2 |
 | ERBB3 |
Spinal Cord Injury | AIF1 | rs2857600, rs2736177, rs3763295, rs2736176, rs2269475
 | CDK2 |
Sulfation Biotransformation Reaction | SULT1A2 |
TCR Signaling Pathway | ITPR1 |
Triacylglyceride Synthesis | AGPAT1 | rs3134947, rs3134943, rs3132965, rs3134946, rs3134945, rs3130347, rs3130284, rs3131297, rs3130348, rs3096689, rs2849013, rs3130283, rs3096697, rs1269839
TSH signaling pathway | CDK2 |
Type II interferon signaling (IFNG) | HIST1H4J |
 | HLA-B |
 | PSMB9 |
 | TAP1 | rs4148882
Wnt Signaling Pathway and Pluripotency | POU5F1 |
 | TCF7L2 |
Wnt Signaling Pathway Netpath | TCF7L2 |
Add-on page: Description of work by student Bachelor BMW
This add-on page to the Bachelor thesis BMW provides details on the role of the student
in the experiments, data collection, and analyses described in the thesis.
Experiments and measurements
Please describe here which wet-lab experiments, clinical measurements, or other
experiments and/or measurements were conducted by the student:
No measurements were performed.

Provenance of data
Please describe here, for all data(sets) described in the results, whether it was generated
by the student, generated by others during the project (e.g. project members, staff,
other students), previously generated or existing in the lab, or taken from public
resources:
The SNPs were extracted from a freely accessible online database. A reference to the
database is given in the thesis.
The WikiPathways file (pathways + their genes) was provided by the thesis supervisor.
The DisGeNET script was obtained by contacting the developers.
Data analysis and interpretation

Please describe here which data analyses and interpretations were conducted by the
student for the data(sets) described in the previous box:
SNPs were linked to their genes using Ensembl BioMart. These genes were then used
together with the WikiPathways file in order to create a network (pathway-gene-SNP).
Genes not linked to a pathway were removed in Cytoscape. The resulting network was
interpreted by the student with the aid of Excel (sorting pathways, sorting genes etc.).
The pathway used in PathVisio was downloaded from WikiPathways.
The DisGeNET and GO-Analysis results were visually interpreted by the student.
Integration with other analytical results
Please describe here which other analytical results were integrated with those of the
analysis performed by the student, if any:
Remarks
Visualization of SNPs in PathVisio was performed together with the supervisor and
co-supervisor of this project, Martina Summer-Kutmon and Elisa Cirillo resp. This technique
is new and was not meant to create data for this thesis.