T.J.E. Hooimeijer
I6049728
Biomedical Sciences
Direction Biological Health Sciences
Bachelor thesis
Supervised by: Martina Summer-Kutmon & Elisa Cirillo
Internship period: 20-04-2015 – 03-07-2015
Maastricht University
Date: 02-07-2015
Contents
Abstract
1. Introduction
1.1 Type 1 Diabetes Mellitus
1.2 Disease causes
1.3 Genome wide association studies
1.4 Research aim
3. Results
3.1 GWAS3D-analysis
3.2 GO-Analysis
3.3 Network analysis
3.4 PathVisio visualization of SNPs
3.4.1. Visualization of SNPs in PathVisio
3.4.2. Pathways linked to T1DM
4. Discussion
4.1 GWAS analysis
4.2 GO-Analysis
4.3 Network analysis
4.4 PathVisio visualization of SNPs
4.5 Comparing results with other studies
4.6 Shortcomings in this study
4.7 Conclusion
5. References
6. Appendix
Figure 1: The differences between T1DM and T2DM.
In both types of diabetes, cells do not absorb glucose. In T1DM patients there is not enough insulin produced
due to the lack of pancreatic beta-cells, which are destroyed by an autoimmune reaction. The insulin is needed to
absorb the glucose. In T2DM patients the cells do not respond to the produced insulin, since the cells have
become insulin resistant. Both diseases lead to a high blood sugar level also called hyperglycemia.
and more loci are found that may be involved in disease development. Several studies found
genes and loci related to T1DM, including genes involved in immune response, like the major
histocompatibility complex (MHC) genes, the protein tyrosine phosphatase-22 (PTPN22)
gene and the inhibitory T-cell signaling molecule producing cytotoxic T-lymphocyte-
associated protein 4 (CTLA4) gene (9). The identification of the genes and loci involved in
T1DM may help to better understand the disease mechanisms, and might be useful for early
diagnostics and development of better treatment options.
2. Materials and methods
Section 2.1 explains the data resources used in the analysis, section 2.2 describes the tools,
and section 2.3 gives a detailed workflow of the analysis.
2.1.2 WikiPathways
WikiPathways (http://www.wikipathways.org) is an open online database for biological
pathways that is used for sharing and publishing pathway knowledge (15). Hyperlinks to
literature references and/or information pages on a gene or protein can be added to the
pathway diagram to include as much information as possible. WikiPathways is not only a
convenient tool to study human biology; it also includes pathway data for 20 other species.
Each pathway has its own wiki page where the pathway diagram, the description, the
references and a list of pathway elements, including gene products, metabolites and proteins,
are available. Pathways can be edited by using the pathway editor. A version history for each
change in the pathway is also kept (15). Pathways can also be downloaded and used in
advanced analysis programs like PathVisio, which is described under Tools (section 2.2).
A list of all the pathways and linked genes from WikiPathways was provided by Bram van
Steen.
2.1.3. DisGeNET-analysis
DisGeNET (http://disgenet.org/web/DisGeNET/menu) is an open-access database that
combines different sources to link genes to diseases (16). DisGeNET was used to check for
gene-disease associations of the genes in our study that were not present in any of the
pathways.
A script was written to convert the gene-disease associations from Entrez Gene IDs to Ensembl
gene IDs. In Cytoscape a network was made in which genes were linked to the disease(s) with
which they are associated. This network was analyzed manually.
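The conversion script itself is not shown in this thesis. As a minimal sketch of what such a conversion could look like, assuming the gene-disease associations are available as (Entrez ID, disease) pairs and that an Entrez-to-Ensembl lookup table has been obtained separately, the function name and all identifiers below are hypothetical placeholders:

```python
def convert_associations(entrez_disease_pairs, entrez_to_ensembl):
    """Re-key (Entrez ID, disease) pairs by Ensembl gene ID.

    One Entrez ID may map to several Ensembl IDs. Entrez IDs without a
    mapping are collected separately so they are not lost silently.
    All identifiers used with this function here are placeholders.
    """
    converted, unmapped = [], []
    for entrez_id, disease in entrez_disease_pairs:
        ensembl_ids = entrez_to_ensembl.get(entrez_id, [])
        if not ensembl_ids:
            unmapped.append(entrez_id)
        for ensembl_id in ensembl_ids:
            converted.append((ensembl_id, disease))
    return converted, unmapped


# Placeholder data, not results from this study.
pairs = [("1001", "T1DM"), ("1002", "T1DM"), ("9999", "Asthma")]
mapping = {"1001": ["ENSG_A"], "1002": ["ENSG_B", "ENSG_C"]}
converted, unmapped = convert_associations(pairs, mapping)
```

Returning the unmapped identifiers alongside the converted pairs makes it possible to check how many associations are dropped by the identifier conversion.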
2.2 Tools
2.2.1 GWAS3D
GWAS3D (http://jjwanglab.org/gwas3d) is a tool that is helpful in presenting and interpreting
the large amount of data available from GWAS. One way in which GWAS3D helps to present
results is with the Circos-style GWAS visualization, see Figure 4. The most significant
variants are displayed in the outer circle. These variants are linked to genes or genomic
locations shown in the inner circle. Red lines in the center of the circle indicate an epistatic
regulatory interaction with another locus. The thickness of the line represents the intensity of
the interaction (11).
The GWAS3D tool combines different genomic features with comprehensive annotations,
giving an overview of the significance of different genomic regions. The tool also provides a
prioritization algorithm that adds SNPs based on linkage disequilibrium.
In this study the analysis was performed for a European American (CEPH) population with a
p-value cutoff of 1 × 10⁻⁷, a linkage disequilibrium cutoff (r²) of 0.8, a binding site p-value
of 1 × 10⁻², and a promoter range (up/down) of 500/100. No cell type restrictions were
applied.
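The p-value cutoff can be illustrated with a short sketch. It assumes the SNPs are held in a dictionary from rs id to p-value (a hypothetical data layout) and uses the two extreme p-values reported in the results section as example input; it does not reproduce the GWAS3D procedure itself:

```python
P_VALUE_CUTOFF = 1e-7  # cutoff stated in the text


def filter_by_p_value(snps, cutoff=P_VALUE_CUTOFF):
    """Keep SNPs whose association p-value does not exceed the cutoff."""
    return {rsid: p for rsid, p in snps.items() if p <= cutoff}


# The most and least significant SNPs reported in section 3.1.
example_snps = {"rs9273363": 1.12e-307, "rs12124983": 8.11e-04}
significant = filter_by_p_value(example_snps)
```

With this input only rs9273363 is retained, since 8.11E-04 lies above the cutoff.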
2.2.2 Ensembl BioMart
Ensembl BioMart (http://www.ensembl.org/biomart/) is a data management system that can
be used for advanced data querying. Ensembl BioMart can find genes, sequences etc. in the
Ensembl database based on multiple attributes such as chromosome position or gene list. It
provides a table with a clear overview of all the results. In the filters panel, multiple filters can
be added to define restrictions, like specific genes or chromosomes. The attributes panel allows
you to obtain specific information about the filtered entries, such as the Ensembl Gene ID or
Ensembl Transcript ID, among many other options. One way to use this tool is to look
for genes linked with a certain SNP or list of SNPs. In this project we used the SNPs (after
GWAS3D analysis) as a filter. We selected the ‘Ensembl Gene ID’, ‘HGNC-symbol’ and
‘SIFT-score’ as attributes. The SIFT-score estimates if a SNP will be deleterious. The score
ranges between 0 and 1, where a low score indicates a possible deleterious SNP.
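How a SIFT-score could be turned into a label is sketched below. The text only states that low scores indicate a possibly deleterious SNP; the threshold of 0.05 is the conventionally used cutoff and is an assumption here, as are the function name and the label strings:

```python
# Conventional SIFT cutoff; an assumption, not a value stated in this thesis.
SIFT_DELETERIOUS_THRESHOLD = 0.05


def classify_sift(score):
    """Label a SIFT-score as 'deleterious', 'tolerated' or 'no information'."""
    if score is None:
        return "no information"
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT-scores range between 0 and 1")
    if score < SIFT_DELETERIOUS_THRESHOLD:
        return "deleterious"
    return "tolerated"
```

Treating a missing score as its own category keeps SNPs without a prediction distinguishable from tolerated SNPs.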
Ensembl BioMart was also used to investigate the association between genes (filters) and
their phenotypes (attributes: phenotype description), to see whether genes
could be linked to type 1 diabetes. For both these investigations the ‘Ensembl Variation 80,
Homo sapiens Short Variants SNPs and Indels GRCh38.2’ database was used.
2.2.3 Cytoscape
Cytoscape (http://www.cytoscape.org/) is a freely available software tool that can be used to
analyse, visualize and integrate networks (17). One of the functions of Cytoscape is to create
networks by importing files and/or tables. Nodes and edges will then have to be set, after
which Cytoscape will create the network. To illustrate the meaning of nodes and edges:
‘T1DM’ could be a node, each gene associated with T1DM is another node, and each
gene-disease association is an edge connecting the two. Networks can also be merged with
other networks, which aids interpretation through more advanced visualization. The merging of
networks is illustrated in Figure 2.
In this project two networks were made initially. The first network consisted of a list
of SNPs linked to the genes. The second network consisted of the pathways and linked genes
from WikiPathways. These two networks were then merged to form a network with
pathways, genes and SNPs. This is further explained in section 2.3 and in the results
section.
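The merging was done in Cytoscape. The idea behind it can be sketched in a few lines, assuming each network is represented as a set of (node, node) edges; this representation and the function names are illustrative only. The example edges use the HLA-DQB1 gene, its SNP rs9273363 and the Allograft Rejection pathway, which are all reported later in this thesis:

```python
def merge_networks(*edge_sets):
    """Return the union of several edge sets.

    Nodes with the same name (here: gene symbols) are shared, so they
    connect the individual networks in the merged result.
    """
    merged = set()
    for edges in edge_sets:
        merged |= set(edges)
    return merged


def nodes(edges):
    """Collect all node names that occur in a set of edges."""
    return {node for edge in edges for node in edge}


gene_snp = {("HLA-DQB1", "rs9273363")}
pathway_gene = {("Allograft Rejection", "HLA-DQB1")}
merged = merge_networks(gene_snp, pathway_gene)
```

In the merged network the gene node links the pathway to the SNP, which is what makes a pathway-gene-SNP path visible.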
2.2.4 PathVisio
PathVisio (http://www.pathvisio.org/) is a program that gives researchers the ability to
create, edit, analyse and visualize biological pathways (18). Pathways have been used in
biology for a long time to describe the biological interactions between molecules like genes,
proteins and metabolites. PathVisio gives the possibility to edit pathways or even to create
new ones that can be directly uploaded to WikiPathways. Within a pathway references and
comments can be added, giving the possibility to show a lot of information in a single
overview.
PathVisio can also be used for the visualization of pathways. The visualization option gives
the users the possibility to show experimental data on the pathway diagrams. The data can be
loaded into PathVisio which then can visualize the results on the data nodes and interactions
in the pathway. Different rules for visualization can be used, for example a color gradient can
be used to visualize an activity measurement, whereas a color rule can be used to show
significance levels. Data nodes will split into multiple columns, so all data is visible at the
same time.
Pathway statistics is a tool to find pathways which are (significantly) altered in a dataset. The
user defines a criterion on which the selection should take place (value for activity
measurement, p-value etc). Based on the selected gene list PathVisio performs an over-
representation analysis calculating a Z-score for each pathway. A positive Z-score indicates
that more genes in that pathway meet the criteria than expected based on the complete dataset.
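A common form of such a Z-score, used in MAPPFinder-style over-representation analysis and assumed here to correspond to what PathVisio computes, compares the observed number of genes meeting the criterion in a pathway with the number expected from the complete dataset. In the sketch below N is the number of genes in the dataset, R the number of genes meeting the criterion, n the number of genes in the pathway and r the number of genes in the pathway meeting the criterion; the function name and the example numbers are hypothetical:

```python
import math


def pathway_z_score(r, n, R, N):
    """Z-score for over-representation of criterion genes in a pathway.

    Uses the expected count n * R / N and the standard deviation of the
    hypergeometric distribution (sampling without replacement).
    """
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    fraction = R / N
    expected = n * fraction
    variance = n * fraction * (1 - fraction) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        raise ValueError("Z-score undefined when the variance is zero")
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 8 of 10 pathway genes meet the criterion,
# while 50 of 100 genes do so in the complete dataset.
z = pathway_z_score(r=8, n=10, R=50, N=100)
```

A pathway whose observed count equals the expected count gets a Z-score of zero, and more hits than expected give a positive value, in line with the interpretation given above.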
PathVisio can also be extended with plugins. In this project the new RI-plugin for PathVisio
was tested. This plugin links the genes in the pathway to related elements, for example the
known variants for the selected gene. Those variants are then visualized in a side-panel. By
clicking on a gene, information about the linked SNPs and their SIFT-scores could be
obtained. These SIFT-scores were obtained from Ensembl BioMart. Visualizations could then
be performed based on SIFT-scores and p-values, making it easier to find genes (with SNPs)
with a potentially deleterious effect.
2.2.5 GO-analysis
GOrilla (http://cbl-gorilla.cs.technion.ac.il/) is a tool that can be used to identify enriched GO-
terms in a list of genes (19). There are two input options. In the first option a ‘Single ranked
list of genes’ has to be provided. In the second option ‘two unranked lists of genes (target and
background lists)’ have to be provided. Four different ontologies can be chosen: Process,
function, component and ‘all’. The tool then provides researchers with a table of the most
enriched processes/functions/components and provides a schematic overview.
In this project we used all the genes in the WikiPathways collection as a background list and
used all the genes with T1DM associated SNPs as a ‘single ranked list of genes’, so we could
investigate which processes are overrepresented.
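The enrichment of a GO-term in a target list against a background list is typically assessed with a hypergeometric test. The sketch below shows that test under the assumption that it matches the target/background setting described above; GOrilla's ranked-list mode uses a different (minimum hypergeometric) statistic that is not reproduced here. N is the number of background genes, B the number of background genes annotated with the term, n the size of the target list and b the number of target genes with the term; the names are hypothetical:

```python
from math import comb


def hypergeom_enrichment_p(N, B, n, b):
    """Probability of seeing at least b annotated genes in a target list.

    The target list of n genes is treated as a draw without replacement
    from N background genes, of which B carry the GO-term.
    """
    if not (0 <= B <= N and 0 <= n <= N and 0 <= b):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / total
```

For example, with 10 background genes of which 5 carry the term, the probability that both genes of a 2-gene target list carry it is 10/45, roughly 0.22.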
2.3 Workflow
In Figure 3 a detailed workflow of the analysis is shown to provide a clear overview of the
steps taken. SNPs associated with T1DM were extracted from the database (step 1). SNPs
were visualized and reprioritized by GWAS3D (step 2) and subsequently linked to their genes
using Ensembl BioMart (step 3). A GO-analysis was performed to search for enriched
biological processes (step 4). Genes together with their T1DM associated SNPs were
visualized in a network using Cytoscape (step 5). This network was merged together with a
network of all the pathways from WikiPathways and the genes involved in these pathways.
The merged (‘Merged pathway-gene-SNP network’) network consisted of pathways linked to
genes, which were in turn linked to the SNPs (step 6). Not all genes could be linked to
a pathway from WikiPathways and not all pathways had genes with T1DM associated SNPs.
These genes and pathways were removed (step 7) to create a new network, called ‘Filtered
pathway-gene SNP network’ (step 8). This network was then analyzed using Excel (step 9).
The genes not present in any pathways were further investigated using DisGeNET (step 10).
Additionally, an effort was made to visualize and analyse the T1DM related SNPs using
PathVisio (step 11).
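The filtering in step 7 can be sketched as follows. This is one possible reading of the step, assuming both networks are sets of edges and that only genes present in both networks are kept; the function name, the ‘Hypothetical pathway’ entry, GENE_X and the placeholder SNP id are illustrative and not taken from the data:

```python
def filter_network(pathway_gene, gene_snp):
    """Keep only genes that have SNPs and occur in at least one pathway.

    Returns the kept pathway-gene edges, the kept gene-SNP edges, and the
    genes with SNPs that were dropped because no pathway contains them.
    """
    genes_with_snps = {gene for gene, _ in gene_snp}
    genes_in_pathways = {gene for _, gene in pathway_gene}
    shared = genes_with_snps & genes_in_pathways
    kept_pathway_gene = {(p, g) for p, g in pathway_gene if g in shared}
    kept_gene_snp = {(g, s) for g, s in gene_snp if g in shared}
    dropped_genes = genes_with_snps - genes_in_pathways
    return kept_pathway_gene, kept_gene_snp, dropped_genes


pathway_gene = {
    ("Allograft Rejection", "HLA-DQB1"),
    ("Hypothetical pathway", "GENE_X"),  # pathway without SNP-carrying genes
}
gene_snp = {
    ("HLA-DQB1", "rs9273363"),
    ("PTPN22", "rs_placeholder"),  # gene that is in no pathway
}
kept_pg, kept_gs, dropped = filter_network(pathway_gene, gene_snp)
```

The dropped genes correspond to the set that is followed up with DisGeNET in step 10.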
Figure 3: The 11 analysis steps performed in this project to integrate, visualize and analyse genetic
variation data together with biological pathway knowledge.
3. Results
3.1 GWAS3D-analysis
A total of 5,352 different SNPs related to T1DM were extracted from the database provided by
Johnson and O'Donnell. The p-values indicating the association of the SNPs with T1DM ranged
from 1.12E-307 for rs9273363 to 8.11E-04 for rs12124983. These SNPs were reanalyzed
using the GWAS3D SNP analysis program. GWAS3D detected 190 variants having
regulatory signals, 190 variants causing transcription factor binding site affinity changes, 159
variants affecting long range interactions, 62 variants having a direct effect by GWAS leading
SNPs and 128 variants having an indirect effect by high linkage disequilibrium (LD) of
GWAS leading SNPs. The Circos-style visualization is shown in Figure 4.
Figure 4: Circos-style visualization (GWAS3D) of significant SNPs.
SNP ids or loci are displayed in the outer circle, linked to the corresponding genes or genomic location in the
inner circle. Red lines indicate interaction between the linked genes or loci, where the thickness of the line
represents the intensity of the interaction.
Ensembl BioMart
After removing SNPs with a p-value > 1E-07 and after GWAS3D reprioritization, the
remaining SNPs were linked to the affected genes using Ensembl BioMart. These included a
total of 2,183 different SNPs. Eventually 1,521 of these 2,183 SNPs were matched to
340 different genes. There were 2,496 different links between the 1,521 SNPs and 340 genes.
This is possible since a SNP can be linked to multiple genes.
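The number of links can exceed the number of SNPs because the SNP-to-gene mapping is many-to-many. A small sketch, assuming the BioMart output is available as a list of (SNP, gene) pairs, shows how the three counts could be derived; the function name and the identifiers are placeholders:

```python
def summarize_links(snp_gene_links):
    """Count distinct SNPs, distinct genes and distinct SNP-gene links."""
    unique_links = set(snp_gene_links)
    snps = {snp for snp, _ in unique_links}
    genes = {gene for _, gene in unique_links}
    return {"snps": len(snps), "genes": len(genes), "links": len(unique_links)}


# Placeholder data: rs_1 maps to two genes, GENE_B carries two SNPs.
links = [("rs_1", "GENE_A"), ("rs_1", "GENE_B"), ("rs_2", "GENE_B")]
summary = summarize_links(links)
```

Here two SNPs and two genes give three links, the same pattern as the 2,496 links between 1,521 SNPs and 340 genes.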
Most significant SNP
The SNP most significantly associated with T1DM is rs9273363 (association
with T1DM, p-value: 1.12E-307). This SNP is linked to the HLA-DQB1 gene.
3.2 GO-Analysis
A GO-analysis was performed to search for overrepresented biological processes. All the
genes with T1DM associated SNPs were checked against all genes involved in the pathway
collection from WikiPathways. The most significantly enriched processes relate to antigen
presentation (from exogenous pathogens). Other enriched processes involve
leukocytes, lymphocytes, T-cells and B-cells, for instance the ‘positive regulation of T-cell
activation’ process.
Some processes not related to immune function were detected too, like nucleosome
organization and assembly. All the processes are shown in Appendix 6.1: Table 1.
Figure 5: Overview of all the pathways, genes and SNPs linked in a single network.
The larger network was made by merging the gene-SNP and pathway-gene networks.
most SNPs were the HLA-DQA2 gene with 43 SNPs, followed by the HLA-DRA gene with 30
SNPs.
The highest percentage of ‘genes (with SNPs) linked to pathway/total genes in pathway’ was
found in the ‘Arylamine metabolism’ pathway, where 1 out of 7 genes (14.3%) has T1DM related
SNPs. The second highest ratio is 7.8%, found in the ‘Allograft Rejection’ pathway. The focal
adhesion pathway is the pathway containing the most genes, a total of 211. An overview of all
the pathways, with the linked genes and SNPs, is given in appendix 6.2, Table 2.
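The reported percentages follow directly from the gene counts. As a small worked check, with a hypothetical helper name, using the counts given for the ‘Arylamine metabolism’ pathway (1 of 7 genes) and the ‘Allograft Rejection’ pathway (16 of 204 genes):

```python
def percent_with_snps(genes_with_snps, total_genes):
    """Percentage of pathway genes carrying T1DM related SNPs, one decimal."""
    if total_genes <= 0:
        raise ValueError("a pathway must contain at least one gene")
    return round(100 * genes_with_snps / total_genes, 1)


arylamine = percent_with_snps(1, 7)      # 14.3
allograft = percent_with_snps(16, 204)   # 7.8
```

Small pathways reach a high percentage with a single gene, which should be kept in mind when ranking pathways by this ratio.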
Figure 6: Overview of all the pathways, genes and SNPs linked in a single network.
Network of pathways with genes with T1DM associated SNPs. The 3 pathways with the most genes with T1DM
associated SNPs and the HLA-DQB1 gene, linked to the SNP with the lowest p-value, are indicated.
3.4 PathVisio visualization of SNPs
3.4.1. Visualization of SNPs in PathVisio
The new RI-plugin for PathVisio was used to visualize SNPs in a pathway. Information about
the number of SNPs (and the SNP ids) becomes available in a side panel when selecting a
gene (Figure 7). The SNPs are coloured based on their p-value (left side of the box). All SNPs
of the selected HLA-C gene have a p-value lower than the cut-off value of 1 × 10⁻⁷. The right
side of the box shows the SIFT-score. These SIFT-scores were obtained from BioMart. Blue
means tolerated, whereas white means that no information is available. No deleterious SNPs
were found for this gene.
Allograft Rejection
The Allograft Rejection pathway has a total of 204 genes of which 16 have T1DM associated
SNPs. The pathway shows interactions involved in adaptive immune response which are
responsible for the degradation and destruction of an allograft.
The HLA-genes involved in the Allograft Rejection pathway include HLA-DRB4, -DRB1, -
DQB1, -DQA1, -DPA1, -DRB3, -DPB1, -DOA, -DMA, -DMB, -DOB, -DRA, -DRB5 and -
DQA2. For all those genes except HLA-DRB4, -DRB3, -DOA and -DRB5 SNPs are found that
are linked to T1DM. Besides the HLA-genes also the C4A and MICA genes have SNPs related
to T1DM (Figure 7).
Figure 8: The Proteasome degradation pathway from WikiPathways. The red circles are placed around the
genes with T1DM related SNPs. Image adapted from WikiPathways.
http://wikipathways.org/index.php/Pathway:WP183
Figure 9: The Type II Interferon Signaling pathway from WikiPathways. The genes with T1DM related
SNPs are marked with a red circle. Image adapted from WikiPathways.
http://wikipathways.org/index.php/Pathway:WP619
4. Discussion
4.1 GWAS analysis
After p-value filtering and GWAS3D reprioritization, a total of 2,183 different T1DM
associated SNPs from Johnson and O'Donnell’s database remained. Of these, 1,521 SNPs were
matched 2,496 times to 340 different genes using Ensembl BioMart. This means that 662 SNPs
could not be linked to any gene, which can cause important data to be lost. These 662 SNPs were
entered again in Ensembl BioMart and it was found that 145 of the 662 SNPs could be linked
to Ensembl gene identifiers but not to HGNC symbols.
The SNP with the lowest p-value (rs9273363) can be linked to the HLA-DQB1 gene.
Depending on the genotype at this SNP, the risk for T1DM can increase or decrease. The A/A
genotype increases the risk, while people with the A/C and C/C genotypes have a 0.87 and
0.15 times lower risk, respectively (20).
The HLA-genes are already classified as important loci in T1DM, as described by Nejentsev et
al. In their association analysis they identified 1,475 SNPs linked to the genes of the MHC-
region (21).
4.2 GO-Analysis
The GO-analysis revealed that most of the genes with SNPs linked to T1DM are related to the
immune system and function. Torkamani et al. performed a pathway analysis of the seven
diseases assessed by the WTCCC (22). Most of the processes found for T1DM are also
related to the immune system, like the ‘IL2 activation and signaling pathway’. They also
found two pathways involved in signal transduction (‘Calcium signaling’ and ‘IP3 signaling’)
and several pathways involving angiotensin, AKT and ERK. Differences in results can be
explained by the use of MetaCore by Torkamani et al., while other approaches were used in
this thesis. Although some differences are found, both experiments found immune-related
processes. Since GO-classes are much more generic than pathways, this might also explain
some of the differences. The nucleosome organization and assembly may show up since
chromatin modification is critical for efficient transcription of the MHC-class II proteins (23).
no deleterious SNPs. One of the problems encountered was the fact that we did not succeed in
linking more than one transcript per gene. A method to take these transcripts into account is
necessary, since SIFT-scores between these transcripts can vary significantly. This
visualization shows that it is possible to visualize SNPs in PathVisio, which facilitates
the interpretation of SNPs in pathways.
infection, causing breakdown of self-proteins in the beta-cells. Pieces of the protein fragments
that remain are shown to CD8+ T-cells by the antigen presenting capacities of the MHC-class
I molecules. This T-cell activation will lead to lysis of the beta-cells. By presentation of
these antigens to B-cells, autoantibodies are produced to attack the beta-cells. An extensive
description of this hypothesis is given in ‘The putative role of proteolytic pathways in the
pathogenesis of Type 1 diabetes mellitus: The ‘autophagy’ hypothesis’ (29).
Balasubramanyam et al. described a different aspect of proteolysis and its involvement
in T1DM (30). The ubiquitin-proteasome system is one of the systems involved in the
degradation of cellular proteins and is involved in numerous biological functions. One of
these biological functions is insulin signaling (30). Insulin promotes cellular growth not
only by increasing synthesis of some proteins, but mainly by the inhibition of overall
proteolysis (31, 32). Experimental evidence shows that this inhibition of overall proteolysis
comes from the inhibition of the ATP-and ubiquitin-dependent degradation by the
proteasomes in vitro as well as in cultured cells (33). Contrasting evidence shows that
prolonged exposure of cells to insulin causes IRS-1 ubiquitin conjugation and consequently
degradation of IRS-1 (34, 35). In diabetes, inflammatory mediators could upregulate SOCS
proteins (suppressor of cytokine signaling proteins), especially SOCS3.
IRS-1 and -2 could then be degraded by proteasomes, causing insulin resistance. These SOCS
proteins and the processes involved could play a role in T1DM, although the exact
mechanisms are still unclear (33). In addition to the ubiquitin/proteasome system the
proteolytic autophagic pathway may also play a role in the autoimmune processes. Autophagy
could be involved in the autoimmune process within the MHC class I & II self-antigen
presentation. Components of the ubiquitin/proteasome system may even be shared with the
autophagy process (29). This information suggests that we can confirm the role of the
‘Proteasome Degradation’ pathway in T1DM.
4.5 Comparing results with other studies
We compared our results with multiple more recent publications to investigate the role of
several genes associated with T1DM. Hakonarson et al. found in 2007 a significant
association with variations in the gene region that contains the KIAA0350 gene (9). This
region was not found in the GWA studies used in this analysis.
The PTPN22 gene is frequently associated with T1DM. This gene encodes the lymphoid
protein tyrosine phosphatase (37, 38). A SNP in the gene could contribute to the development
of T1DM by the negative regulation of T-cell activation, demonstrating the important role of
T-cells in T1DM (39). Unfortunately, PTPN22 is not present in any of the pathways in
the WikiPathways collection, and so it was ‘lost’ in the ‘Filtered pathway-gene-SNP
network’. However, SNPs related to PTPN22 were found in the GWA studies and therefore
should nonetheless be considered as an important factor in T1DM.
Cytotoxic T-lymphocyte-associated protein 4 (CTLA4) is a protein on the surface of activated
T-cells, producing an inhibitory signal for T-cell activation. Changes in the expression
of the CTLA4 gene, caused by possible variations in one of the regulators, may
increase T-cell self-reactivity, indicating its possible link to T1DM (26, 40). No SNP related
to the CTLA4 gene was found in the database, or the SNP could not be matched to the CTLA4
gene using Ensembl BioMart, explaining the absence of the gene in the network.
In the experiment conducted by Carbonetto, as described above in section ‘Allograft
Rejection’ (discussion section), the pathway with the highest Bayes factor was the IL-2
signaling pathway. This pathway is missing in the final network, since none of these genes
had T1DM associated SNPs linked to them. The IL-2 cytokine is involved in the activation,
development and maintenance of T-regulatory cells. Defects in these pathways are involved
in autoimmune disorders (25).
Evangelou et al. performed a pathway analysis in which they started with SNPs and looked for
the pathways involved in T1DM (41). A major difference between their approach and this thesis
is that Evangelou et al. removed the MHC-loci from the analysis, so that these loci could not
bias the results towards pathways in which multiple MHC-loci are involved. The most important
pathways they found were also involved in the immune response. Since they used a different
approach and different pathway databases (Reactome and BioCarta), their pathways differed
from the pathways found in this thesis.
4.6 Shortcomings in this study
It must be taken into account that the database from Johnson and O'Donnell
was established in 2009 and that the articles from Hakonarson and the WTCCC were both
published in 2007. Therefore, SNPs found more recently by other researchers may not be
included. Besides this, only 50% of the protein coding genes are in the pathways, since the
detailed mechanisms and functions are often not known. The inability of Ensembl BioMart to
link some SNPs to genes may also explain incomplete networks.
4.7 Conclusion
We performed a complex interactive network approach to investigate the following research
question: “Can we find the disease related processes by integrating and visualizing the
genetic variants with existing pathway knowledge?”
We found several pathways and biological processes to be involved in T1DM. The MHC-
region was already known to be important in T1DM, and our pathway- and network based
approaches confirmed this finding, showing several relevant pathways containing multiple
HLA-genes. This result shows the importance of immune related pathways in T1DM.
The method applied is able to confirm existing knowledge and gives possibilities to search
for new loci and pathways involved in the disease process. One of the difficulties encountered
was that the ‘Merged pathway-gene-SNP network’ was too big to interpret visually. In the
‘Filtered pathway-gene-SNP network’ this problem was resolved. In this smaller network the
pathways, genes, SNPs and their connections could be distinguished based on the
visualization used. Future research could focus on improvements in linking SNPs to genes and
genes to pathways, thereby possibly expanding the knowledge on the disease process. This
could eventually lead to improvements in the diagnosis or treatment of T1DM patients.
5. References
1. Van Belle TL, Coppieters KT, Von Herrath MG. Type 1 diabetes: etiology,
immunology, and therapeutic strategies. Physiol Rev. 2011;91(1):79-118.
2. Centraal Bureau voor de Statistiek (CBS). Steeds meer mensen met diabetes. 2014. Available from:
http://www.cbs.nl/nl-NL/menu/themas/gezondheid-
welzijn/publicaties/artikelen/archief/2014/2014-4173-wm.htm.
14. Hakonarson H, et al. A genome-wide association study identifies KIAA0350 as a type 1
diabetes gene. Nature. 2007;448:591-4.
15. Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo C, et al.
WikiPathways: building research communities on biological pathways. Nucleic Acids Res.
2011:1301-7.
16. Piñero J, et al. DisGeNET: a discovery platform for the dynamical exploration of
human diseases and their genes. Database (Oxford). 2015.
17. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A
Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome
Res. 2003;13(11):2498–504.
18. Kutmon M, Van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3:
An Extendable Pathway Analysis Toolbox. PLoS Comput Biol. 2015;11(2).
19. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: A Tool For Discovery
And Visualization of Enriched GO Terms in Ranked Gene Lists. BMC Bioinformatics.
2009;10(48).
20. Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation,
interpretation and analysis. Nucleic Acids Res. 2012:1308-12.
21. Nejentsev S, et al. Localization of type 1 diabetes susceptibility to the MHC class I
genes HLA-B and HLA-A. Nature. 2007;450(7171):887-92.
22. Torkamani A, Topol EJ, Schork NJ. Pathway analysis of seven common diseases
assessed by genome-wide association. Genomics. 2008;92:265-72.
23. Choi NM, et al. Regulation of major histocompatibility complex class II genes. Curr
Opin Immunol. 2011;23(1):81-7.
24. Baglivo L, et al. Genetic and epigenetic mutations affect the DNA binding capability
of human ZFP57 in transient neonatal diabetes type 1. FEBS Lett. 2013;587(10):1474-81.
25. Carbonetto P, Stephens M. Integrated Enrichment Analysis of Variants and Pathways
in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in
Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease. PLoS Genet. 2013 Oct 3.
26. Ueda H, et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to
autoimmune disease. Nature. 2003;423:506-11.
27. Noble JA. Genetics of Type 1 Diabetes. Cold Spring Harb Perspect Med. 2012;2.
28. Noble JA, et al. The role of HLA class II genes in insulin-dependent diabetes mellitus:
molecular analysis of 180 Caucasian, multiplex families. Am J Hum Genet. 1996;59:1134-48.
29. Fierabracci A. The putative role of proteolytic pathways in the pathogenesis of Type 1
diabetes mellitus: The ‘autophagy’ hypothesis. Medical Hypotheses. 2014;82(5):553-7.
30. Balasubramanyam M, Sampathkumar R, Mohan V. Is insulin signaling molecules
misguided in diabetes for ubiquitin–proteasome mediated degradation? Molecular and
Cellular Biochemistry. 2005;275:117-25.
31. Fryburg DA, Jahn LA, Hill SA, Oliveras DM, Barrett EJ. Insulin and insulin-like
growth factor-I enhance human skeletal muscle protein anabolism during
hyperaminoacidemia by different mechanisms. Journal of Clinical Investigation.
1995;96:1722-9.
32. Russell-Jones DL, Umpleby M. Protein anabolic action of insulin,
growth hormone and insulin-like growth factor I. Eur J Endocrinol. 1996;135:631-42.
33. Bennett RG, Hamel FG, Duckworth WC. Insulin inhibits the ubiquitin dependent
degrading activity of the 26S proteasome. Endocrinology. 2000;141:2508-17.
34. Sun XJ, Goldberg JL, Qiao LY, Mitchell JJ. Insulin-induced insulin receptor substrate-
1 degradation is mediated by the proteasome degradation pathway. Diabetes. 1999;48:1359-
64.
35. Zhande R, Mitchell JJ, Wu J, Sun XJ. Molecular mechanism of insulin-induced
degradation of insulin receptor substrate 1. Mol Cell Biol. 2002;22:1016-26.
36. Sia C, Weinem M. Genetic susceptibility to type 1 diabetes in the intracellular
pathway of antigen processing - a subject review and cross-study comparison. Rev Diabet
Stud. 2005;2(1):40-52.
37. Hirschhorn JN. Genetic epidemiology of type 1 diabetes. Pediatr Diabetes.
2003;4:87-100.
38. Maier LM, Wicker LS. Genetic susceptibility to type 1 diabetes. Curr Opin Immunol.
2005;17:601-8.
39. Mehers KL, Gillespie KM. The genetic basis for type 1 diabetes. Br Med Bull.
2008;88(1):115-29.
40. Kristiansen OP, Larsen ZM, Pociot F. CTLA-4 in autoimmune diseases—a general
susceptibility gene to autoimmunity? Genes Immun. 2000;1:170-84.
41. Evangelou M, et al. A Method for Gene-Based Pathway Analysis Using Genomewide Association
Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations. Genet
Epidemiol. 2014;38(8):661-70.
6. Appendix
6.1 Results of GO-analysis
Table 1: The enriched processes found by GO-Analysis. A description of each process is given together with its
P-value, FDR q-value and enrichment value.
6.2 Overview of pathways – genes – SNPs in ‘Filtered pathway-gene-SNP network’
Table 2: Overview of all pathways containing genes that are linked to T1DM-associated SNPs.
The left column shows the pathway, the middle column the HGNC symbols of the genes, and the right column
the SNPs linked to each gene. Where no SNP is indicated, the gene is also mentioned in another pathway and
its SNPs are indicated there.
| Pathway | Gene | SNPs |
|---|---|---|
| | HLA-DPB1 | rs9277380, rs3117224, rs9277477, rs2179920 |
| | HLA-DQA1 | rs9272346, rs9271775, rs9272775, rs17843593, rs6927022, rs3129768 |
| | HLA-DQA2 | rs9276395, rs9276394, rs9276401, rs9276364, rs9276357, rs9276375, rs9276370, rs9276319, rs9276313, rs9276351, rs9276348, rs7773694, rs7773441, rs9276311, rs7773955, rs7773068, rs7755597, rs7773407, rs7773149, rs6457644, rs6457643, rs7755596, rs7744593, rs2051600, rs13214143, rs5021448, rs2894287, rs13199553, rs12183007, rs13214069, rs13199787, rs10947336 |
| | HLA-DQB1 | rs9273349, rs3134996, rs9273363, rs1063355 |
| | HLA-DRA | rs983561, rs9268657, rs9268645, rs9268659, rs9268658, rs9268615, rs8084, rs9268641, rs9268634, rs3135394, rs3135393, rs5000563, rs4612206, rs3129887, rs3129881, rs3135392, rs3135391, rs3129876, rs3129875, rs3129878, rs3129877, rs2395181, rs2395180, rs3129872, rs3129868, rs2395176, rs2395174, rs2395179, rs2395177, rs14004 |
| | HLA-DRB1 | rs35464393, rs35366052, rs701831, rs28490179 |
| | HLA-F | |
| | MICA | rs3132472, rs3094584, rs9501387, rs2596542, rs2596541, rs2857282, rs2844521, rs2523459, rs2523453, rs2596540, rs2523467, rs2428474, rs2256328, rs2523452, rs2428475, rs2251731, rs1063635, rs2256175, rs2256028, rs1051790 |
| Alzheimers Disease | ITPR1 | rs3805006 |
| Amyotrophic lateral sclerosis (ALS) | DAXX | rs3130099, rs3130018, rs3130100, rs2073525, rs2073524, rs3106189, rs2395379, rs1061783 |
| Androgen receptor signaling pathway | DAXX | |
| Apoptosis Modulation and Signaling | DAXX | |
| | HSPA1B | rs2763979, rs2471980, rs3115674 |
| Apoptosis Modulation by HSP70 | HSPA1B | |
| Apoptosis-related network due to altered Notch3 in ovarian cancer | ERBB3 | rs7971751, rs4759229, rs877636 |
| | IER3 | rs3130660, rs2284174, rs1059612 |
| Arrhythmogenic Right Ventricular Cardiomyopathy | CACNG1 | rs3785579, rs1799938, rs7210865, rs16960487, rs16960501, rs16960497 |
| | TCF7L2 | rs7904519, rs7900150, rs7924080, rs7077247, rs7077039, rs7895340, rs7100927, rs6585201, rs6585200, rs7071302, rs6585202, rs6585197, rs4074720, rs6585199, rs6585198, rs12359102, rs12265291, rs4074718, rs12718338, rs11196205, rs11196200, rs12258200, rs11196208, rs10885405, rs10885402, rs10885409, rs10885406, rs10787472 |
| Aryl Hydrocarbon Receptor | CDK2 | rs773107 |
| Arylamine metabolism | SULT1A2 | rs11401 |
| ATM Signaling Pathway | CDK2 | |
| | MDC1 | rs3132584 |
| Calcium Regulation in the Cardiac Cell | ITPR1 | |
| Cardiac Progenitor Differentiation | POU5F1 | rs887468, rs887465, rs3130457, rs1265159, rs887464, rs6929434, rs1150765 |
| Cell Cycle | CDK2 | |
| Complement and Coagulation Cascades | C2 | rs638383, rs558702, rs644045, rs497309, rs3130683, rs519417, rs512559, rs3128761 |
| | CFB | rs512559, rs2072633, rs1270942 |
| Cytokines and Inflammatory Response | HLA-DRA | |
| | HLA-DRB1 | |
| Cytoplasmic Ribosomal Proteins | RPS26 | rs1131017, rs705704 |
| Diurnally Regulated Genes with Circadian Orthologs | HIST1H2BN | rs201005, rs200995, rs200989, rs200981, rs200991, rs200990, rs200948, rs17763089, rs200953, rs200949, rs13199772, rs13194781, rs17695758, rs13199906 |
| | HLA-DMA | |
| DNA Damage Response | CDK2 | |
| DNA Damage Response (only ATM dependent) | TCF7L2 | |
| DNA Replication | CDK2 | |
| ErbB Signaling Pathway | ERBB3 | |
| | NRG3 | rs11818231, rs11816685, rs17101073, rs17095600, rs11815363 |
| Eukaryotic Transcription Initiation | GTF2H4 | rs886420, rs1264308, rs1264304, rs1264312, rs1264310 |
| FAS pathway and Stress induction of HSP regulation | DAXX | |
| Fatty Acid Beta Oxidation | ACSL1 | rs13112568, rs13126272, rs13120078 |
| Fatty Acid Biosynthesis | ACSL1 | |
| Focal Adhesion | TNXB | rs1150752, rs393544, rs3134954, rs433061, rs3130285, rs3117189, rs3130342, rs3130287, rs3096695, rs204895, rs3117182, rs3117181, rs204885, rs204879, rs204890, rs204889, rs1269852, rs1150753, rs204878, rs1269854 |
| G Protein Signaling Pathways | ITPR1 | |
| G1 to S cell cycle control | ATF6B | rs393544, rs3134954, rs3117182, rs3117181, rs3130342, rs3130288, rs204894, rs204892, rs3096695, rs204895, rs1269854, rs1269852, rs204890, rs204889, rs1269851, rs1150752 |
| | CDK2 | |
| Ganglio Sphingolipid Metabolism | B3GALT4 | rs464865, rs463260, rs469064, rs446735, rs462093, rs455567 |
| Gastric Cancer Network 1 | HIST1H4J | rs200502 |
| Heart Development | ERBB3 | |
| Histone Modifications | EHMT2 | rs558702 |
| | HIST1H3J | rs200977 |
| | HIST1H4J | |
| ID signaling pathway | CDK2 | |
| IL-5 Signaling Pathway | SPRED1 | |
| Insulin Signaling | FLOT1 | rs8233, rs3095329, rs3094127, rs3130660, rs3095330, rs1059612, rs2284174, rs1064627 |
| Integrated Breast Cancer Pathway | CDK2 | |
| Integrated Cancer pathway | CDK2 | |
| Integrated Pancreatic Cancer Pathway | CDK2 | |
| | DAXX | |
| Iron uptake and transport | ATP6V1G2 | rs2857607 |
| | SLC17A3 | rs555460, rs548987, rs726836, rs629444, rs1324088, rs1324087, rs523383, rs501220, rs1165168, rs1184498, rs1177441 |
| MAPK Signaling Pathway | DAXX | |
| | HSPA1B | |
| Mesodermal Commitment Pathway | POU5F1 | |
| Metapathway biotransformation | SULT1A2 | |
| miR-targeted genes in lymphocytes - TarBase | ATAT1 | rs9262135, rs9262130 |
| | EHMT2 | |
| miR-targeted genes in muscle cell - TarBase | EHMT2 | |
| | ERBB3 | |
| miRNA Regulation of DNA Damage Response | CDK2 | |
| miRNA targets in ECM and membrane receptors | TNXB | |
| Mitochondrial LC-Fatty Acid Beta-Oxidation | ACSL1 | |
| mRNA Processing | DHX16 | rs9262135, rs9262141 |
| Myometrial Relaxation and Contraction Pathways | ATF6B | |
| | ITPR1 | |
| Neural Crest Differentiation | NOTCH4 | rs3132946, rs3132940, rs3134942, rs3132956, rs3096690, rs2071278, rs3131296, rs3131294, rs204989, rs204987, rs204991, rs204990, rs1044506 |
| | NOTCH4 | |
| | NOTCH4 | |
| NRF2 pathway | AGER | |
| Oncostatin M Signaling Pathway | CDK2 | |
| Ovarian Infertility Genes | MSH5 | rs707915, rs3132445, rs707938, rs3131378, rs3130484, rs3131383, rs3131379, rs3117574, rs3115672, rs3117577, rs3117575, rs1150793, rs1144708, rs3115671, rs3101018 |
| p38 MAPK Signaling Pathway | DAXX | |
| Parkin-Ubiquitin Proteasomal System pathway | HSPA1B | |
| | TUBB | rs3095330, rs3095329, rs8233, rs3132584, rs3094127 |
| Pathogenic Escherichia coli infection | TUBB | |
| Proteasome Degradation | HLA-A | |
| | HLA-B | |
| | HLA-C | |
| | HLA-F | rs929158, rs885942, rs929160, rs1633106, rs1632957, rs1736915, rs1736913, rs1610608, rs1610602, rs1628578, rs1611350, rs1610601, rs1419696 |
| | HLA-J | rs356969, rs356968 |
| | PSMB8 | rs4148882, rs241427 |
| | PSMB9 | rs4148882 |
| RB in Cancer | CDK2 | |
| Regulation of Actin Cytoskeleton | FGF14 | rs12708382 |
| Regulation of Microtubule Cytoskeleton | SPRED1 | |
| Serotonin Receptor 2 and ELK-SRF/GATA4 signaling | ITPR1 | |
| SIDS Susceptibility Pathways | C4A | |
| | POU5F1 | |
| Signaling Pathways in Glioblastoma | CDK2 | |
| | ERBB3 | |
| Spinal Cord Injury | AIF1 | rs2857600, rs2736177, rs3763295, rs2736176, rs2269475 |
| | CDK2 | |
| Sulfation Biotransformation Reaction | SULT1A2 | |
| TCR Signaling Pathway | ITPR1 | |
| Triacylglyceride Synthesis | AGPAT1 | rs3134947, rs3134943, rs3132965, rs3134946, rs3134945, rs3130347, rs3130284, rs3131297, rs3130348, rs3096689, rs2849013, rs3130283, rs3096697, rs1269839 |
| TSH signaling pathway | CDK2 | |
| Type II interferon signaling (IFNG) | HIST1H4J | |
| | HLA-B | |
| | PSMB9 | |
| | TAP1 | rs4148882 |
| Wnt Signaling Pathway and Pluripotency | POU5F1 | |
| | TCF7L2 | |
| Wnt Signaling Pathway Netpath | TCF7L2 | |
Add-on page: Description of work by student Bachelor BMW
This add-on page to the Bachelor thesis BMW provides details on the role of the student
in the experiments, data collection, and analyses described in the thesis.
Provenance of data
Please describe here, for all data(sets) described in the results, whether it was generated
by the student, generated by others during the project (e.g. project members, staff,
other students), previously generated or existing in the lab, or taken from public
resources:
The SNPs were extracted from a freely accessible online database. A reference to the
database is given in the thesis.
The WikiPathways file (pathways + their genes) was provided by the thesis supervisor.
The DisGeNET script was obtained by contacting the developers.
SNPs were linked to their genes using Ensembl BioMart. These genes were then used
together with the WikiPathways file in order to create a network (pathway-gene-SNP).
Genes not linked to a pathway were removed in Cytoscape. The resulting network was
interpreted by the student with the aid of Excel (sorting pathways, sorting genes, etc.).
The pathway used in PathVisio was downloaded from WikiPathways.
The DisGeNET and GO-Analysis results were visually interpreted by the student.
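The linking and filtering steps described above can be illustrated with a minimal Python sketch. It is not the workflow used in the project (which relied on Ensembl BioMart, Cytoscape and Excel); it only mimics the logic on a few entries taken from Table 2. The function name, the dictionary layout, and the entries `GENE_X` and `rs0000000` are hypothetical and serve only to show a gene that is dropped because it is not part of any pathway.

```python
from collections import defaultdict


def build_pathway_gene_snp_edges(snp_to_genes, pathway_to_genes):
    """Return sorted (pathway, gene, SNP) triples.

    Genes that do not occur in any pathway produce no triple, which
    mirrors the removal of unlinked genes from the network.
    """
    # Invert the SNP -> genes mapping so SNPs can be looked up per gene.
    gene_to_snps = defaultdict(set)
    for snp, genes in snp_to_genes.items():
        for gene in genes:
            gene_to_snps[gene].add(snp)

    edges = []
    for pathway, genes in pathway_to_genes.items():
        for gene in genes:
            for snp in gene_to_snps.get(gene, ()):
                edges.append((pathway, gene, snp))
    return sorted(edges)


# SNP -> gene links (first two from Table 2, last one hypothetical).
snp_to_genes = {
    "rs773107": ["CDK2"],
    "rs11401": ["SULT1A2"],
    "rs0000000": ["GENE_X"],  # hypothetical gene without a pathway
}

# Pathway -> gene membership (subset of Table 2).
pathway_to_genes = {
    "Cell Cycle": ["CDK2"],
    "DNA Replication": ["CDK2"],
    "Arylamine metabolism": ["SULT1A2"],
}

edges = build_pathway_gene_snp_edges(snp_to_genes, pathway_to_genes)
for pathway, gene, snp in edges:
    print(f"{pathway}\t{gene}\t{snp}")
```

In this sketch a SNP is repeated for every pathway its gene belongs to, whereas Table 2 lists the SNPs of a gene only once and leaves the cell empty in the other pathways.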
Please describe here which other analytical results were integrated with those of the
analysis performed by the student, if any:
Remarks
Visualization of SNPs in PathVisio was performed together with the supervisor and
co-supervisor of this project, Martina Summer-Kutmon and Elisa Cirillo, respectively. This technique
is new and was not meant to create data for this thesis.