Master Project (LongYu)

University of Sussex
Dissertation
Building, Visualising, and Analysing
Phenotypic Canine Disease
Networks: A Gaussian Graphical
Model View
Author:
Long Yu 121880
August 27, 2014
1 Abstract
The use of networks to analyse diseases has proved to be a powerful tool. Here
we build phenotypic canine disease networks based on a manually verified database
provided by the Royal Veterinary College. The main relation between diseases that
we study is comorbidity. As networks are an expressive and intuitive way to represent
relationships between objects, we build a network of comorbidities using a technique
named the Gaussian graphical model (GGM). A GGM captures the correlation between
diseases and, with a proper penalty parameter, the network retains the most
correlated disease pairs and discards the less correlated ones. To validate the GGM
network, we introduce two other kinds of network based on measures of disease
correlation: Relative Risk and φ-correlation, which is Pearson's correlation for
binary variables. We also carry out an expert validation with Dr Dan O'Neill from
the Royal Veterinary College and find that the GGM network has the best performance
among the three in terms of precision. Moreover, we find that the Middle Level code
"Skin (cutaneous) disorder finding" plays a very important role in dog diseases, as
it has the highest prevalence and is the most likely to be involved in comorbidity.
Keywords: Gaussian Graphical Model, Network Science, Visualization
2 Acknowledgements
I would like to thank my supervisor, Dr Novi Quadrianto¹. He gave me clear
guidelines, advice on network modelling and information on visualization throughout
my time as his supervised student. Every time we talked about the project, he
brought new ideas and plenty of related material. Thanks to his efforts, I had the
chance to cooperate with the Royal Veterinary College (RVC) and acquire the dog
disease dataset used to build the networks.
I would also like to thank Dr Dan O'Neill² from RVC, who provided me with
well-structured canine disease data, the relevant hierarchy of diseases and expert
validation results on our networks. At every meeting, he offered valuable
suggestions from his perspective.
Lastly, I want to thank Noel Kennedy from RVC for his much larger canine
disease dataset and the new disease structure called the Data Dictionary.
¹ http://www.sussex.ac.uk/profiles/335583
² http://www.rvc.ac.uk/staff/doneill.cfm
Contents
1 Abstract
2 Acknowledgements
3 Introduction
4 Building Canines Disease Networks
4.1 Source Data
4.2 Gaussian Graphical Model
4.3 The Relative Risk and φ-correlation measurement
4.4 Thresholds
5 Visualising Canines Disease Networks
5.1 Gephi
5.2 Generate data for GGM network and visualization
5.3 Generating data for RR and φ-correlation network and their visualization
6 Analysing networks
6.1 Networks validation
6.2 GGM network analysis
6.3 Expert validation of GGM network
6.4 Analysing illness progression on different gender
7 GGM on Large Canine Disease data
7.1 Data Structure
7.2 Data Processing
8 Future work
References
Appendices
3 Introduction
In medicine, many diseases or disorders have no clear boundaries, because one
disease may have multiple causes and can be associated with other diseases. A
disease often has multiple concurrent diseases, called comorbidities. Comorbidity
is the presence of one or more additional disorders co-occurring with a primary
disorder. Normally, we consider two diseases to be comorbid if they affect the
same individual more often than expected by chance.
A network offers a graph-theoretic framework for representing and exploring the
associations between disorders. During the past decade, a number of studies have
demonstrated the ability of networks to model and analyse diseases. Goh et al. (2007)
built a human disease network showing that human genetic disorders and the
corresponding disease genes may be related to each other. Lee et al. (2008)
constructed a human disease network in which two diseases are linked if mutated
enzymes associated with them catalyze adjacent metabolic reactions. Network studies
mapping protein-protein interactions, or interactome mapping, were carried out by
Rual et al. (2005); such maps have revealed dynamic features of interactome networks
that relate to known biological properties. From a proteomic perspective,
comorbidities arise because disease-associated proteins act on the same pathway.
Hidalgo et al. (2009) built a disorder network of human phenotypes based on over
30 million medical records.
Studying comorbidity can help to answer biological and medical questions. For
instance, Schneeweiss et al. (2003) define and improve the performance of existing
comorbidity scores in predicting mortality in Medicare enrollees. In this paper, we
build a Gaussian graphical model network to represent comorbidity. The Gaussian
graphical model (GGM), also known as a Gaussian concentration graph or covariance
selection model, has recently become a popular tool and is an effective way to
measure the correlation between diseases. It computes all pairwise correlations and
subsequently draws a corresponding graph based on a specific threshold, which is the
main free parameter of the GGM. We build the phenotypic canine disease network on a
GGM so that we have a straightforward way to interpret and analyse dog
comorbidities. Another way to build a comorbidity network is to measure disease
associations directly, which can be done with the Relative Risk and φ-correlation
measurements. Both can be used to measure the correlation of diseases, and Hidalgo
et al. (2009) built a phenotypic disease network from these two measurements.
To validate the performance of the networks, we introduce an expert validation with
Dr Dan O'Neill from RVC, a companion animal epidemiologist who worked in general
practice for 20 years and ran his own companion animal practice for 12 years.
We validate the 10 disorders with the highest prevalence, which shows that the GGM
network performs best compared with the Relative Risk and φ-correlation networks.
After that, we focus on how illness spreads in different genders. Using the Odds
Ratio (OR) with a specific threshold, we calculate for each disease pair how much
more likely it is to occur in one gender than in the other, and hence build the OR
network.
4 Building Canines Disease Networks
4.1 Source Data
All the source data was obtained from RVC. The main file is the canine disease
dataset, whose columns include name, gender, clinic id, date of death and 429
different diseases, among other attributes. Each disease refers to a VeNom code³,
which is used in referral veterinary hospital electronic patient records and first
opinion veterinary practice management systems. The rows are 3884 dog records that
were randomly selected from the overall disease dataset. The data were manually
annotated and collected from authorised pet clinics all over England from 1st
September 2009 to the middle of 2013. During this period, any patient could join or
leave at any time. The average period of treatment was 365.3 days, the maximum
1275.1 days and the minimum only 1 day. Also, 378 dogs died while 3506 did not.
2051 of the dogs were male, 1817 were female and for 16 the gender was not
recorded. The average weight of the dogs was 19.8 kg. In addition, 939 dogs did not
have any disorder, which means that nearly a quarter of the data was written off
when building the comorbidity network. As the dataset is not large (3884 records)
and a number of diseases have quite a low prevalence, the conclusions we draw cannot
be completely convincing from a probabilistic perspective.
The second data file is the mapping from each disease to its body location.
There are 8 different body parts: Abdomen, Anus/Perineum, Head and neck, Limb,
Pelvis, Tail, Thorax and Vertebral. For visualization purposes, we use this mapping
to draw a dog-shaped layout for the GGM network and place each disease at its own
body location. Not every disease can be related to a particular body location; the
few diseases that do not belong to any body location are placed outside the dog's
body in the network. This makes it intuitive to see where a disorder occurs. The
third data file is the mapping of each disease to its Middle Level code, which can
be used to categorize diseases. For example, "Abdominal finding" is the Middle Level
code of "Ascites". In total, there are 70 different Middle Level terms. Each term
covers several disorders, and we use the same color for disorders with the same
Middle Level code in all networks.
³ http://www.venomcoding.org/VeNom/Welcome.html
4.2 Gaussian Graphical Model
A graph is a representation of a set of nodes (vertices) in which some pairs of
nodes are connected by links (edges). Here a node represents a canine disorder, and
two nodes form an edge that represents the comorbidity between them. Edges are
typically either directed or undirected; as the main goal is to measure
comorbidities between dog diseases, we choose undirected edges. Normally, a graph is
expressed as G = (V, E), where V is the set of vertices and E the set of edges. To
build the network, the main questions are which nodes should be selected and which
pairs of nodes should be connected by links. To answer them, we apply the Gaussian
graphical model (GGM) technique. It helps us to find the graph structure: a sparse
graph over the disease nodes that represents the conditional independence
properties present in the data.
The training data is a matrix of 3884 instances (rows) by 429 diseases (columns).
From the disease/column perspective, each disease is a vector of length 3884 whose
elements take the value 1 or 0 (affected or not affected). The GGM computes all
pairwise correlations between disease vectors and subsequently draws the
corresponding graph. As the name indicates, the GGM assumes that each variable
follows a Gaussian (Normal) distribution and that all the variables jointly follow
a multivariate Gaussian distribution. Specifically, the 429 random variables
$X = (X_1, X_2, \ldots, X_p)$ follow a multivariate Gaussian distribution
$N_p(\mu, \Sigma)$, where $p$ is 429 and both the mean $\mu$ and the covariance
matrix $\Sigma$ are unknown. The probability density function of the multivariate
Gaussian distribution is:
$$f(x_1, \ldots, x_p) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
We hope to estimate the inverse covariance matrix $C = \Sigma^{-1}$ because it is
the key matrix that decides the structure of the graph. In the inverse covariance
matrix (ICM) $C = (C_{ij})$, a zero element $C_{ij} = 0$ indicates conditional
independence between the two random variables $x_i$ and $x_j$ given all the other
variables (diseases). In other words, the link between diseases $x_i$ and $x_j$ is
absent if and only if $x_i$ and $x_j$ are conditionally independent. The task is
therefore equivalent to estimating the parameters and identifying the zeros in the
ICM. This kind of problem is also called the covariance selection problem
(Dempster, 1972).
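To make the link between zeros in the ICM and conditional independence concrete, here
is a minimal Python sketch for three variables. It relies on the standard identity
that the partial correlation of $x_i$ and $x_j$ given all other variables equals
$-C_{ij}/\sqrt{C_{ii} C_{jj}}$, so $C_{ij} = 0$ exactly when that partial correlation
is zero. The function name and the correlation values are made up for illustration
and are not taken from the canine data.

```python
import math


def partial_correlation(r_ij, r_ik, r_jk):
    """Correlation of i and j after conditioning on a single third variable k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Hypothetical chain A - B - C: A and C are related only through B.
r_ab, r_bc = 0.6, 0.5
r_ac = r_ab * r_bc  # the marginal correlation of A and C is clearly non-zero

a_c_given_b = partial_correlation(r_ac, r_ab, r_bc)  # vanishes: no A-C edge
a_b_given_c = partial_correlation(r_ab, r_ac, r_bc)  # stays large: A-B edge kept

print("A-C marginal:", r_ac, "A-C given B:", a_c_given_b)
print("A-B given C:", round(a_b_given_c, 3))
```

In this toy case a plain correlation network would link A and C, whereas a GGM would
not, because the A-C entry of the ICM is zero.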
The standard method to address this problem is greedy stepwise forward selection or
backward deletion. Its weakness is the large computational complexity of the
stepwise procedure: at each step it needs to evaluate a mass of candidate
models [1]. Meinshausen and Bühlmann (2004) proposed an approach with lower
computational complexity that performs covariance selection by neighbourhood
selection for each vertex in the graph. In all the methods introduced above, model
selection and parameter estimation are done separately. In this paper, we choose
a penalized likelihood method that does model selection and parameter estimation
together in the Gaussian graphical model (Yuan and Lin, 2007). An important goal is
to make the network sparser, which means we want more zero elements to appear in
the proper positions of the ICM. To do this, the GGM applies an L1 penalty term,
inspired by the Lasso penalty in linear regression. Weak correlations between
diseases are then ignored as a result of the penalty term.
To build the network, all the disease vectors are centered, which means that the
sample mean of the data is zero. The samples $X_1, X_2, \ldots, X_n$ are independent
and identically distributed. As they follow a multivariate Gaussian distribution,
the log-likelihood of $\mu$ and $C = \Sigma^{-1}$ is, up to a constant,
$$\frac{n}{2} \ln\det C - \frac{1}{2} \sum_{i=1}^{n} (X_i - \mu)^T C (X_i - \mu) \qquad (1)$$
The MLE of $(\mu, \Sigma)$ is $(\bar{X}, \bar{A})$, where
$$\bar{A} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T \qquad (2)$$
Thus, the inverse covariance matrix $C$ can be estimated by $\bar{A}^{-1}$. In
general, we choose the sample covariance matrix $S = n\bar{A}/(n-1)$, and $C$ can be
estimated by $S^{-1}$. However, the number of parameters to estimate is quite large:
the parameters are the upper (or lower) triangular elements of the ICM, $p(p+1)/2$
in total. With so many parameters, $S$ is not a stable estimate of $\Sigma$. So we
introduce the lasso penalty term discussed above in order to make the graph sparse.
The problem now is to find the minimizer $(\mu, C)$, with $C$ a positive definite
matrix [1], of
$$-\ln\det C + \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^T C (X_i - \mu) \quad \text{s.t.} \quad \sum_{i \neq j} |c_{ij}| \le t \qquad (3)$$
Here $t \ge 0$ is a tuning parameter controlling the sparsity. As the disease data
has been centered, and $-\log\det C$ gives the same result as $-\ln\det C$ in
finding the minimizer $C$, the problem becomes that of minimizing the following
expression, which has the same form as in e.g. Banerjee et al. (2008) and Friedman
et al. (2008):
$$-\log\det C + \operatorname{tr}(SC) + \rho \|C\|_1 \qquad (4)$$
where $\operatorname{tr}$ is the trace, $\rho$ is the tuning parameter and
$\|C\|_1$ is the $L_1$ norm of $C$.
To solve (4), Yuan and Lin (2007) proposed treating the problem as a determinant
maximization (maxdet) problem and solving it with an interior point algorithm, which
is quite time-consuming. In our paper, we choose the algorithm proposed by Friedman
et al. (2008), named the Graphical Lasso. They use the block coordinate descent
approach of Banerjee et al. (2007) as a starting point and then propose a new
algorithm that is extremely simple and faster than other methods.
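We use the KKT conditions to solve (4). As the problem is an unconstrained
optimization, we use only the stationarity condition, which says that the zero
vector is an element of the subdifferential set. The derivative of $\log\det C$ is
$C^{-1}$, as proved in Boyd & Vandenberghe (2004), page 641. We can then write the
Graphical Lasso KKT stationarity condition as [2]:
$$-W + S + \rho\,\Gamma = 0 \qquad (5)$$
where $\Gamma_{ij} \in \partial |C_{ij}|$ and $W = C^{-1}$. We now solve in terms of
$W$. Note that $W_{ii} = S_{ii} + \rho$ because $C_{ii} > 0$. We partition $W$ and
$S$ as:
$$W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix}$$
where $W_{11} \in \mathbb{R}^{(p-1)\times(p-1)}$, $w_{12} \in \mathbb{R}^{(p-1)\times 1}$,
$w_{21} = w_{12}^T \in \mathbb{R}^{1\times(p-1)}$ and $w_{22} \in \mathbb{R}$.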
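Consider the 12-block of the KKT conditions [3]:
$$-w_{12} + s_{12} + \rho\,\gamma_{12} = 0 \qquad (6)$$
where $\gamma_{12}$ is a subgradient of $\|c_{12}\|_1$. From
$$\begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix} \begin{pmatrix} C_{11} & c_{12} \\ c_{12}^T & c_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & 1 \end{pmatrix} \qquad (7)$$
we get $w_{12} = -W_{11} c_{12} / c_{22}$. Substituting this into (6), we get:
$$W_{11} \frac{c_{12}}{c_{22}} + s_{12} + \rho\,\gamma_{12} = 0 \qquad (8)$$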
Setting $x = c_{12}/c_{22}$, we rewrite it as:
$$W_{11} x + s_{12} + \rho\,\gamma = 0 \qquad (9)$$
where $\gamma \in \partial \|x\|_1$ (since $c_{22} > 0$, $x$ and $c_{12}$ have the
same signs). This looks like the KKT condition for:
$$\min_x \ \frac{1}{2} x^T W_{11} x + s_{12}^T x + \rho \|x\|_1 \qquad (10)$$
This is a lasso problem, which can be solved quickly by the coordinate descent
algorithm [3]. Once $x$ is found, we obtain $w_{12} = -W_{11} x$, and $c_{12}$,
$c_{22}$ can be recovered from (7). We set $w_{21} = w_{12}^T$ and
$c_{21} = c_{12}^T$. Thus, we reduce the graphical lasso problem to a sequence of
lasso problems that can be easily solved by many methods.
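The inner lasso step can be sketched in a few lines of pure Python. This is only an
illustration of coordinate descent on a problem of the form
$\frac{1}{2} x^T W_{11} x + s_{12}^T x + \rho \|x\|_1$, not the code of the glasso
library; the function names, the stopping rule and the toy numbers are assumptions
made for the example.

```python
def soft_threshold(z, rho):
    """Shrink z towards zero by rho (closed-form solution of a 1-D lasso step)."""
    if z > rho:
        return z - rho
    if z < -rho:
        return z + rho
    return 0.0


def solve_lasso_block(W11, s12, rho, max_sweeps=200, tol=1e-10):
    """Minimise 0.5 * x^T W11 x + s12^T x + rho * ||x||_1 by coordinate descent."""
    k = len(s12)
    x = [0.0] * k
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(k):
            # Linear coefficient of x_j when all other coordinates are held fixed.
            b = s12[j] + sum(W11[j][m] * x[m] for m in range(k) if m != j)
            new_xj = soft_threshold(-b, rho) / W11[j][j]
            largest_change = max(largest_change, abs(new_xj - x[j]))
            x[j] = new_xj
        if largest_change < tol:
            break
    return x


# Toy 2x2 block with made-up numbers.
W11 = [[1.0, 0.0], [0.0, 1.0]]
s12 = [-1.0, 0.05]
x = solve_lasso_block(W11, s12, rho=0.1)
# Recover the off-diagonal block of W as w12 = -W11 x.
w12 = [-sum(W11[j][m] * x[m] for m in range(2)) for j in range(2)]
print(x, w12)
```

The second coordinate is driven exactly to zero because its coefficient is smaller
than the penalty, which is how the penalty removes weak links from the network.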
4.3 The Relative Risk and φ-correlation measurement
We quantify the strength of comorbidities through the correlation between two
diseases. The measurements we choose are the Relative Risk (RR) and the
φ-correlation. Both can quantify disease associations. The RR of a pair of diseases
$i$ and $j$ affecting the same dog is given by:
$$RR_{ij} = \frac{C_{ij} N}{P_i P_j}$$
where $C_{ij}$ is the number of dogs affected by both diseases, $N$ is the total
number of dogs, and $P_i$ and $P_j$ are the prevalences of diseases $i$ and $j$,
i.e. how many dogs are affected by each disease. $RR_{ij} > 1$ means that diseases
$i$ and $j$ co-occur more often than expected by chance, while $RR_{ij} < 1$ means
that they co-occur less often than expected by chance.
The φ-correlation, which is Pearson's correlation for binary variables, of two
diseases $i$ and $j$ over the same dogs is defined by:
$$\phi_{ij} = \frac{C_{ij} N - P_i P_j}{\sqrt{P_i P_j (N - P_i)(N - P_j)}}$$
$\phi_{ij} > 0$ means that comorbidity is more likely than expected by chance, while
$\phi_{ij} < 0$ means that it is less likely than expected by chance. These two
measurements are simple and effective ways of calculating the similarity of two
diseases: the higher the value, the stronger the correlation between the two
diseases, and vice versa. A main disadvantage of the two approaches is that they
have intrinsic biases. The RR overestimates associations involving rare diseases and
underestimates associations between highly prevalent disorders [4]. As an example of
overestimation, if $P_1 = 10^2$ and $P_2 = 10^2$ are rare diseases and the total
number of dogs is $N = 10^7$, then
$RR = \frac{C_{12} \cdot 10^7}{10^2 \cdot 10^2} = C_{12} \cdot 10^3$. Even if
$C_{12}$ is a small number, the RR is quite high, which is clearly an overestimate.
The φ-correlation underestimates associations between diseases with extremely
different prevalences. For instance, assume that the two diseases are maximally
correlated, which means that the overlap is as large as possible: $C_{12} = P_2$
(with $P_2 \le P_1$). Replacing $C_{12}$ with $P_2$, we get:
$$\phi = \frac{P_2 (N - P_1)}{\sqrt{P_1 P_2 (N - P_1)(N - P_2)}} = \sqrt{\frac{P_2 (N - P_1)}{P_1 (N - P_2)}}$$
When the prevalences satisfy $P_2 \ll P_1 \ll N$, this gives
$\phi \approx \sqrt{P_2 / P_1}$. This is quite a small number, so the association is
underestimated. These two measurements are not totally independent of each other,
as both increase with the number of dogs affected by both diseases.
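The two measures and their biases can be checked numerically. The sketch below
implements the RR and φ formulas of this section directly; the function names are
chosen for illustration, and the counts are the hypothetical ones from the bias
discussion (or made up in the same spirit), not values from the canine dataset.

```python
import math


def relative_risk(c_ij, p_i, p_j, n):
    """RR_ij = C_ij * N / (P_i * P_j)."""
    return c_ij * n / (p_i * p_j)


def phi_correlation(c_ij, p_i, p_j, n):
    """Pearson correlation of two binary disease indicators."""
    return (c_ij * n - p_i * p_j) / math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))


# Two rare diseases in a large population: a handful of shared cases gives a huge RR.
rr_rare = relative_risk(c_ij=5, p_i=100, p_j=100, n=10 ** 7)

# Maximal overlap (C_12 = P_2) but very different prevalences: phi stays small.
phi_unequal = phi_correlation(c_ij=10, p_i=1000, p_j=10, n=10 ** 6)

# Co-occurrence exactly as expected by chance: RR = 1 and phi = 0.
rr_chance = relative_risk(c_ij=2, p_i=10, p_j=20, n=100)
phi_chance = phi_correlation(c_ij=2, p_i=10, p_j=20, n=100)

print(rr_rare, round(phi_unequal, 4), rr_chance, phi_chance)
```

The second case comes out close to $\sqrt{P_2/P_1} = 0.1$ even though every dog with
the rarer disease also has the common one.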
4.4 Thresholds
From a visualization perspective, not every comorbidity should be shown. There is a
tradeoff between the number of disease associations and their significance. For the
RR and φ-correlation networks, if we specify a high cutoff, we lose information from
the original data and preserve only a few of the most correlated comorbidities. The
resulting networks are very sparse and most diseases are completely disconnected.
With a low cutoff, the visualization becomes extremely dense and even accidental
co-occurrences in the data are presented in the network, which makes it hard to
analyse the main trends of disease. By trading off the value of the threshold, we
hope to find a sparse solution that still adequately explains the data; what we
want to achieve in this experiment is to preserve a large number of nodes but
relatively few links. Preserving most nodes ensures that we do not lose much disease
information, and keeping few links lets us focus on significant comorbidities only.
We therefore plot the number of nodes against the threshold of the φ-correlation:
Figure 1: φ threshold vs. number of diseases
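A curve of this kind can be produced by counting, for each candidate threshold, how
many links survive and how many diseases still have at least one link. The following
is a minimal sketch with made-up scores; the function name and the toy pairs are
hypothetical and only illustrate the bookkeeping.

```python
def nodes_and_links(scored_pairs, threshold):
    """Return (number of connected diseases, number of links) kept at a threshold."""
    kept = [pair for pair, score in scored_pairs.items() if score >= threshold]
    nodes = {disease for pair in kept for disease in pair}
    return len(nodes), len(kept)


# Hypothetical phi scores for four disease pairs.
scored_pairs = {
    ("A", "B"): 0.30,
    ("A", "C"): 0.12,
    ("B", "D"): 0.08,
    ("E", "F"): 0.02,
}

for threshold in (0.0, 0.05, 0.09, 0.2):
    print(threshold, nodes_and_links(scored_pairs, threshold))
```

Plotting the first component against the threshold gives the nodes curve, and the
point where it starts to drop sharply is a natural candidate for the cutoff.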
From the figure above, it can be seen that when φ > 0.09 the number of nodes
decreases dramatically. Applying the cutoff φ = 0.09 preserves most of the diseases
while the number of comorbidities decreases from 5989 to 934; the network thus
becomes much sparser and the significant associations are retained. What we are
interested in now is the statistical significance level this corresponds to. To
check it, we apply the t-test to all the associations, with the null hypothesis
φ = 0. The t value can be calculated by this formula⁴:
$$t = \frac{\phi \sqrt{n - 2}}{\sqrt{1 - \phi^2}}$$
where n is the number of observations. For all of our data we use
n = max(P_i, P_j), which is the most stringent way in which t can be calculated
given our data. To determine the significance level of t, it is necessary to consult
a table of t values: for n > 1000, any t ≥ 1.96 is significant at the 5% level and
any t ≥ 2.58 at the 1% level. In our experiment, the significance level is 5%, and
the critical value can be calculated with the stats package of Python (SciPy) as
stats.t.ppf(1-0.025, n). After testing all the disease pairs, most links reject the
null hypothesis, which means that the threshold φ = 0.09 ensures that most links are
significant at the 5% level.
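The test above is easy to reproduce. The sketch below computes the t value from φ and
n = max(P_i, P_j) and compares it with the large-sample critical value 1.96 quoted
in the text; for small n the exact critical value from the t distribution is larger,
so the fixed constant is an assumption of this example. Function names and inputs
are illustrative only.

```python
import math


def phi_t_value(phi, n):
    """t statistic for the null hypothesis phi = 0 with n observations (n > 2)."""
    return phi * math.sqrt(n - 2) / math.sqrt(1 - phi ** 2)


def is_significant(phi, p_i, p_j, critical=1.96):
    """Use n = max(P_i, P_j), the choice described in the text."""
    n = max(p_i, p_j)
    return phi_t_value(phi, n) >= critical


# The same phi = 0.09 is significant for a common disease but not for a rarer one.
t_common = phi_t_value(0.09, 1000)
print(round(t_common, 3), is_significant(0.09, 1000, 50))
print(round(phi_t_value(0.09, 100), 3), is_significant(0.09, 100, 50))
```

This shows why a single φ cutoff cannot guarantee significance for every link: the
outcome also depends on the prevalence of the diseases involved.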
Figure 2: RR threshold vs. number of diseases
As for RR, we also plot the number of nodes against the threshold (figure above). It
can be seen that there are several thresholds that could be selected. According to
the result of the hypothesis test, and keeping the number of nodes nearly the same
as in the φ-correlation network, we find that RR = 34 is a good choice of threshold.
The number of links falls to 786 and the number of nodes to 368. This time, to
confirm the significance level, we calculate the 95% confidence interval given by:
$$[RR_{ij} \exp(-1.96\,\sigma_{ij}),\ RR_{ij} \exp(1.96\,\sigma_{ij})]$$
where $\sigma_{ij}$ is:
$$\sigma_{ij} = \frac{1}{C_{ij}} + \frac{1}{P_i P_j} - \frac{1}{N} - \frac{1}{N^2}$$
The null hypothesis is RR = 1. We find that for more than half of the links the 95%
confidence interval does not include 1, so the null hypothesis is rejected for them;
the threshold RR = 34 therefore ensures that the majority of links are significant
at the 5% level.
⁴ http://barabasilab.neu.edu/projects/hudine/resource/data/data.html
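The interval check can be written down directly from the two formulas above. This is
a minimal sketch that follows the expression for $\sigma_{ij}$ exactly as given in
this section; the function name and the example counts are made up for illustration
(only N = 3884 is taken from the dataset description).

```python
import math


def rr_confidence_interval(c_ij, p_i, p_j, n, z=1.96):
    """Return (lower, upper) bounds of the 95% interval for RR_ij."""
    rr = c_ij * n / (p_i * p_j)
    sigma = 1.0 / c_ij + 1.0 / (p_i * p_j) - 1.0 / n - 1.0 / n ** 2
    return rr * math.exp(-z * sigma), rr * math.exp(z * sigma)


# Hypothetical pair with many shared cases: the interval excludes 1.
low_a, high_a = rr_confidence_interval(c_ij=20, p_i=100, p_j=200, n=3884)

# Hypothetical pair with a single shared case: the interval is wide and includes 1.
low_b, high_b = rr_confidence_interval(c_ij=1, p_i=50, p_j=60, n=3884)

print(round(low_a, 2), round(high_a, 2), low_a > 1)
print(round(low_b, 2), round(high_b, 2), low_b > 1)
```

A link rejects the null hypothesis RR = 1 when its lower bound is above 1, which is
the test applied to every link of the RR network.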
In the Gaussian graphical model, the main free parameter is the penalty term. For
the former two measurements the threshold was chosen by hypothesis test; for the
GGM, in order to compare it with the RR and φ-correlation networks, we select the
penalty parameter so as to keep nearly the same numbers of nodes and links. Also,
according to the figure below, when the threshold is larger than 0.09 the number of
nodes falls off rapidly. Thus, we choose the penalty parameter ρ = 0.09. The numbers
of links and nodes are 869 and 379 respectively.
Figure 3: GGM threshold vs. number of diseases
5 Visualising Canines Disease Networks
5.1 Gephi
The tool we use to build the networks is Gephi⁵. It is free, interactive
visualization software for all kinds of networks and complex systems, including
dynamic and hierarchical graphs. It runs on multiple platforms such as Mac OS X,
Windows and Linux. The two primary source files Gephi needs to generate a network
are the nodes file and the links file, both in CSV format. The main advantages of
Gephi are its variety of layouts and the ease of manipulating nodes and links along
with their color, size and location settings. Two plugins are particularly helpful
for generating the network. The first is the GeoLayout plugin: after installing it,
we can place nodes at any fixed location using longitude and latitude attributes,
which act as x and y coordinates. In order to have a good-looking and clear network,
we place each disease at the location of its body part. We also place many isolated
nodes, again via the longitude and latitude attributes, so as to draw the outline of
a puppy. The second is the SigmaExporter plugin, with which Gephi exports the
network components into a folder containing HTML, source files and configuration
files. The network can then be viewed in the browser and deployed online. The
network can be embedded in the browser because it uses Sigma.js⁶, a JavaScript
library dedicated to graph drawing. It makes it easy to publish networks on web
pages and to integrate networks into rich web applications. In addition, a short
video⁷ shows how to build the GGM network in Gephi from our dog disease data.
⁵ http://gephi.github.io/
⁶ http://sigmajs.org/
⁷ https://www.youtube.com/watch?v=syzgKGYYIdU&list=UUcGDb7rt_B4h1EHqRfqPL8w
5.2 Generate data for GGM network and visualization
First of all, we would like to draw the outline of a puppy by placing many isolated
nodes. This lets users see the relation between a disease and its body location
directly. The original dog image was downloaded from the deviantART website; it is a
black-and-white PNG image of 528x564 pixels. The task now is to convert the puppy
image into many nodes whose x and y locations draw the outline of a puppy. We use
the Matlab Image Processing Toolbox to process the image. To acquire the x and y
coordinates, we regard the 528x564 pixel image as a matrix and extract the location
information directly from the row and column indices of the image matrix. Another
problem is that the image matrix contains too many elements (528 × 564 = 297792),
which is cumbersome for visualization. So, to get a sparser layout, we apply the mod
function to select fewer nodes. The Matlab code that imports the image as a bitmap
and thins it out is shown below:
% import image
img = imread('puppy.png');
% keep the first colour channel only
img = img(:,:,1);
% initialize an all-white image
new_img = ones(528,564)*255;
num = 0;
for i = 1:size(img,1)
    for j = 1:size(img,2)
        % keep a black pixel only if both indices are multiples of 3
        if img(i,j) == 0
            if mod(i,3)==0 && mod(j,3)==0
                new_img(i,j) = 0;
                num = num+1;
            end
        end
    end
end
imshow(new_img)
% write the image matrix to csv
dlmwrite('puppy.csv', new_img)
To build the network, we use the graphical lasso algorithm discussed above. It is
already implemented in an R library⁸, so we can use it directly. The main code is
shown below:
# code in code/R/glasso.r
# import glasso library
library(glasso)
# read the disease matrix (semicolon-separated); the path is a placeholder
all_data <- read.csv("source file", sep = ";")
# calculate covariance matrix and convert it to a correlation matrix
disease_data <- subset(all_data)
variance <- var(disease_data)
cor <- cov2cor(variance)
# apply glasso algorithm; rho is the penalty parameter
a <- glasso(cor, rho = threshold)
# a$wi is the estimated inverse covariance matrix
write.table(a$wi, file = "output file.csv", sep = ",")
⁸ http://statweb.stanford.edu/~tibs/glasso/
The output file is the 429 by 429 inverse covariance matrix. Next, we detect the
non-zero off-diagonal elements of the ICM, as each one indicates a link between its
row element and its column element. For example, if the element in the 3rd row and
4th column is not zero, then the 3rd and 4th diseases form a comorbidity. After
iterating over every element of the ICM, we collect all the link and node
information, which can be imported into Gephi. In the node file, the additional
attribute is MiddleLevel, which is obtained from the Middle Level mapping file.
Exported with SigmaExporter, the GGM network is shown below (it can also be viewed
online⁹):
Figure 4: GGM network
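The step from the ICM to an edge list for Gephi can be sketched as follows. This is
an illustrative Python version, not the project's own script: the function names,
the tolerance and the toy matrix are assumptions, and the column names Source,
Target and Type follow the convention Gephi's spreadsheet import is generally
documented to expect for edge tables.

```python
import csv
import io


def icm_to_edges(icm, tol=1e-12):
    """Collect (i, j, value) for every non-zero entry above the diagonal."""
    p = len(icm)
    return [
        (i, j, icm[i][j])
        for i in range(p)
        for j in range(i + 1, p)
        if abs(icm[i][j]) > tol
    ]


def edges_to_csv(edges, names):
    """Render an undirected edge table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type"])
    for i, j, _ in edges:
        writer.writerow([names[i], names[j], "Undirected"])
    return buffer.getvalue()


# Toy 3x3 inverse covariance matrix (made-up values, symmetric).
icm = [
    [2.0, -0.5, 0.0],
    [-0.5, 2.0, 0.3],
    [0.0, 0.3, 1.5],
]
names = ["disease_0", "disease_1", "disease_2"]  # placeholder labels

edges = icm_to_edges(icm)
print(edges_to_csv(edges, names))
```

Only the upper triangle is scanned because the matrix is symmetric, so each
comorbidity is emitted once.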
The network makes it easy to see each disease along with its comorbidities and
Middle Level code. There are 9 node clusters according to body location. As for
interaction, when you hover over a disease node, its comorbidities are automatically
highlighted. When you click a node, a full description of the disease is shown on
the right-hand side. Take "Interdigital cyst (dogs)" as an example: when you click
that disease node, you see the disease information along with its comorbidities (see
Figure 5 below). You can also zoom in, zoom out and refresh the GGM network through
the three buttons below, or by scrolling the mouse wheel up or down. To search for a
disease, type its name in the input field of the left-hand toolbar. The size of a
node is proportional to the prevalence of the disease, while nodes of the same color
share the same Middle Level code.
⁹ http://smileclinic.alwaysdata.net/long_msc2014/ggm_dog_network/
Figure 5: Interaction example of Interdigital cyst (dogs)
5.3 Generating data for RR and φ-correlation network and
their visualization
The way to generate data for the RR and φ-correlation networks is slightly different
from that for the GGM network (main code in code/network.py). The first task is to
process the dog disease file. The disease file is an instances-by-diseases matrix,
and what we are interested in is disease associations. So the goal is to transform
the matrix into disease pairs [(disease1, disease2), (disease2, disease3),
(disease3, disease5), ...]. To extract the associations, we regard each dog instance
as a vector. For example, a dog instance vector is [1,0,1,1,0,0,...]; it contains
429 elements and each element stands for a certain disease, where 1 indicates that
the disease was detected and 0 that it was not. We iterate over every dog instance
and take all pairwise combinations of its diseases, so that we acquire all the
possible disease pairs. Next, identical disease pairs are merged and counted in
order to calculate the RR or φ score. The new disease pair format looks like this:
[((disease1_id, disease2_id), RR/φ score), ((disease2_id, disease3_id), RR/φ
score), ...]. By applying the thresholds for RR and φ (34 and 0.09 respectively),
the original matrix is reduced to the final comorbidities. The edge file has a new
attribute called weight, which is the RR/φ score of each disease pair. In the
networks below, the thickness of a link indicates the weight of the comorbidity.
Here are the φ-correlation and RR networks (also viewable online: RR network¹⁰ and
φ-correlation network¹¹).
Figure 6: φ-correlation network
¹⁰ http://smileclinic.alwaysdata.net/long_msc2014/rr_dog_network/
¹¹ http://smileclinic.alwaysdata.net/long_msc2014/phi_dog_network/
Figure 7: RR network
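The pair-extraction step described in this section can be sketched with the standard
library alone. This is not the code in code/network.py, only an illustration of the
same idea; the function name and the three toy records are made up.

```python
from collections import Counter
from itertools import combinations


def count_disease_pairs(records):
    """Count co-occurrences C_ij and prevalences P_i from 0/1 dog records."""
    pair_counts = Counter()
    prevalence = Counter()
    for record in records:
        # Indices of the diseases this dog has, in ascending order.
        present = [idx for idx, flag in enumerate(record) if flag == 1]
        prevalence.update(present)
        # Every unordered pair of diseases found in the same dog.
        pair_counts.update(combinations(present, 2))
    return pair_counts, prevalence


# Three hypothetical dogs and four hypothetical diseases.
records = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
pair_counts, prevalence = count_disease_pairs(records)
print(dict(pair_counts))
print(dict(prevalence))
```

The two counters provide exactly the quantities C_ij and P_i needed by the RR and φ
formulas, with N equal to the number of records.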
6 Analysing networks
6.1 Networks validation
In the canine disease dataset, the 3 most prevalent diseases, with their prevalence,
are: Otitis externa - 396, Periodontal disease - 361 and Anal sac impaction - 277.
They are extremely common diseases; Otitis externa and Periodontal disease each
affect roughly 1 in 10 dogs. The figure and table below show the distribution of
disease prevalence. 121 diseases appear only once, which means that more than a
quarter of the diseases (121 of 429) are rare in our dataset.
Figure 8: disease prevalence distribution
Prevalence  Count
1           121
3           33
4           20
5           24
6           17
7           11
12          9
15          9
9           8
13          8
8           7
10          7
22          6
17          5
...         ...
To validate the GGM network, we use the RR and φ-correlation networks as comparisons
and use the Jaccard index to measure similarity. The Jaccard index is a statistic
used for comparing the similarity of two finite sample sets. It is the size of the
intersection of the two sets divided by the size of their union:
$$\text{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
We extract all the comorbidities from the three networks and use the Python code
below to calculate it. The Jaccard index of the GGM and φ-correlation networks is
0.918085106383, while the Jaccard index of the GGM and RR networks is
0.789189189189. The GGM network has quite a high similarity with the φ-correlation
network, which means that most of their comorbidities overlap. The result indicates
that the GGM network is a reasonable network, as validated by the φ-correlation
network. For the RR network, the score is lower than for the φ-correlation network.
This is because of the bias of the Relative Risk: RR overestimates associations
involving rare diseases, and more than a quarter of the diseases appear only once.
As a result, its disease pairs are much more likely to be biased. Some other
differences among the three networks should also be taken into consideration. The
GGM network assumes that the disease distribution is Normal, while the other two
have their own biases. The thresholds we selected are not equivalent across
networks, so the numbers of nodes and links are not exactly the same in the
different networks.
# code file: code/analyse_disease_pairs.py
def jaccard_index(set_1, set_2):
    # size of the intersection divided by the size of the union
    intersection_num = len(set_1.intersection(set_2))
    return intersection_num / float(len(set_1) + len(set_2) - intersection_num)
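One practical detail when comparing undirected networks is that the pair (a, b) and
the pair (b, a) denote the same comorbidity. The sketch below, which is an assumption
about how one might prepare the inputs rather than part of the original script,
normalises each link to a frozenset before computing the index. The function names
and the toy links are hypothetical.

```python
def normalise_links(links):
    """Turn (a, b) tuples into order-independent frozensets."""
    return {frozenset(link) for link in links}


def link_jaccard(links_1, links_2):
    """Jaccard index of two undirected link lists."""
    set_1, set_2 = normalise_links(links_1), normalise_links(links_2)
    union_size = len(set_1 | set_2)
    return len(set_1 & set_2) / union_size if union_size else 0.0


# Hypothetical links from two networks; (2, 1) matches (1, 2) after normalising.
network_a = [(1, 2), (2, 3)]
network_b = [(2, 1), (3, 4)]
score = link_jaccard(network_a, network_b)
print(score)
```

Without the normalisation step the two example networks would appear to share no
links at all.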
6.2 GGM network analysis
First, we analyse the Middle Level codes. The 5 most prevalent Middle Level codes in
the GGM network are:
Middle Level code                    Prevalence
Skin (cutaneous) disorder finding    10.82%
Neoplasia                            9.23%
Mass lesion finding                  6.86%
Ophthalmological disorder finding    5.8%
Enteropathy                          5.28%
The table below lists the 8 most heavily connected nodes (diseases) along with their
numbers of comorbidities. We call these nodes the hubs of the network. Among them,
the Middle Level code "Skin (cutaneous) disorder finding" has the most hubs, namely
"Skin (cutaneous) disorder, pigmentary", "Eosinophilic granuloma" and
"Pododermatitis".
Disease name                             Number of comorbidities
Cognitive dysfunction                    19
Skin (cutaneous) disorder, pigmentary    15
Eosinophilic granuloma                   15
Cardiomegaly                             13
Colitis                                  13
Spondylosis                              13
Pododermatitis                           13
DJD                                      13
To see how the Middle Level codes associate with each other, we also map each
comorbidity to the Middle Level codes of its two diseases. After aggregation,
the table below shows the top three Middle Level associations with their numbers
of occurrences. It can be seen that Skin (cutaneous) disorder finding is again
the most frequent one. To sum up, dogs are likely to suffer from disorders in
Skin (cutaneous) disorder finding, or from comorbidities that belong to it. Dog
owners should therefore pay particular attention to this kind of disease so as
to prevent it.
Middle Level code association                                            Number
Skin (cutaneous) disorder finding, Skin (cutaneous) disorder finding     15
Enteropathy, Skin (cutaneous) disorder finding                           11
Ophthalmological disorder finding, Ophthalmological disorder finding     10
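The aggregation behind the table above can be sketched as follows. This is a hedged illustration: the function name is hypothetical, the three skin diseases are assigned to Skin (cutaneous) disorder finding as stated in the text, and `example_enteropathy_disease` is a placeholder rather than a real entry of the dataset.

```python
from collections import Counter


def count_middle_level_pairs(comorbidities, middle_level_of):
    """Count comorbidity links per unordered pair of Middle Level codes."""
    counts = Counter()
    for disease_a, disease_b in comorbidities:
        # Sort the two codes so that (X, Y) and (Y, X) are counted together.
        pair = tuple(sorted((middle_level_of[disease_a],
                             middle_level_of[disease_b])))
        counts[pair] += 1
    return counts


SKIN = "Skin (cutaneous) disorder finding"
middle_level_of = {
    "Skin (cutaneous) disorder, pigmentary": SKIN,
    "Eosinophilic granuloma": SKIN,
    "Pododermatitis": SKIN,
    "example_enteropathy_disease": "Enteropathy",  # placeholder entry
}
# Toy links for illustration only; they are not results from the GGM network.
toy_comorbidities = [
    ("Pododermatitis", "Eosinophilic granuloma"),
    ("Skin (cutaneous) disorder, pigmentary", "Pododermatitis"),
    ("example_enteropathy_disease", "Pododermatitis"),
]
pair_counts = count_middle_level_pairs(toy_comorbidities, middle_level_of)
```

Calling `pair_counts.most_common(3)` on the real data would yield the three rows reported in the table.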
After that, we introduce some network properties. All of them can be calculated
directly with Gephi or the SNAP library. SNAP, short for Stanford Network
Analysis Project (footnote 12), offers a large number of interfaces for network
analysis. It efficiently manipulates graphs, calculates structural properties,
generates graphs, and supports attributes on nodes and edges. The first
property is the node degree distribution:
Figure 9: GGM degree distribution
Number of nodes    Degree
60                 1
45                 2
52                 3
65                 4
37                 5
36                 6
25                 7
17                 8
12                 9
10                 10
6                  11
6                  12
5                  13
2                  15
1                  19
12 http://snap.stanford.edu/snappy/index.html
We plot the degree distribution above, which looks like a power-law
distribution. A scale-free network is a network whose degree distribution
follows a power law, so we want to test whether the distribution follows a
power law and hence whether the network can be classified as scale-free.
Mathematically, the power-law distribution is $P(x) \propto x^{-\alpha}$, where
$P(x)$ is the number of nodes with degree $x$, $x$ is the degree, and $\alpha$
is a parameter greater than 1. Taking logarithms gives
$\log P(x) = c - \alpha \log x$ for some constant $c$, so to simplify the
analysis we take the logarithm of the degree distribution and check whether it
is a linear function. We also introduce two comparison functions (piecewise
linear and quadratic), and the criterion we choose is the Bayesian information
criterion (BIC). BIC considers two factors: how well the model fits the data and
how many explanatory variables it uses. A good fit means a small error, while
fewer variables or parameters mean a simpler model that is more robust against
overfitting. Given any two estimated models, the model with the lower BIC value
is preferred. The three figures below were calculated and plotted by Dr Novi
Quadrianto, and we can see that the linear model is preferred. However, the
difference in score between the quadratic and linear models is small
(19.55 - 19.30 = 0.25). According to [5], a difference of less than 2 means that
the evidence for the linear model over the quadratic one is not strong.
Figure 10: linear function fit with BIC
Figure 11: piecewise linear function fit with BIC
Figure 12: quadratic function fit with BIC
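The comparison of a linear and a quadratic fit on the log-log degree distribution can be sketched as below. This is not the code behind Figures 10 to 12: it assumes a least-squares fit and the common Gaussian-residual form BIC = n ln(RSS/n) + k ln(n), which may differ from the formulation used for the figures, so the reported scores (19.30 and 19.55) should not be expected to be reproduced. The piecewise linear model is omitted for brevity, and the function names are hypothetical.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ...]."""
    m = degree + 1
    # Normal equations A c = b with A[r][c] = sum(x^(r+c)), b[r] = sum(y * x^r).
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - known) / a[r][r]
    return coeffs


def bic(xs, ys, coeffs):
    """BIC under the assumed Gaussian-residual formulation (lower is better)."""
    n = len(xs)
    rss = sum((y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    return n * math.log(max(rss, 1e-300) / n) + len(coeffs) * math.log(n)


# (degree, number of nodes) pairs copied from the Figure 9 table.
degree_counts = [(1, 60), (2, 45), (3, 52), (4, 65), (5, 37), (6, 36), (7, 25),
                 (8, 17), (9, 12), (10, 10), (11, 6), (12, 6), (13, 5),
                 (15, 2), (19, 1)]
log_k = [math.log(k) for k, _ in degree_counts]
log_n = [math.log(n) for _, n in degree_counts]

linear = polyfit(log_k, log_n, 1)
quadratic = polyfit(log_k, log_n, 2)
bic_linear = bic(log_k, log_n, linear)
bic_quadratic = bic(log_k, log_n, quadratic)
```

Under this formulation, the model with the lower of `bic_linear` and `bic_quadratic` would be preferred, and the negated slope `-linear[1]` plays the role of the power-law exponent.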
Another property is the clustering coefficient, which quantifies how well
connected the neighbours of a vertex are in a graph [6]. In other words, it
describes the extent to which the nodes adjacent to a given node are also
connected to each other. The clustering coefficient of a vertex is the ratio of
the number of existing edges connecting the vertex's neighbours to each other to
the maximum possible number of such edges. The clustering coefficient of the
$i$th node can be calculated as:

$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$

where $e_i$ is the number of connections between the neighbours of the $i$th
node and $k_i$ is the number of neighbours of the $i$th node. In the GGM
network, the average clustering coefficient of the whole network is
$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i = 0.117641210955$. The clustering
coefficient also provides evidence of whether a network is a small-world
network: a network is considered small-world if its clustering coefficient is
significantly higher than expected by random chance. As the result is not high
enough, we cannot conclude that the GGM network is a small-world network.
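The two formulas above translate directly into code. The sketch below stores the network as a dictionary of neighbour sets and assigns a coefficient of 0 to nodes with fewer than two neighbours; that convention is an assumption here, since the text does not say how such nodes are handled. The toy graph is illustrative only.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """C_i = 2 * e_i / (k_i * (k_i - 1)) for one node of an undirected graph."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than 2 neighbours
    # e_i: number of links between pairs of neighbours of the node.
    e = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes of the network."""
    return sum(clustering_coefficient(adj, n) for n in adj) / float(len(adj))


# Toy graph: a triangle a-b-c with one extra node d attached to c.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_average = average_clustering(toy_adj)
```

In the toy graph, nodes a and b have coefficient 1, node c has 1/3 and node d has 0, so the average is 7/12.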
Average path length is another important concept in network topology. It is a
measure of the efficiency of information or mass transport on a network, and
gives the number of steps it takes on average to get from one node of the
network to another. It is calculated by finding the shortest path between all
pairs of nodes, adding the lengths up and dividing by the total number of pairs.
In our GGM network, the average path length is 4.265. This suggests that,
starting from one disorder, about four comorbidity steps are needed on average
to reach a given target disorder.
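The calculation described above can be sketched with a breadth-first search from every node. This is an illustrative sketch rather than the Gephi or SNAP implementation; it averages over reachable pairs only, which is an assumption for networks that are not fully connected.

```python
from collections import deque


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs of nodes."""
    total = 0
    pairs = 0
    for source in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return float(total) / pairs if pairs else 0.0


# Toy path graph a - b - c.
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
toy_length = average_path_length(toy_path)
```

For the toy path graph the distances are 1, 1 and 2, so the average is 4/3.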
6.3 Expert validation of GGM network
Besides validating against the RR and φ-correlation networks through the Jaccard
index, we introduce another approach, expert validation, in which the network
results are verified by someone with high authority in the area of dog disease.
This approach is more authentic and convincing, as the results are judged by a
professional. The person we invited to carry out the expert validation is Dr Dan
O'Neill. He is the Dogs Trust companion animal epidemiologist, whose research is
mainly in Veterinary Epidemiology, Economics and Public Health. He ran his own
companion animal practice for 12 years and then started a PhD in veterinary
epidemiology at the RVC. O'Neill is now a post-doctoral researcher and continues
to expand VetCompass to examine health and welfare issues in dogs.
What we want to validate is the precision of the comorbidities. Given the
disease associations of all three networks, the expert judges whether each
comorbidity is reasonable and labels it as Expected or Unexpected. Two criteria
decide how good the result is: one is how many comorbidities the network
detects, the other is whether each comorbidity is correct. As the dataset is
small, we followed O'Neill's advice and selected the 10 most common disorders,
which avoids unreliable results and gives the best chance of finding
comorbidities. The results from O'Neill are attached in the Appendices in two
parts: a general comment and the validation results. From the validation
results, we can see that RR has the poorest performance, detecting 9
comorbidities of which 5 are expected (5/9), followed by φ-correlation with a
precision of 34/50, while GGM has the best precision with 33/43. Compared with
GGM, the φ network detects 1 more expected comorbidity but also 6 more
unexpected ones (16 versus 10). By the criteria described above, we believe the
GGM network is the best one, as it provides nearly the same number of expected
comorbidities with higher precision.
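The precision figures quoted above are the number of Expected comorbidities divided by the number of detected comorbidities. A short sketch of that arithmetic, using only the counts reported in the text (the variable names are illustrative):

```python
from fractions import Fraction

# (expected, detected) counts as reported for each network.
validation_counts = {"RR": (5, 9), "Phi": (34, 50), "GGM": (33, 43)}

# Precision = expected / detected, kept exact as a fraction.
precision = {name: Fraction(expected, detected)
             for name, (expected, detected) in validation_counts.items()}

# Networks ordered from highest to lowest precision.
ranking = sorted(precision, key=precision.get, reverse=True)

# Number of unexpected (mis-detected) comorbidities per network.
unexpected = {name: detected - expected
              for name, (expected, detected) in validation_counts.items()}
```

In decimal form the precisions are roughly 0.77 for GGM, 0.68 for φ-correlation and 0.56 for RR.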
Next, we take a look at the mis-detected comorbidities. We can see from
O'Neill's validation results that Vomiting does not have any comorbidity. It is
a very common disorder, and its wide range of triggers may reduce the specific
comorbidity with other disorders identified in these studies. For some
disorders, such as Diarrhoea finding, both the GGM and φ networks report Nasal
planum finding as a comorbidity, which does not make clinical sense, so the
result is labelled unexpected. However, in the dog disease data, 6 dogs have
Nasal planum finding and 4 of them also have Diarrhoea finding, which indicates
a strong relationship between the two diseases in the data. This kind of error
is due to a lack of data: if the two diseases are independent in the real world,
they should not co-occur many times in a sufficiently large dataset. In other
words, the disease distribution of the sample (the dataset) should follow that
of the population (the real world) if the sample size is large enough. Based on
the results of the expert validation, a possible improvement is to select a set
of penalty parameters or thresholds and validate each one with the expert in
order to choose the best-performing one. Although this method would consume more
of the expert's time, it is a reliable and accurate approach.
6.4 Analysing illness progression on different gender
This time, we analyse comorbidities by gender. Gender is an important factor in
diagnosing disease; for example, breast cancer is a severe disease that mostly
affects females. A woman affected by diseases that are comorbidities of breast
cancer should pay more attention to preventing it in advance. For this we
calculate the odds ratio (OR), which is the ratio of the odds of an event
occurring in one group to the odds of it occurring in another group. In
statistics, it measures how strongly the presence of disorder i is associated
with the presence of disorder j in a given population. In this experiment, the
groups are female and male. The expression is shown below:

$$OR_{ij}(f, m) = \frac{p_{ij}(f)\,(1 - p_{ij}(m))}{p_{ij}(m)\,(1 - p_{ij}(f))}$$

where i and j denote disease i and disease j, f and m denote female and male,
and $p_{ij}(\cdot)$ is the probability of the comorbidity in the given group.
If the odds ratio equals 1, the comorbidity is equally likely to occur in
females and males. An odds ratio greater than 1 tells us that the comorbidity is
more likely to occur in females than in males, and vice versa. In our
experiment, we present the significant differences by selecting a threshold of
2. In the OR network, if the OR of female over male is greater than 2, we draw a
green link (193 links); if the OR of male over female is greater than 2, we draw
a red link (169 links). From the network, we can see that Vomiting (15 of 16
links are red, see Figure 14) and Enteritis (all 5 links are red) are more
likely to occur in males, while Intertrigo (all 5 links are green) and
Incontinence - faecal (all 3 links are green) are more likely to occur in
females. Moreover, the comorbidities [Behaviour disorder, Obesity], [Obesity,
Urinary incontinence] and [Corneal disorder finding, Anal sac impaction] deserve
more attention in females, sharing the highest OR score of 7.867. [Claw injury
(traumatic), Diarrhoea finding] and [Mitral valve disorder, Periodontal disease]
deserve attention in males, as they have the two highest OR scores of 10.562 and
8.811 respectively. The OR network is shown below (and can also be seen online,
footnote 13):
13 http://smileclinic.alwaysdata.net/long_msc2014/or_analysis/
Figure 13: OR network(green: female, red: male)
Figure 14: Vomiting: male disease with 15 of 16 red links
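The colouring rule of the OR network in Section 6.4 can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the probabilities are assumed to be estimated as the fraction of dogs of each sex that have both disorders and to lie strictly between 0 and 1.

```python
def odds_ratio(p_female, p_male):
    """Odds of the comorbidity in females divided by the odds in males."""
    for p in (p_female, p_male):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    return (p_female * (1.0 - p_male)) / (p_male * (1.0 - p_female))


def link_colour(p_female, p_male, threshold=2.0):
    """Return 'green' (female), 'red' (male) or None if no link is drawn."""
    ratio = odds_ratio(p_female, p_male)
    if ratio > threshold:
        return "green"
    if 1.0 / ratio > threshold:  # OR of male over female
        return "red"
    return None


# Illustrative probabilities only; they are not values from the dataset.
example_ratio = odds_ratio(0.5, 0.2)
```

With the illustrative values above the odds ratio is 4, which exceeds the threshold of 2, so the pair would be drawn as a green link.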
7 GGM on Large Canine Disease data
In this section, we apply the GGM to an inconsistent but much larger dog disease
dataset. More data means that we are more likely to avoid accidental events and
can be more confident about the result of the GGM. The data was provided by Noel
Kennedy from the RVC as part of his work on a veterinary disease classification
system. As the data are not as well structured as the original dataset
(429 × 3884), we re-structure them in several ways. In the end, we find that the
GGM network on this dataset does not work well.
7.1 Data Structure
In the large canine disease dataset, the main disease structure is called the
Data Dictionary (DD). The DD groups the VeNom codes, or dog disorder codes, into
a hierarchy, where the most specific disease codes are at the leaf level and
more general codes are at the higher levels. It is like a graph where the nodes
represent coded findings in the ontology, and the edges are directed from more
specific codes to more general codes. This represents an is-a relationship in
the ontology. Two files fully describe the DD relationship. The first one is the
mapping from DD code to disease name, and the second one contains the mapping
from each child code to its parent codes. One child code can have multiple
parent codes. The dog disease file contains two columns, animal id and DD code.
There are around 200,000 dogs in the dataset and one dog can have multiple DD
codes (diseases). The dogs are coded at multiple levels of understanding, which
means the DD codes of one dog can contain high-level diseases and leaf diseases
at the same time. There is also a problem with the data structure: it is
inconsistent, which means a dog may be positive for a specific disease while
that disease's parent term is negative. What is more, 460 DD codes match the
original 429 diseases because some diseases have more than one DD code. For
example, Owner unsure has two DD codes, 114 and 10. Therefore, we combine the
repeated DD codes after obtaining all the disease information from the dog
disease file. The table below shows part of the large canine disease file:
Animal Id DD code
250012 15
250012 34
250012 128
250012 2545
250012 55070
250012 55071
250012 55102
250020 15
. . . . . .
7.2 Data Processing
To compare and validate the performance against the previous networks, we want
to map the diseases of the large dog disease file to the original 429 diseases.
According to the structure of the Data Dictionary, the most direct way to
extract the disease information is to compare each DD code in the file with the
DD codes of the 429 diseases and their child DD codes. In other words, for each
term (see table above), we search all the DD codes of the 429 diseases and their
child DD codes. If any one matches, we map the term to the corresponding
disease; otherwise, we discard it. With this search strategy, however, only 38
of the 429 diseases are detected. This is not acceptable, as the goal is to
compare the GGM with the previous networks based on all 429 diseases. The reason
is that most diseases in the file are at a high level of the DD tree while the
429 diseases are at a lower level, so they cannot match each other in the
hierarchy. Another way to process the data is to search each term in the file
together with all its child nodes to see which of the 429 diseases are detected.
With this method, however, we find that most dogs cover all 429 diseases, as the
DD codes in the file are at quite a high level. In the extreme case, if a DD
code is the single root node of the disease hierarchy, it will cover every node
when its child nodes are searched.
After analysing the structure of the DD, we find that the gap between the DD
codes in the file and the 429 diseases is only one level. For example, 3007 (not
in the 429 diseases) is the DD code of Diabetes mellitus finding, which is the
parent of Diabetes mellitus with code 658 (in the 429 diseases). There are
several 3007 terms and no 658 term in the dog disease file. So, if we search for
3007 instead of 658, we can detect the disease Diabetes mellitus through
Diabetes mellitus finding. The search strategy is therefore to go one level
higher for each of the 429 diseases and then check the child codes to see
whether the DD code matches. With this strategy, all 429 diseases can be
detected. The penalty parameter we choose this time is 0.01, as it keeps nearly
the same number of nodes as the previous GGM network. The figure below is the
network we draw from the large dog disease dataset.
Figure 15: large dataset GGM network
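The one-level-up search strategy of Section 7.2 can be sketched as below. This is one possible reading of the strategy, not the project's actual code: a record is taken to match a target disease if its DD code is the target code itself or one of the target's direct parents. The function names and the animal ids are hypothetical; the codes 658 and 3007 come from the Diabetes mellitus example in the text.

```python
def build_lookup(target_codes, parents_of):
    """Map each target code and each of its direct parents to target diseases."""
    lookup = {}
    for code in target_codes:
        lookup.setdefault(code, set()).add(code)
        for parent in parents_of.get(code, ()):
            lookup.setdefault(parent, set()).add(code)
    return lookup


def diseases_per_dog(records, lookup):
    """Collect, for every animal id, the target diseases matched by its codes."""
    dogs = {}
    for animal_id, dd_code in records:
        for disease in lookup.get(dd_code, ()):
            dogs.setdefault(animal_id, set()).add(disease)
    return dogs


# 658 (Diabetes mellitus) has parent 3007 (Diabetes mellitus finding).
parents_of = {658: [3007]}
lookup = build_lookup([658], parents_of)

# Hypothetical records: (animal id, DD code). Unmatched codes are discarded.
toy_records = [(1, 3007), (1, 15), (2, 15)]
dogs = diseases_per_dog(toy_records, lookup)
```

Because a parent code matches every target disease below it, a high-level code connects many diseases at once, which is consistent with the heavily connected nodes visible in Figure 15.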
From the figure above, we find that several diseases (nodes) are heavily
connected, such as Splenomegaly and Enteropathy, which have 128 and 107
comorbidities respectively. Both of them have multiple DD codes, and their DD
codes are root nodes in the hierarchy. Thus, higher-level diseases cause an
over-connection problem while lower-level diseases cause an under-connection
problem. To sum up, in the large dataset, the result is heavily affected by the
structure of the DD, and it is unreasonable to compare diseases at different DD
levels.
8 Future work
Hidalgo et al. (2009) presented a phenotypic network based on human diseases,
while our work mainly builds the network for canines. From a biological
perspective, both humans and canines are animals, so there could be a connection
between the canine network and the human network. There is also plenty of
research in this area; for example, Poldrack and Packard (2003) studied memory
systems of the brain in animals and humans. Zoobiquity [7] is a publication
providing many cases of similarity between the human world and the animal world.
The author was inspired by an eye-opening consultation, which revealed that a
monkey experienced the same symptoms of heart failure as her human patients.
Inspired by this, we suppose that dog comorbidity is similar to human
comorbidity. To validate this, the direct way is to compare the same disease and
its comorbidities on both networks. So, we choose to compare the dog
comorbidities with the human phenotypic network built by Hidalgo et al. (2009).
However, the human diseases in their work are coded with the ICD-9-CM (footnote
14) medical coding reference, while the animal diseases are coded with the VeNom
coding system. The difficulty is that we cannot obtain a precise disease mapping
between these two systems. Thus, we compare the comorbidities manually. The
disease we select is Chronic kidney disease, because it has already been
studied, along with its comorbidities, by O'Neill et al. [8]. The table below
shows the result for Chronic kidney disease from the human phenotypic network.
14 http://www.icd9data.com/
Name                                                          ICD9 code   Prevalence   Score
Renal failure unspecified                                     586         0.6869 %     0.141
Nephritis and nephropathy not specified as acute or chronic   583         0.3813 %     0.182
Hypertensive heart and chronic kidney disease, malignant...   404         0.5050 %     0.185
Acute renal failure                                           584         1.7552 %     0.207
Malignant hypertensive renal disease without renal            403         1.4743 %     0.310
Hyperosmolality and/or hypernatremia                          276         27.1 %       0.107
Mechanical complications of unspecified cardiac device...     996         4.4082 %     0.107
Sideroblastic anemia                                          285         14.8 %       0.109
Nephrotic syndrome                                            581         0.1831 %     0.114
Nephroptosis                                                  593         2.8091 %     0.121
Chronic glomerulonephritis                                    582         0.5515 %     0.128
Congestive heart failure unspecified                          428         18.3 %       0.139
Result for Chronic kidney disease from O'Neill et al. (2013):
Anaemia
Cardiac disorder
Decreased appetite
Halitosis
Hypertension
Lethargy
Melaena
Pancreatitis
Polyuria/polydipsia
Urinary incontinence
Vomiting
Weight loss
From the table, we find that Hypertensive heart and chronic kidney disease,
malignant... in humans can be related to Hypertension in dogs. Both Congestive
heart failure unspecified (human) and Cardiac disorder (dog) are diseases
related to the heart. Most of the comorbidities in both humans and dogs are
kidney-related problems. Thus, it can be seen that the comorbidities of the two
species are not independent of each other. As a result, if a precise mapping
from animal diseases to human diseases can be provided, we may be able to
connect and analyse their comorbidities.
References
[1] Ming Yuan and Yi Lin. Model selection and estimation in the Gaussian
graphical model. Biometrika, 94(1):19-35, 2007.
[2] Daniela M Witten, Jerome H Friedman, and Noah Simon. New insights and
faster computations for the graphical lasso. Journal of Computational and
Graphical Statistics, 20(4):892-900, 2011.
[3] Rahul Mazumder, Trevor Hastie, et al. The graphical lasso: New insights and
alternatives. Electronic Journal of Statistics, 6:2125-2149, 2012.
[4] César A Hidalgo, Nicholas Blumm, Albert-László Barabási, and Nicholas A
Christakis. A dynamic network approach for the study of human phenotypes.
PLoS Computational Biology, 5(4):e1000353, 2009.
[5] Robert E Kass and Adrian E Raftery. Bayes factors. Journal of the American
Statistical Association, 90(430):773-795, 1995.
[6] Sara Nadiv Soffer and Alexei Vázquez. Network clustering coefficient without
degree-correlation biases. Physical Review E, 71(5):057101, 2005.
[7] Barbara Natterson Horowitz and Kathryn Bowers. Zoobiquity: What Animals
Can Teach Us about Being Human. Random House, 2012.
[8] DG O'Neill, J Elliott, DB Church, PD McGreevy, PC Thomson, and
DC Brodbelt. Chronic kidney disease in dogs in UK veterinary practices:
prevalence, risk factors, and survival. Journal of Veterinary Internal Medicine,
27(4):814-821, 2013.
[9] Jean-François Rual, Kavitha Venkatesan, Tong Hao, Tomoko Hirozane-
Kishikawa, Amélie Dricot, Ning Li, Gabriel F Berriz, Francis D Gibbons,
Matija Dreze, Nono Ayivi-Guedehoussou, et al. Towards a proteome-scale map
of the human protein-protein interaction network. Nature, 437(7062):1173-1178,
2005.
[10] Arthur P Dempster. Covariance selection. Biometrics, pages 157-175, 1972.
[11] D-S Lee, J Park, KA Kay, NA Christakis, ZN Oltvai, and A-L Barabási. The
implications of human metabolic network topology for disease comorbidity.
Proceedings of the National Academy of Sciences, 105(29):9880-9885, 2008.
[12] Kwang-Il Goh, Michael E Cusick, David Valle, Barton Childs, Marc Vidal,
and Albert-László Barabási. The human disease network. Proceedings of the
National Academy of Sciences, 104(21):8685-8690, 2007.
[13] Sebastian Schneeweiss, Philip S Wang, Jerry Avorn, and Robert J Glynn.
Improved comorbidity adjustment for predicting mortality in Medicare
populations. Health Services Research, 38(4):1103-1120, 2003.
[14] Russell A Poldrack and Mark G Packard. Competition among multiple memory
systems: converging evidence from animal and human brain studies.
Neuropsychologia, 41(3):245-251, 2003.
[15] Nicolai Meinshausen and Peter Lukas Bühlmann. Consistent neighbourhood
selection for sparse high-dimensional graphs with the lasso. Seminar für
Statistik, Eidgenössische Technische Hochschule (ETH), Zürich, 2004.
Appendices
Expert validation general comment from Dr Dan O'Neill:
Many of the more common disorders in dogs are syndromes in the sense
that they represent a spectrum of underlying specific disorders that all
share a common presentation pattern. This has the result of making
them common as apparently distinctive clinical presentations but may
reduce the comorbidity indices with other disorders because of the
varying underlying true pathologies. It should be noted that comorbidity
studies carried out across all disorders recorded in dogs are subject to
the risk of spurious results being identified due to chance. These
studies are best suited to hypothesis generation and should be confirmed
by later specific confirmatory studies. During the validation process,
the expert defined the comorbidity associations as being expected or
unexpected based on current veterinary norms. The unexpected results
are potential new areas for investigation that offer the opportunity to
identify previously unknown associations. While the GGM and Phi
results were generally consistent with current veterinary expectation,
the RR results seemed to miss some important associations identified
by the other two methods. It would appear that RR is a less useful
method in this respect. Overall these comorbidity results are highly
consistent with conventional veterinary understanding of disease
associations. Novel but potentially useful findings include comorbidity
between DJD and hypothyroidism, and between periodontal disease
and heart disorders.
Validation table:
Columns: Primary disorder; comorbid disorders detected by GGM, Phi and RR, each
with the clinical expectation of the comorbidity (Expected or Unexpected);
veterinary context and comment.
Vomiting:
  GGM: none
  Phi: none
  RR: none
  Veterinary context and comment: Vomiting is a very common clinical sign in
  dogs that is associated with a wide variety of underlying disorders. As
  scavenging animals, dogs have developed a ready mechanism to induce vomition
  to reduce their risks from serious food intoxication. In addition,
  regurgitation (which might be clinically confused for vomiting) is used in
  dogs as a mechanism of carrying food back to their lair for their young. Such
  a wide range of triggers for vomiting may reduce the specific comorbidity with
  other disorders being identified in these studies.
Conjunctivitis:
  GGM: Anal sac impaction (Expected); Cardiomegaly (Unexpected); Distichiasis
  (Expected); Entropion (Expected); Intertrigo (Expected); Otitis externa
  (Expected); Pododermatitis (Expected)
  Phi: Blepharitis (Expected); Corneal disorder finding (Expected); Eye
  proptosed (Expected); Head tilt (Unexpected); Heat stroke (Unexpected);
  Hyperkeratosis finding (Expected); Intertrigo (Expected); Mass lesion -
  testicular (Unexpected); Narcolepsy (Unexpected); Neoplasm - oral cavity
  (mouth) (Unexpected); Polyarthropathy - immune-mediated (Expected); Reverse
  sneezing (Expected); Urolithiasis finding (Unexpected)
  RR: Blepharitis (Expected); Eye proptosed (Expected); Head tilt (Unexpected);
  Heat stroke (Unexpected); Hyperkeratosis finding (Expected); Mass lesion -
  testicular (no expectation recorded); Narcolepsy (Unexpected); Neoplasm - oral
  cavity (mouth) (Unexpected); Polyarthropathy - immune-mediated (Expected);
  Reverse sneezing (Expected)
  Veterinary context and comment: Conjunctivitis in dogs may be indicative of
  underlying allergic disease or atopy. Therefore comorbidity with other
  disorders with an underlying allergic or atopic aetiology is to be expected,
  e.g. otitis externa, pododermatitis, blepharitis, hyperkeratosis,
  immune-mediated polyarthropathy, reverse sneezing. Additionally, physical
  disorders affecting the eyelids or the skin around the eyes can act as
  triggers for conjunctivitis and thus would be expected to be comorbid, e.g.
  distichiasis, entropion, intertrigo, eye proptosed. Disorders that may be
  consequent to conjunctivitis are also expected comorbid disorders, e.g.
  corneal disorders. Unexpected comorbid disorders include cardiomegaly, head
  tilt, heat stroke, narcolepsy and oral neoplasia.
Traumatic injury:
  GGM: none
  Phi: none
  RR: none
  Veterinary context and comment: Traumatic injuries cover such a wide spectrum
  of often random accidents that it is almost reassuring that no substantial
  comorbidity was identified.
Obesity:
  GGM: Anal sac impaction (Expected); Cruciate disease (Expected); DJD
  (Expected)
  Phi: Anal sac impaction (Expected); Cruciate disease (Expected); DJD
  (Expected)
  RR: none
  Veterinary context and comment: Obesity is an expected comorbidity with joint
  disorders such as DJD and cruciate disease because increased weight may
  increase the strain on the joints, while in reverse the reduced exercise
  tolerance of arthritic dogs may increase the tendency for obesity. It is also
  logical that obesity is associated with anal sac disorders, as the increased
  fat deposits around the anal region may inhibit anal sac emptying.
Diarrhoea finding:
  GGM: Nasal planum finding (Unexpected)
  Phi: Nasal planum finding (Unexpected)
  RR: none
  Veterinary context and comment: It is difficult to see a logical rationale
  for comorbidity between diarrhoea and nasal planum disorders.
DJD:
  GGM: Adverse reaction to drug (Expected); Cataract (Expected);
  Cerebrovascular accident (CVA) (Expected); Cognitive dysfunction (Expected);
  Cruciate disease (Expected); Heart (cardiac) murmur (Unexpected); Hepatopathy
  (liver disorder) (Unexpected); Hip dysplasia (Expected); Hygroma (Expected);
  Hypothyroidism (Unexpected); Lipoma (Unexpected); Obesity (Expected);
  Periodontal disease (Expected)
  Phi: Adverse reaction to drug (Expected); Cataract (Expected);
  Cerebrovascular accident (CVA) (Expected); Cognitive dysfunction (Expected);
  Cruciate disease (Expected); Heart (cardiac) murmur (Unexpected); Hepatopathy
  (liver disorder) (Unexpected); Hip dysplasia (Expected); Hygroma (Expected);
  Hypothyroidism (Unexpected); Lipoma (Unexpected); Mass lesion - eyelid
  (Unexpected); Obesity (Expected); Periodontal disease (Expected)
  RR: none
  Veterinary context and comment: Degenerative joint disease covers a spectrum
  of bone and joint disorders and is more likely as dogs age. Thus comorbidity
  with age-related disorders is not unexpected, e.g. cataracts, CVA, cognitive
  dysfunction, periodontal disease. DJD cases are often managed with NSAID drugs
  that have a high rate of showing adverse reactions, and so the comorbidity
  with adverse reactions is not surprising. Association of DJD with other joint
  disorders is to be expected, e.g. cruciate disease, hip dysplasia, hygroma.
  Obesity may be both a cause and a response to DJD and thus is a logical
  association. Comorbidity with lipoma, hepatopathy and hypothyroidism is more
  surprising, but again may be partially mediated by age associations. The
  association with hypothyroidism is worth future investigation.
Nail clip:
  GGM: Syncope (Unexpected)
  Phi: Syncope (Unexpected)
  RR: none
  Veterinary context and comment: An association between nail clipping and
  syncope is unexpected: there is no logical reason for this.
Anal sac impaction:
  GGM: Colitis (Expected); Conjunctivitis (Expected); Dermatitis (Expected);
  Obesity (Expected)
  Phi: Colitis (Expected); Conjunctivitis (Expected); Dermatitis (Expected);
  Obesity (Expected)
  RR: none
  Veterinary context and comment: Anal sac disorders may be associated with
  atopy or allergic disorders, e.g. conjunctivitis, dermatitis and colitis. It
  is also logical that obesity is associated with anal sac disorders, as the
  increased fat deposits around the anal region may inhibit anal sac emptying.
Periodontal disease:
  GGM: Abscess - oral (mouth), dental (tooth) (Expected); Cataract (Expected);
  DJD (Expected); Heart (cardiac) murmur (Unexpected); Mitral valve disorder
  (Unexpected); Papilloma (Unexpected); Renal (kidney) failure - chronic
  (Expected)
  Phi: Abscess - oral (mouth), dental (tooth) (Expected); Cataract (Expected);
  DJD (Expected); Heart (cardiac) murmur (Unexpected); Mitral valve disorder
  (Unexpected); Papilloma (Unexpected); Renal (kidney) failure - chronic
  (Expected)
  RR: none
  Veterinary context and comment: Periodontal disease is expected to lead to
  oral abscesses, so this comorbidity is expected. There have been a number of
  publications that have shown comorbidity between CKD and periodontal disease.
  Associations between periodontal disease and cataract and DJD may be mediated
  by both being age-related disorders. A similar mechanism could exist for
  mitral valve disease associations. Comorbidity with papilloma disorders is
  surprising but again may be mediated via an age association.
Otitis externa:
  GGM: Aural (ear) haematoma (Expected); Conjunctivitis (Expected); Dermatitis
  (Expected); Hypersensitivity (allergic) skin disorder (Expected);
  Hypersensitivity (allergic) skin disorder - atopic dermatitis (Expected);
  Intertrigo (Expected); Pododermatitis (Expected)
  Phi: Aural (ear) haematoma (Expected); Conjunctivitis (Expected); Dermatitis
  (Expected); Hypersensitivity (allergic) skin disorder (Expected);
  Hypersensitivity (allergic) skin disorder - atopic dermatitis (Expected);
  Intertrigo (Expected); Pododermatitis (Expected)
  RR: none
  Veterinary context and comment: Aural haematoma is associated with head
  shaking and thus is a logical comorbidity with otitis externa. Otitis externa
  is likely to often have allergic or atopic underlying pathology. Thus
  comorbidity with other allergic or atopic disorders is to be expected, e.g.
  conjunctivitis, dermatitis, skin hypersensitivity, atopic disease, intertrigo
  and pododermatitis.
