
Expert Systems with Applications 39 (2012) 11782–11791 

Improving medical decision trees by combining relevant health-care criteria 

Joan Albert López-Vallverdú, David Riaño, John A. Bohada

Research Group on Artificial Intelligence (BANZAI), Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira 
i Virgili, Av. Països Catalans 26, 43007 Tarragona, Spain 
Article info
Keywords: Medical decision making; Decision trees; Background knowledge
Abstract
Through the years, decision trees have been widely used both to represent and to conduct decision processes. They can be automatically induced from databases using supervised learning algorithms, which usually aim at minimizing the size of the tree. When inducing decision trees in a medical setting, the induction process should consider the background knowledge used by health-care professionals to make decisions, in order to produce decision trees that are medically and clinically comprehensible and correct. Comprehensibility measures the medical coherence of the sequence of questions represented in the tree, and correctness rates how irrelevant the errors of the decision tree are from a medical or clinical point of view. Some algorithms partially solve these problems by pursuing alternative objectives such as reducing the economic cost or improving the adherence of the decision process to medical standards. However, from a clinical point of view, none of these criteria is valid when considered alone, because real medical decisions are taken attending to a combination of them, and also of other health-care criteria, simultaneously. Moreover, this combination of criteria is not static and may vary if the decision tree is made for different purposes such as screening, diagnosing, prognosing or drug and therapy prescription. In this paper, a decision tree induction algorithm that uses combinations of health-care criteria is presented and used to generate decision trees for screening and diagnosing in four medical domains. The mechanisms to formalize and to combine these criteria are also presented. The results have been analyzed from both a statistical and a medical point of view, and they suggest that our algorithm obtains decision trees that physicians evaluated as more comprehensible and correct than the decision trees obtained by previous approaches, while keeping an equivalent accuracy.
© 2012 Elsevier Ltd. All rights reserved. 
Corresponding author. Tel.: +34 977558516; fax: +34 977559710. E-mail addresses: (J.A. López-Vallverdú), david.riano@ (D. Riaño), (J.A. Bohada).

1. Introduction
In medicine, decision processes may be of several kinds and serve different purposes (Fauci et al., 2009): screening, diagnosing, prognosing, drug and therapy prescription, etc. Through the years, multiple computer-based structures have been proposed to formalize these decision processes. They range from statistical approaches such as Bayesian networks (Arsene, Dumitrache, & Mihu, 2011; Lucas, van der Gaag, & Abu-Hanna, 2004; Velikova, de Carvalho Ferreira, & Lucas, 2007) or probabilistic models (Husmeier et al., 2004) to symbolic approaches such as decision trees (Chapman & Sonnenberg, 2003; Podgorelec, Kokol, Stiglic, & Rozman, 2002), decision tables (Shiffman, 1997) or decision rules (Clark & Niblett, 1989; Yeh, Cheng, & Chen, 2011). Among them, decision trees have been particularly successful and widely used both to represent and to conduct decision processes. Medical decision trees can be provided by experts (Candell Riera, 2003; Fauci et al., 2009) or automatically induced from medical databases (Ling, Yang, Wang, & Zhang, 2004; López-Vallverdú, Riaño, & Collado, 2007; Quinlan, 1986). In computer science, three of the most cited algorithms to induce decision trees are ID3 (Quinlan, 1986), C4.5 (Quinlan, 1993) and C5.0 (Quinlan, 2003). They aim at minimizing the size of the tree and therefore shortening the decision process while maintaining the quality of the final decision. The main drawback of the trees produced with these algorithms is that they only consider the information that can be extracted from the medical databases, so they do not necessarily satisfy medical and clinical comprehensibility and correctness. Comprehensibility is a measure of the medical coherence of the sequence of questions of the decision processes represented in the tree according to the health-care experts (e.g., asking for the age of the patient before obtaining the thyroid-stimulating hormone value can be accepted in a patient screening process but not in diagnosing thyroid malfunctions). Correctness rates how irrelevant the errors of the decision tree are from a medical or clinical point of view (e.g., the medical error of sending a patient to the Intensive Care Unit rather than to a general hospital floor is lower than that of sending him home by mistake). Providing efficient, but also comprehensible and correct decision mechanisms is a priority in medical decision making.
In the past, some approaches to the induction of medical decision trees have pursued alternative objectives such as reducing the economic cost (Chai, Deng, Yang, & Ling, 2004; Ling et al., 2004; Ling, Sheng, & Yang, 2006) or improving the adherence (Horning, Hoehns, & Doucette, 2007) of the decision process to medical standards (López-Vallverdú et al., 2007). However, these approaches do not guarantee medical comprehensibility and correctness. On the one hand, none of the previous criteria on the length, the economic cost and the adherence to clinical standards is useful when considered alone, because real medical decisions are taken attending not only to these criteria but also to many others that are combined simultaneously. On the other hand, the induction of decision trees with those criteria cannot differentiate among the different possible application purposes. This differentiation is important because, for example, a comprehensible and correct decision tree for diagnosis can be completely wrong for screening purposes.
In this context, in Section 2 we formalize the concept of medical decision process. In Section 3 we propose the mechanisms to formalize medical criteria in order to include them in a decision tree induction algorithm, and in Section 4 we propose a methodology to combine them. In Section 5 we present a general algorithm to induce decision trees, identifying the points where medical decision criteria can be introduced as background knowledge. These are called choice points. In Section 6, the measures of accuracy, comprehensibility and correctness for the evaluation of the induced decision trees are formalized. The inductive algorithm is used in Section 7 to generate decision trees for the purposes of screening and diagnosing in four medical domains. The results have been analyzed from a statistical and a medical point of view, and the conclusions are reported in Section 8.
2. Formalizing a decision process 
In medicine there are many different descriptions of what a decision process is (Fauci et al., 2009); it is therefore necessary to define the concept of medical decision process used in this paper. Here, a decision process is a sequence of medical questions or observations that lead to a concrete medical decision. In a particular domain, if $Q = \{q_1, \ldots, q_n\}$ is the set of valid questions, $D = \{d_1, \ldots, d_m\}$ the set of possible decisions and $q_i(p)$ the answer to the question $q_i \in Q$ for a certain patient $p$, then the finite sequence $(q_{i_1}(p), q_{i_2}(p), \ldots, q_{i_k}(p), d_p)$ represents a medical decision process for patient $p$ in which a health-care professional takes decision $d_p \in D$ after having asked questions $q_{i_1}, q_{i_2}, \ldots, q_{i_k} \in Q$ in this exact order. Observe that the questions represent patient signs and symptoms but also consultation of the patient record or of an expert. Individual decision processes can be generalized and structured as decision mechanisms that not only capture the medical knowledge supporting each individual decision process, but also provide the way of conducting new decisions under other circumstances or for other patients. Among the existing decision mechanisms (see, for example, Arsene et al., 2011; Clark & Niblett, 1989; Chapman & Sonnenberg, 2003; Husmeier et al., 2004; Podgorelec et al., 2002; Shiffman, 1997) here we choose decision trees because they are structured, explicit, and easy to understand and to interpret, which are compelling requirements of a medical process. A decision tree (DT) is a decision mechanism that describes decision processes that always start with the same question and concatenate questions in such a way that each possible answer to a question is followed by a new question or by a final decision.
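The definitions above can be made concrete with a small sketch. It assumes categorical answers and a hand-written tree; the question names (`chest_pain`, `age_over_50`) and decision labels are hypothetical and do not come from the paper. Following the tree for one patient yields exactly a decision process: the ordered list of answered questions plus the final decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Decision:
    """Terminal node: a final decision d in D."""
    label: str


@dataclass
class Question:
    """Internal node: a question q in Q with one branch per possible answer."""
    name: str
    branches: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[Question, Decision]


def decision_process(tree: Node, patient: Dict[str, str]) -> Tuple[List[Tuple[str, str]], str]:
    """Follow the tree for one patient and return the asked (question, answer)
    pairs in order, together with the final decision d_p."""
    asked = []
    node = tree
    while isinstance(node, Question):
        answer = patient[node.name]
        asked.append((node.name, answer))
        node = node.branches[answer]
    return asked, node.label


# Hypothetical toy tree, for illustration only.
tree = Question("chest_pain", {
    "yes": Question("age_over_50", {"yes": Decision("refer"), "no": Decision("monitor")}),
    "no": Decision("discharge"),
})
patient = {"chest_pain": "yes", "age_over_50": "no"}
asked, decision = decision_process(tree, patient)
```

Every path starts at the same root question, which matches the definition of a DT given above; different patients may still be asked different numbers of questions.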
Decision mechanisms, such as DTs, can be automatically obtained by applying induction algorithms. These algorithms start from a set of data represented in the form $(q_1(p), q_2(p), \ldots, q_n(p), d_p)$ where $p$ ranges over the different patients, the $q$'s are questions whose answers may or may not be known for each patient $p$, and $d_p$ is the decision taken for patient $p$. Observe that the order of the questions is the same for all the patients, since it defines the description of the case rather than a medical decision process, in which the questions to the different patients can be asked in a different order.
Fig. 1 shows a DT induced using an information gain based algorithm to identify patients with heart disease (questions are represented as ellipses and decisions as boxes). It does not consider any medical background knowledge, so medical comprehensibility and correctness are not guaranteed. For example, the question about the number of major vessels in the root requires an invasive test for all the patients. The systematic application of this test as the first step of the decision process lacks medical sense, and it is completely wrong for certain medical decision processes such as screening. The algorithms to induce comprehensible and correct decision mechanisms from a set of data must then be based on one or more medical decision criteria that extend the statistical sense of asking one or another question with a clinical sense. These medical decision criteria and their formalization are introduced in the next section.
3. Decision criteria in health-care and their formalization 
In medicine, the list of criteria which may be combined to make decisions is very large and diverse. A systematic approach to the organization of such criteria and their representation using cost functions and layered partial orders (LPOs) is proposed in López-Vallverdú and Riaño (2012). In this section, we explain how these criteria can be formalized in order to decide about the appropriate questions and decisions in a decision process. This appropriateness is used to determine the best order of questions and decisions in medical decision processes.
3.1. Criteria on the questions 
The order in which questions are asked in a decision process is decided according to the criteria on the questions. They are used to determine whether a question is more or less adequate than another one in a given context. For example, in diabetes screening, we may use the decision time criterion to decide to perform an oral glucose tolerance test rather than obtaining the longer 2-h serum insulin value. When formalizing the criteria on the questions, cost functions or LPOs are defined over the set of questions $Q$. For example, the expert may choose to represent the economic cost criterion as a cost function $f_e : Q \to [0,1]$ and the script criterion, which measures the adherence of the procedure to the sequence specified by medical standards (López-Vallverdú & Riaño, 2012), as an LPO $\le_s$ over $Q$.
Criteria on the questions can be contextual or context-free. A criterion is said to be contextual when it depends on the context (related disease, medical purpose, etc.) of the medical decision process. In a certain context, the answer to a question may be important in order to make a decision, but in another context, this question may be totally unnecessary. For example, the script value of answering the question stability_of_blood_pressure is greater if we are deciding where a post-operative patient must be sent than if we are determining whether the patient is hypothyroid or not. Script and granularity (López-Vallverdú & Riaño, 2012) are examples of contextual criteria.
Context-free criteria do not change when they are used in different contexts because they depend on the health-care test needed to obtain the answer to the question. For example, economic cost is a context-free criterion. The question sodium_on_blood has no economic cost itself, but its economic cost is related to the blood test that provides the answer to this question. The economic cost of a regular blood test is always the same regardless of the context. Moreover, a health-care test can provide simultaneous answers to several questions of the decision process. For example, a regular blood test informs about the levels of sodium, urea, creatinine, etc., providing an answer to the question sodium_on_blood, but also to urea_on_blood, creatinine_on_blood, etc. Decision time, economic cost, health risk and physical comfortability (López-Vallverdú & Riaño, 2012) are examples of context-free criteria. Notice that once a health-care test is performed to answer one of the questions, it does not have to be performed again to answer the rest of the questions related to that health-care test. Let $t$ be a health-care test that provides the answer to a set of questions $Q' \subseteq Q$; when a question $q' \in Q'$ is asked in the decision process, the values for context-free criteria of the questions in $Q'$ change, so that $\forall q' \in Q'$, $f(q') = 0$ (for cost functions), and each question $q' \in Q'$ is moved to the first layer of $\le$ (for LPOs).
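The update rule for context-free cost functions can be sketched as follows. This is a minimal illustration that assumes cost functions are stored as dictionaries and that the mapping from tests to the questions they answer is known; the cost values and the `chest_xray` question are hypothetical.

```python
def mark_test_performed(cost, test_questions, asked):
    """Return a copy of `cost` in which every question answered by the same
    health-care test as `asked` has cost 0, i.e. f(q') = 0 for all q' in Q'."""
    updated = dict(cost)
    for questions in test_questions.values():
        if asked in questions:
            for q in questions:
                updated[q] = 0.0
    return updated


# Hypothetical economic cost values in [0, 1].
economic_cost = {
    "sodium_on_blood": 0.3,
    "urea_on_blood": 0.3,
    "creatinine_on_blood": 0.3,
    "chest_xray": 0.6,
}
# Assumed mapping from each test t to its set of questions Q'.
tests = {
    "blood_test": {"sodium_on_blood", "urea_on_blood", "creatinine_on_blood"},
    "xray": {"chest_xray"},
}
after = mark_test_performed(economic_cost, tests, "sodium_on_blood")
```

After asking sodium_on_blood, the other blood-test questions become free, while questions tied to a different test keep their cost. The LPO case (moving questions to the first layer) is not covered by this sketch.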
3.2. Criteria on the decisions 
A decision process concludes with a final decision that can be right or wrong. The relevance of the error in wrong decisions is evaluated with the criteria on the decisions. For example, according to the health risk criterion, it is safer to wrongly send a post-operative patient to the Intensive Care Unit (ICU) than to send him home by mistake. Some previous works have evaluated the possible wrong decisions performed in a decision process (Ling et al., 2004, 2006; Turney, 2000). In these approaches, an expert has to provide a cost function $error(d_i, d_j)$ which returns the error of performing $d_i$ when the correct decision is $d_j$, for each pair of decisions $d_i, d_j$ in the set of possible decisions $D$. This approach has the inconvenience that the expert is required to provide a value for each one of the $\#D \cdot (\#D - 1)$ possible errors in the decision process (where $\#D$ is the cardinality of $D$). For medium and large sets of decisions this is a large amount of information for experts to provide. In order to reduce this effort, here we use a different approach that divides the error into type I and type II medical errors, which are concepts that medical doctors are familiar with.
• Type I error represents the relevance of taking a wrong decision (e.g., the economic cost if we send a patient to ICU when this is a wrong decision).
• Type II error represents the relevance of not taking a correct decision (e.g., the risk on the health of a patient who is not sent to ICU when this is the correct decision).

Fig. 1. Decision tree to identify patients with heart disease.
When formalizing the criteria on the decisions, the cost functions or the LPOs are defined over the set of decisions $D$. This means that two cost functions $f : D \to [0,1]$ (or two LPOs $\le$ over $D$) are needed for each criterion considered: one for type I error and another one for type II error. For example, the expert may choose to represent the type I error of the health risk criterion ($h$) as an LPO $\le_h^{I}$ and the type II error as a cost function $f_h^{II}$. This approach requires the expert to provide only $2 \cdot \#D$ values. For each decision $d \in D$, this value is $f_c(d)$ or $\ell_c(d)$ when the criterion $c$ is represented with a cost function or an LPO, respectively. Compared with previous approaches (Ling et al., 2004, 2006; Turney, 2000) our proposal requires much less information, which is also easier for experts to provide. This appreciation was confirmed by the health-care professionals that evaluated the results of this work. According to them, our approach is much closer to the way they objectively measure medical errors.
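As a quick check of the two counts quoted above, the following sketch compares the number of values an expert must supply per criterion under each approach. The function names are illustrative only.

```python
def pairwise_error_values(num_decisions: int) -> int:
    """Values needed for error(d_i, d_j) over all ordered pairs: #D * (#D - 1)."""
    return num_decisions * (num_decisions - 1)


def type_error_values(num_decisions: int) -> int:
    """Values needed for one type I and one type II value per decision: 2 * #D."""
    return 2 * num_decisions


# Compare both counts for a few sizes of the decision set D.
comparison = {n: (pairwise_error_values(n), type_error_values(n)) for n in (2, 3, 5, 10)}
```

The two counts coincide at #D = 3, and the type I / type II formulation only requires fewer values for larger decision sets (for example, 20 instead of 90 values when #D = 10), which is consistent with the remark about medium and large sets of decisions.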
4. Combination of criteria 
In a decision process, questions and final decisions are not chosen based on a unique criterion but on the simultaneous application of a set of medical criteria. This combination can be very complex and it may involve criteria with different levels of priority and relevance. In this section we present a means to include a combination of the formalized criteria in the induction of DTs. We first explain how the inductive algorithm selects criteria according to their priority, and then we present a method to combine them, considering their relevance.
4.1. Selection of criteria considering their priority 
In a decision process, medical and clinical criteria are arranged in different levels of priority. The priority of a criterion is defined as the relative position of this criterion in the set of criteria when it is used in medical decision making. This priority is represented by a positive number, 1 being the highest priority. Health-care professionals may use priorities to rank the relevance of the criteria in the decision problem that they are trying to solve. For example, in the selection of questions for screening patients with diabetes, the expert may consider script, economic cost and physical comfortability criteria of higher priority than health risk or decision time. The expert can also avoid the use of priorities just by stating that all the criteria have the same level of priority.
The criteria in the first level of priority are those which are used to guide the sequence of questions or to make the final decisions in the decision process. Only when these criteria are not able to identify the best question or decision in the process are the criteria of the second level of priority considered. If these also fail, then the criteria in the third level are used, and so on. If none of the levels is useful to choose the best question or decision, then any remaining question $q_i \in Q$ or decision $d_i \in D$ is appropriate and the one with the lowest index $i$ is selected.
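The priority cascade described above amounts to a lexicographic selection. The sketch below assumes that each priority level has already been reduced to a single cost dictionary (lower is better) and that candidates are listed in index order; the question names and cost values are hypothetical.

```python
def select_by_priority(candidates, cost_by_level):
    """Keep the candidates with minimal cost at priority level 1; if several
    remain, refine with level 2, and so on. If ties survive every level,
    return the remaining candidate with the lowest index.

    candidates: list of questions (or decisions) in index order.
    cost_by_level: list of dicts candidate -> cost, highest priority first.
    """
    remaining = list(candidates)
    for cost in cost_by_level:
        best = min(cost[c] for c in remaining)
        # Exact comparison is enough for this sketch; real data may need a tolerance.
        remaining = [c for c in remaining if cost[c] == best]
        if len(remaining) == 1:
            break
    return remaining[0]


questions = ["age", "glucose", "insulin"]
level_1 = {"age": 0.2, "glucose": 0.2, "insulin": 0.5}
level_2 = {"age": 0.4, "glucose": 0.1, "insulin": 0.0}
chosen = select_by_priority(questions, [level_1, level_2])
```

Here `age` and `glucose` tie at level 1, and level 2 breaks the tie in favour of `glucose`; `insulin` is never reconsidered even though it is the cheapest at level 2, because lower levels only refine the choice made by higher ones.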
4.2. Combination of criteria considering their relevance 
After having considered priorities, the criteria of the same priority are combined according to their relevances. The relevance of a criterion is defined as its weight within the combination of criteria used in medical decision making, and it is represented by a value $\alpha_c \in [0,1]$ such that $\sum_{c \in C_i} \alpha_c = 1$, where $C_i$ contains those criteria with priority $i$. Given $\alpha_c > \alpha_{c'}$ we say that criterion $c$ is more relevant than criterion $c'$. Health-care professionals must provide the relevance of the decision criteria as a means of weighting the relative importance of each criterion in the decision problem that they are trying to solve. When combining $n$ criteria represented as cost functions or LPOs we deal with three cases:
Case 1: Combination of $n$ cost functions $(f_{c_1}, \ldots, f_{c_n})$: We apply a linear combination $g = \alpha_{c_1} f_{c_1} + \cdots + \alpha_{c_n} f_{c_n}$, with $\alpha_{c_i}$ the relevance of criterion $c_i$.
Case 2: Combination of $n$ LPOs $(\le_{c_1}, \ldots, \le_{c_n})$: We apply the procedure of combination of LPOs described in López-Vallverdú and Riaño (2011a).
Case 3: Combination of $m$ cost functions and $n - m$ LPOs $(f_{c_1}, \ldots, f_{c_m}, \le_{c_{m+1}}, \ldots, \le_{c_n})$: We apply the procedure of combination of LPOs (López-Vallverdú & Riaño, 2011a) to the $n - m$ LPOs, obtaining a single LPO $\le'$. Then we transform $\le'$ into a cost function $f'$ (López-Vallverdú & Riaño, 2011a) and finally combine the $m + 1$ cost functions $f_{c_1}, \ldots, f_{c_m}, f'$ as in case 1, with the relevance of $f'$ calculated as $\alpha' = \sum_{i=m+1}^{n} \alpha_{c_i}$.
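Case 1 can be illustrated with a short sketch. It assumes cost functions are dictionaries over the same set of items and that the relevances of one priority level sum to 1; the criteria, items and values below are hypothetical. Cases 2 and 3 rely on the LPO procedures of López-Vallverdú and Riaño (2011a), which are not reproduced here.

```python
def combine_cost_functions(functions, relevances):
    """Return g = sum_i alpha_i * f_i for cost functions given as dicts.

    functions: list of dicts item -> cost in [0, 1].
    relevances: list of weights alpha_i, assumed to sum to 1.
    """
    if len(functions) != len(relevances):
        raise ValueError("one relevance is needed per cost function")
    if abs(sum(relevances) - 1.0) > 1e-9:
        raise ValueError("relevances of one priority level must sum to 1")

    def g(item):
        return sum(alpha * f[item] for alpha, f in zip(relevances, functions))

    return g


# Hypothetical economic cost and health risk of two health-care tests.
economic = {"blood_test": 0.2, "biopsy": 0.9}
risk = {"blood_test": 0.1, "biopsy": 0.7}
g = combine_cost_functions([economic, risk], [0.75, 0.25])
```

Because the weights sum to 1 and each cost lies in [0, 1], the combined value also stays in [0, 1], so the result can be used wherever a single cost function is expected.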
5. Induction of decision trees based on medical criteria 
The three most successful and widely applied algorithms to induce DTs are ID3 (Quinlan, 1986), C4.5 (Quinlan, 1993) and C5.0 (Quinlan, 2003), with more than 1800 publications in medical informatics since 2000 (see footnote 1). These are greedy algorithms that produce DTs as a result of a top-down partitioning process that starts with a dataset which contains descriptions of past decision processes. In medical informatics (Podgorelec et al., 2002), these cases represent decisions on patients that are expressed as $(q_1(p), q_2(p), \ldots, q_n(p), d_p)$ where $p$ ranges over the different patients considered, the $q$'s are questions on particular conditions of the patients whose answer may or may not be known, and $d_p$ is the decision taken for patient $p$. In spite of significant differences, the baseline of ID3, C4.5 and C5.0 is equivalent: partition the dataset into subsets using the best possible question, until the decision of the remaining cases can be considered equivalent, then take the most appropriate decision. This behavior is described in Algorithm 1, where three choice points have been identified. These are points in which background knowledge can be considered in order to improve the medical and clinical comprehensibility and correctness of the DT induced.
Choice point one, in line 2, sets a condition for placing a decision node (or not). For the current dataset this condition determines whether the situation $(q_{i_1}(p) \Rightarrow \cdots)$ is better represented with a decision $(q_{i_1}(p) \Rightarrow d_p)$ or if more questions have to be asked $(q_{i_1}(p) \Rightarrow q_{i_2}(p) \Rightarrow \cdots)$. Choice point two, in line 3, is the condition to select the best decision $d_i \in D$ for the current decision process. Choice point three, in line 7, is the condition to select the best question $q_i \in Q$ for the current decision process.
5.1. Introducing background knowledge in the induction of DTs 
In order to improve the medical comprehensibility and correctness of the trees induced by ID3, C4.5 or C5.0, and also to be able to produce trees with a concrete medical orientation (e.g., screening, diagnosis, treatment, etc.), the medical background knowledge is included in Algorithm 1 (see Fig. 2). This knowledge is represented by cost functions and LPOs related to each one of the criteria taking part in the decision process. For each criterion, three cost functions (or LPOs) are defined: one for questions and two others for type I and type II errors on the decisions. These cost functions and LPOs, together with the priorities and relevances of the criteria, define the background knowledge required to produce decision trees with a medical sense.
A representation of all the background knowledge required is shown in Table 1, where the $c_i$ are the criteria selected. For each criterion $c_i$ (i.e., table row), the background knowledge provides the priority and relevance ($p_i^{q}$ and $\alpha_i^{q}$) when the criterion is used to select the questions, and the priority and relevance ($p_i^{I}$ and $\alpha_i^{I}$ for type-I error, and $p_i^{II}$ and $\alpha_i^{II}$ for type-II error) when it is used to select the proper decision. Each criterion $c_i$ may be represented as a cost function or an LPO, for questions, and for type I and type II errors. Table 1 is a central component of the process described in Fig. 2.
The criteria in Table 1 are combined using the methodology described in Section 4, obtaining three global cost functions or LPOs for each level of priority $j$: one for criteria on the questions ($g_j^{q}$ or $\le_j^{q}$), another one for criteria on the decisions related to type I errors ($g_j^{I}$ or $\le_j^{I}$) and a third one for criteria on the decisions related to type II errors ($g_j^{II}$ or $\le_j^{II}$). With the aim of inducing DTs that are medically and clinically comprehensible and correct and, at the same time, adapted to the health-care purpose the DT must serve, we propose an implementation for each one of the choice points of Algorithm 1 that uses the different global cost functions and LPOs.

1 Bibliographic search in ScienceDirect with keywords medicine AND (id3 OR c4.5 OR c5.0).
5.2. Condition for placing a decision node 
In medicine, deciding whether a decision process has reached a final decision or if new questions are recommended is a trade-off between type I and type II errors. Here, these errors are respectively represented with the cost functions $g_j^{I}$ and $g_j^{II}$ obtained for each level of priority $j$ (see Fig. 2). If we have global LPOs, they are transformed into the cost functions $g_j^{I}$ and $g_j^{II}$ (López-Vallverdú & Riaño, 2011a). Therefore, for each priority level $j$, $g_j^{I}$ and $g_j^{II}$ provide the global cost of accepting a wrong decision and the global cost of rejecting a correct decision over a decision process $(q_{i_1}(p), q_{i_2}(p), \ldots, q_{i_k}(p), d_p)$ on a patient $p$. Given a set of patients $P'$, if $P_{P'}(d)$ is the proportion of patients in $P'$ on which the final decision was $d$, then, considering criteria with priority $i$, the cost of placing a decision node $Dec_i(d, P')$ is calculated using Eq. (1). The condition for placing a decision node is reached if one of the total costs for making a decision $d$ over the current dataset, considering criteria with priority 1, is lower than a given threshold (i.e., $\min_{d \in D} Dec_1(d, P')$ is below the threshold).

$$Dec_i(d, P') = \left(1 - P_{P'}(d)\right) \cdot g_i^{I}(d) + \sum_{d' \in D,\ d' \neq d} P_{P'}(d') \cdot g_i^{II}(d') \qquad (1)$$
We compare the costs for making a decision with a threshold rather than with the costs of making a question because questions and decisions depend on different criteria and thus they are not comparable. If a decision is correct enough for the current dataset (its cost is lower than the threshold) it can be placed in the DT with no need to calculate the cost of making a question.
This procedure considers both the information in the database (proportion of patients for each decision) and the medical background knowledge (type I and II error cost for each wrong decision).

Fig. 2. Introducing background knowledge in Algorithm 1.

Table 1. Representation of the input background knowledge.
Criteria   Questions                          Decisions, type I error            Decisions, type II error
           p        α        Formalization    p        α        Formalization    p         α         Formalization
c_1        p^q_1    α^q_1    f or ≤           p^I_1    α^I_1    f or ≤           p^II_1    α^II_1    f or ≤
c_2        p^q_2    α^q_2    f or ≤           p^I_2    α^I_2    f or ≤           p^II_2    α^II_2    f or ≤
...        ...      ...      ...              ...      ...      ...              ...       ...       ...
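The cost of placing a decision node can be sketched in a few lines. This is an illustration of Eq. (1) as read from the surrounding text: the type I cost of a decision is weighted by the share of patients for whom it is wrong, and the type II cost of every other decision by the share of patients for whom that other decision was right. The cost values and the threshold are hypothetical.

```python
from collections import Counter


def proportions(decisions):
    """Share of patients in the current dataset for each recorded decision."""
    counts = Counter(decisions)
    total = len(decisions)
    return {d: n / total for d, n in counts.items()}


def dec_cost(d, decisions, g_type1, g_type2):
    """Cost of placing a decision node labelled d over the current dataset."""
    prop = proportions(decisions)
    type1 = (1.0 - prop.get(d, 0.0)) * g_type1[d]
    type2 = sum(prop.get(other, 0.0) * g_type2[other]
                for other in g_type2 if other != d)
    return type1 + type2


def should_place_decision(decisions, g_type1, g_type2, threshold):
    """Choice point one: true if some decision is cheap enough to stop asking."""
    return min(dec_cost(d, decisions, g_type1, g_type2) for d in g_type1) < threshold


# Hypothetical dataset: 8 patients sent to ICU, 2 sent home.
recorded = ["ICU"] * 8 + ["home"] * 2
g1 = {"ICU": 0.4, "home": 0.1}   # assumed type I costs
g2 = {"ICU": 0.9, "home": 0.2}   # assumed type II costs
cost_icu = dec_cost("ICU", recorded, g1, g2)
cost_home = dec_cost("home", recorded, g1, g2)
stop = should_place_decision(recorded, g1, g2, threshold=0.2)
```

With these numbers, deciding ICU costs 0.2 · 0.4 + 0.2 · 0.2 = 0.12, whereas deciding home costs 0.8 · 0.1 + 0.8 · 0.9 = 0.80, so a decision node would be placed for any threshold above 0.12.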
5.3. Select the best decision: correctness 
From a medical point of view, the most correct decision to be made over a certain set of patients must be determined considering type I and type II errors (see $g_j^{I}$ and $g_j^{II}$ in Fig. 2). Therefore the selection of the best decision is done using Eq. (1). The best decision to be selected is the one which minimizes $Dec_1$. If several decisions minimize $Dec_1$ then we select the one of them which minimizes $Dec_2$. The procedure is repeated for each level of priority until there is only one optimal decision. If the lowest priority level is reached and there is not a single optimal decision selected, then the remaining decision $d_i$ with the lowest index $i$ is taken.
5.4. Select the best question: comprehensibility 
A decision process is medically comprehensible if the questions are asked in an order that agrees with the criteria of the health-care experts. Therefore, criteria on the questions are involved in the selection of the best question for a certain patient (see $g_j^{q}$ and $\le_j^{q}$ in Fig. 2). Nevertheless, from a medical point of view the most comprehensible question is not necessarily the question that leads to the best situation to make a final decision. In order to select comprehensible questions which are also useful to make a final decision, we use the concept of expected cost (EC). For each question $q_i$, the EC represents the cost of making a decision in the next step of the decision process after asking the question $q_i$. This is the average of the costs of placing decision nodes for each of the subsets $P'_a$ obtained when a certain set of patients $P' \subseteq P$ is partitioned using $q_i$. EC is calculated with Eq. (2), where $P'_a = \{p \in P' : q_i(p) = a\}$ and $A_i = \{a : q_i(p) = a,\ p \in P'\}$.

$$EC(q_i, P') = \frac{1}{\#A_i} \sum_{a \in A_i} \min_{d \in D} Dec_1(d, P'_a) \qquad (2)$$

We compute EC for each question and we select those questions whose EC is lower than a threshold $\delta$. The best question is the one which minimizes the global cost function $g_1^{q}$ (or which is in the lowest layer of the LPO $\le_1^{q}$) for criteria on the questions of level of priority 1. If several questions minimize $g_1^{q}$ (or are in the lowest layer of $\le_1^{q}$) then we select the one of them which minimizes $g_2^{q}$ (or which is in the lowest layer of $\le_2^{q}$). The procedure is repeated for each level of priority until there is only one optimal question. If none of the levels is useful to select one of these questions, then the remaining question $q_i$ with the lowest index $i$ is selected. The use of the expected cost together with the criteria on the questions guarantees a trade-off between the information in the database and the medical background knowledge when selecting the best question.
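The expected cost of a question can be sketched as an unweighted average over the answer groups, following the description above. The block redefines a compact version of the decision-node cost so that it is self-contained; the patients, questions and cost values are hypothetical, and the reading of the decision-node cost is the same assumption as in Section 5.2.

```python
from collections import defaultdict


def dec_cost(d, decisions, g1, g2):
    """Cost of a decision node labelled d (type I plus type II terms)."""
    share = {x: decisions.count(x) / len(decisions) for x in g1}
    return (1 - share[d]) * g1[d] + sum(share[o] * g2[o] for o in g1 if o != d)


def expected_cost(question, patients, g1, g2):
    """Average, over the observed answers to `question`, of the cheapest
    decision node that could be placed on each answer's subset of patients."""
    groups = defaultdict(list)
    for answers, decision in patients:
        groups[answers[question]].append(decision)
    per_answer = [min(dec_cost(d, group, g1, g2) for d in g1)
                  for group in groups.values()]
    return sum(per_answer) / len(per_answer)


g1 = {"ICU": 0.4, "home": 0.1}   # assumed type I costs
g2 = {"ICU": 0.9, "home": 0.2}   # assumed type II costs
# Each patient: (answers to the questions, recorded decision).
patients = [
    ({"fever": "yes", "age": "old"}, "ICU"),
    ({"fever": "yes", "age": "young"}, "ICU"),
    ({"fever": "no", "age": "old"}, "home"),
    ({"fever": "no", "age": "young"}, "home"),
]
ec_fever = expected_cost("fever", patients, g1, g2)
ec_age = expected_cost("age", patients, g1, g2)
```

In this toy dataset `fever` separates the decisions perfectly, so its expected cost is 0, while `age` leaves both groups mixed and yields 0.3. Only questions below the threshold would then be ranked by the criteria on the questions.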
6. Evaluation of medical decision trees 
The  accuracy  of  a DT is defined as the percentage of correct decisions over the total number of decisions made. Accuracy is a 
statistical  measure  like  sensitivity,  specifity  and  positive  and  neg-  ative  predictive  values  (Lang  &  Secic,  2006),  which 
numerically compares the decisions represented in the DT with the cases in the training dataset. 
These measures are not based on any kind of medical background knowledge, so they are not a valid way to assess the medical comprehensibility and correctness of the DTs. Let $\mathrm{path}(p, DT) = \{q^p_1, q^p_2, \ldots, q^p_k\}$ be the sequence of questions asked to patient $p$ if we follow the decision tree DT. Comprehensibility is calculated with Eq. (3) and evaluates the sequence of questions in $\mathrm{path}(p, DT)$ for all the patients $p \in P$ following the indications of the decision tree DT. Comprehensibility takes into account the global cost function $g_{q1}$ of the criteria on questions with priority 1. If the medical background knowledge is represented with a global LPO $\leq_{q1}$, this has to be transformed into a cost function $g_{q1}$ (López-Vallverdú & Riaño, 2011a) before Eq. (3) is applied.

$$\mathrm{comprehensibility}(P, DT) = \frac{1}{\#P} \cdot \left( \#P - \sum_{p \in P} \sum_{q \in \mathrm{path}(p, DT)} \frac{g_{q1}(q)}{\#\mathrm{path}(p, DT)} \right) \qquad (3)$$

Let DN be the set of decision nodes in a decision tree DT (i.e., the terminal nodes of the DT), and let $d_n$ and $P_n$ be the decision made and the set of patients in a decision node $n \in DN$, respectively. Correctness is calculated with Eq. (4) and it evaluates all the final decisions made in a DT with the function $\mathrm{Dec}_1$, which returns the cost of placing a decision node considering criteria with priority 1.

$$\mathrm{correctness}(P, DT) = \frac{1}{\#DN} \cdot \left( \#DN - \sum_{n \in DN} \mathrm{Dec}_1(d_n, P_n) \right) \qquad (4)$$
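The two measures can be illustrated with a short Python sketch. It reads Eq. (3) as one minus the mean, over patients, of the average question cost along each patient's path, and Eq. (4) as one minus the mean cost of the decision nodes. All names and data are hypothetical: costs are assumed to lie in [0, 1], every path is assumed to be non-empty, and a plain misclassification rate stands in for the Dec function.

```python
def comprehensibility(paths, g_q1):
    """Eq. (3): `paths` maps each patient to the list of questions asked to them."""
    n = len(paths)
    penalty = sum(sum(g_q1(q) for q in path) / len(path) for path in paths.values())
    return (n - penalty) / n


def correctness(decision_nodes, dec_cost):
    """Eq. (4): `decision_nodes` is a list of (decision, patients in the node) pairs."""
    m = len(decision_nodes)
    return (m - sum(dec_cost(d, node_patients) for d, node_patients in decision_nodes)) / m


# Hypothetical question costs: "vessels" stands for an invasive, high-cost question.
g_q1 = {"age": 0.0, "sex": 0.0, "vessels": 1.0}.get
paths = {"p1": ["age", "sex"], "p2": ["age", "vessels"]}


def dec_cost(decision, labels):
    """Assumed stand-in for Dec: fraction of patients in the node decided wrongly."""
    return sum(label != decision for label in labels) / len(labels)


nodes = [("sick", ["sick", "sick"]), ("healthy", ["healthy", "sick"])]

print(comprehensibility(paths, g_q1))  # 0.75
print(correctness(nodes, dec_cost))    # 0.75
```

Both values fall between 0 and 1, with 1 meaning that no costly question was asked (comprehensibility) or that no decision node incurs any cost (correctness).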





7. Tests and results 
In this section, we detail the tests carried out on the induction of medically comprehensible and correct DTs and the results obtained with our algorithm on four medical domains from the UCI Repository of Machine Learning (Frank & Asuncion, 2010). The domains are diabetes with 768 patients, 8 questions and 2 decisions; heart disease with 303 patients, 13 questions and 2 decisions; post-operative with 90 patients, 8 questions and 3 decisions; and thyroid with 3772 patients, 20 questions and 3 decisions.
The background knowledge about the different decision criteria in all four domains has been provided by physicians of the Clinical Hospital in Barcelona (CHB) (Spain) and the SAGESSA Health Care Group (Spain). For each domain, these professionals selected some medical criteria and provided the background knowledge according to Table 1 and for the purposes of patient screening and patient diagnosis.
7.1. The tests 
With the aim of finding evidence that our approach (MEDBK) provides comprehensible and correct DTs which are useful to represent medical decision processes and, at the same time, of showing the limitations of the information gain based algorithms (IG) such as ID3, C4.5 or C5.0 in the induction of medical DTs,² we have performed the following two types of test on the previous four medical domains.
Test type 1 to show evidence that MEDBK generates comprehensible and correct medical DTs, with no loss of accuracy with respect to IG.
Test type 2 to show evidence about the suitability of MEDBK to produce decision mechanisms for different purposes (screening and diagnosis) for the same datasets.
The first type of test has been performed by generating DTs to screen patients in the four domains. MEDBK required the professionals of the two health-care institutions to agree on the criteria to be used and also on the priorities and relevances of such criteria for a screening decision process. Table 2 summarizes the selected criteria extracted from the list in López-Vallverdú and Riaño (2012) (column 1), their respective priorities (columns p), relevances (columns a) and their formalization as cost functions or LPOs, for questions, and type I and type II errors on the decisions. The cost functions and LPOs are not provided here because each medical domain tested has its own. There are 25 cost functions and 15 LPOs in total, which are provided in López-Vallverdú and Riaño (2011b).
According  to  physicians,  some  of  the  criteria  in  Table  2  are  not  appropriate  for  selecting  questions  or  considering  type  I  or 
type II errors. These appear as ‘–’ in the table meaning that they are not part of the background knowledge. 
All these tests have been performed with and without cross-validation, and with and without pruning. Cross-validation is used to analyze the robustness of the DTs and, in our case, it consisted of repeating the following procedure 10 times: we randomly separated 90% of the patients of the initial dataset and used them to generate the DT, which was then tested using the remaining 10% of the patients. Pruning is used to reduce the overfitting of DTs and to remove sections of a DT that may be based on noisy or erroneous data. Pruning is based on a prefixed percentage of DT node representativity: during the induction process, if a node of the DT represents less than this percentage of patients, it becomes a decision node. We used a representativity ratio of 2%. We compared the results of these tests with the DTs obtained with IG.
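The validation and pruning settings just described can be sketched as follows. This is a minimal illustration, assuming that the induction algorithm and the quality measure are supplied as callables; the names `repeated_holdout`, `induce`, `evaluate` and `becomes_decision_node` are hypothetical and do not come from the original implementation.

```python
import random


def repeated_holdout(patients, induce, evaluate, repetitions=10, train_fraction=0.9, seed=0):
    """Average score over `repetitions` random 90%/10% train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        shuffled = list(patients)
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]
        tree = induce(train)                  # build the DT on 90% of the patients
        scores.append(evaluate(tree, test))   # score it on the remaining 10%
    return sum(scores) / len(scores)


def becomes_decision_node(node_size, total_size, ratio=0.02):
    """Representativity pruning: stop splitting nodes with too few patients."""
    return node_size < ratio * total_size


# Toy check of the split sizes: the "tree" is just the training set itself.
patients = list(range(100))
mean_test_size = repeated_holdout(patients, induce=set, evaluate=lambda tree, test: len(test))
print(mean_test_size)                   # 10.0
print(becomes_decision_node(15, 768))   # True: 15 patients is below 2% of 768
print(becomes_decision_node(16, 768))   # False
```

Note that the procedure draws a fresh random split in each repetition, so a patient may appear in several test sets; this differs from a partition into ten disjoint folds.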
The second type of test focused on the thyroid domain and consisted of generating DTs with both the IG and the MEDBK algorithms for the decision processes of patient screening and patient diagnosis.
The results of the two types of test were analyzed by physicians of the two previously mentioned health-care institutions, and their main conclusions are summarized in Section 7.2. We also compared the accuracy, comprehensibility and correctness of the DTs induced by MEDBK with those of the DTs generated with IG. This comparison is detailed in Section 7.3.
2 In the following tests we used as IG the Weka J48 implementation of the C4.5 algorithm (Witten & Frank, 2005).
7.2. Decision trees obtained and medical analysis 
With MEDBK, we have induced DTs to screen patients in the medical domains of diabetes, heart disease, post-operative, and thyroid. Several physicians proposed the criteria, priorities and relevances in order to avoid as much as possible the presence of questions based on risky, uncomfortable or expensive medical tests (see Table 2). In Fig. 3 we provide one of the DTs induced with MEDBK. Unlike the DT obtained with IG (see Fig. 1), this one is based on low-invasive questions such as age, sex, chest pain type, resting blood pressure, resting electrocardiogram and maximum heart rate, rather than on questions based on invasive tests such as the number of major vessels. Observe that the DT induced with MEDBK uses the questions age and sex (highest priority according to the criterion script; López-Vallverdú & Riaño, 2011b) before asking other questions. However, the trade-off of our method between the information in the database and the medical background knowledge means that the latter does not always determine the sequence of questions. For example, in one branch the question maximum heart rate is used to make a final decision without having asked other questions with a higher priority, like resting blood pressure, fasting blood sugar and serum cholesterol. The physicians qualified the behavior of this DT as according to normal practice, whereas the one depicted in Fig. 1 was rejected as inappropriate for decision making in the screening of patients with heart disease.
Table 2 Priorities, relevances and formalization of the medical criteria to perform screening decision processes.
Criteria Criteria on the questions Criteria on the decisions
Type I error Type II error
p a Formalization p a Formalization p a Formalization
Script 1 1 ≤
– – – – – – Health risk 2 1 ≤
a 1 1 f
Physical comf. 3 0.4 ≤
1 0.9 f
a – – – Economic cost 3 0.4 f
1 0.1 f
Ic 2 0.5 f
– – – Decision time 3
0.2 f
2 0.5 f
– – –
a For the post-operative domain, it was formalized with a LPO.
Fig. 3. DT induced for the screening of heart disease using MEDBK.
This interpretation is the same for all the DTs obtained in the four medical domains tested and it is corroborated by the numerical results discussed in Section 7.3. All the DTs obtained with IG represent medical decision processes that are riskier, more uncomfortable or more expensive than the ones obtained with MEDBK.
Fig. 4. LPO over the questions to diagnose thyroid malfunctioning. 
MEDBK was also used to induce different DTs for the same input data. This was possible by adjusting the set of selected criteria and their priorities and relevances to the sort of medical decision desired (i.e., screening or diagnosis). Focusing on the thyroid problem, MEDBK was used to generate DTs to screen and to diagnose patients. The criteria were again the ones in Table 2 for the screening process, and script for the diagnosis process. The script criterion was represented with the LPO in Fig. 4.³ MEDBK proposed a DT to screen patients with thyroid problems, and another DT to diagnose thyroid malfunctioning (see Fig. 5). Both DTs were accepted as correct by the team of physicians supporting this work. The DT that was obtained with IG was not accepted for screening purposes, but was acceptable for diagnosis. However, although the DT proposed by IG was very similar to the one in Fig. 3 (and therefore appropriate for diagnosis⁴), the physicians concluded that even in a diagnosis there is always a set of medical criteria guiding the selection of questions. Since these criteria cannot be incorporated into IG, this algorithm is also unable to guarantee DTs representing good diagnosis processes. This fact has been observed in several of the domains studied, such as diabetes, whose DTs incorporated questions related to blood pressure or pregnancy which are irrelevant for making final diagnostic decisions.
7.3. The quality of the results 
The quality of medical DTs is measured in terms of their accuracy and their medical comprehensibility and correctness. Table 3 shows these values for the MEDBK DTs when they are used to screen patients in the domains of diabetes, heart disease, post-operative, and thyroid. The average of the IG DTs is also provided for the sake of comparison.
The quality of a medical DT is also related to the capability of this tree to remain unchanged and still represent good medical decisions (i.e., DT robustness) and the ability not to represent chance decisions (i.e., DT overfitting). In Table 3 we provide the results before and after applying cross-validation in order to analyze the robustness of the DT obtained, and also the results before and after applying pruning in order to analyze overfitting.
Table 3 Results obtained for DTs to screen patients in four medical domains with MEDBK.
                            With pruning                      Without pruning
                            Acc. (%)   Com. (%)   Cor. (%)    Acc. (%)   Com. (%)   Cor. (%)
With cross-validation
  Diabetes                  71.4       78.0       77.9        74.0       78.5       79.6
  Heart disease             77.7       92.7       87.7        74.2       90.8       85.5
  Post-operative            64.4       90.9       90.0        57.8       82.5       84.2
  Thyroid                   95.4       85.4       95.5        95.9       88.4       95.9
  Average                   77.2       86.8       87.8        75.5       85.1       86.3
  Average IG                75.5       76.2       85.4        75.3       44.9       85.4
Without cross-validation
  Diabetes                  78.5       81.8       84.0        83.1       81.0       86.9
  Heart disease             82.5       92.0       90.5        91.7       88.6       95.4
  Post-operative            75.6       83.6       94.7        92.2       83.2       98.3
  Thyroid                   95.5       83.9       95.5        97.5       81.0       97.5
  Average                   83.0       85.3       91.2        91.1       83.5       94.5
  Average IG                87.3       39.0       91.6        94.6       42.4       95.8
Fig. 5. DT induced by MEDBK for the diagnosis of thyroid.
3 The 16 other questions that do not appear in the LPO are in layer 4 but they were omitted for space reasons (see López-Vallverdú & Riaño, 2011b).
4 The physicians argued that some cases of thyroid problems could not be diagnosed with the IG and MEDBK DTs because there were no instances of such cases in the input database.
7.3.1. Accuracy of DTs
We observe that the mean difference between the average accuracies of the DTs without cross-validation obtained with IG and MEDBK is 3.9% (4.3% with pruning and 3.5% without pruning). This difference can be explained by the fact that MEDBK is not designed to maximize accuracy but to maximize comprehensibility and correctness. On the contrary, IG is an algorithm oriented to accuracy maximization, but it obtains DTs whose accuracies are not significantly better than the ones obtained with MEDBK. At the same time, cross-validation shows that the accuracy of IG DTs diminishes more than the accuracy of MEDBK DTs (15.5% and 10.7%, respectively). Therefore IG obtains slightly more accurate but less robust DTs.
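The accuracy figures quoted in this subsection can be traced back to the "Average" and "Average IG" rows of Table 3. The short Python check below reproduces that arithmetic; the variable names are ours and the values are copied from the table.

```python
# Average accuracies (%) from Table 3, as (with pruning, without pruning).
medbk = {"cv": (77.2, 75.5), "no_cv": (83.0, 91.1)}
ig = {"cv": (75.5, 75.3), "no_cv": (87.3, 94.6)}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Accuracy advantage of IG over MEDBK without cross-validation.
gap_no_cv = [i - m for i, m in zip(ig["no_cv"], medbk["no_cv"])]

# Accuracy lost when moving from no cross-validation to cross-validation.
drop_ig = mean(a - b for a, b in zip(ig["no_cv"], ig["cv"]))
drop_medbk = mean(a - b for a, b in zip(medbk["no_cv"], medbk["cv"]))

print([round(g, 1) for g in gap_no_cv])   # [4.3, 3.5]
print(round(mean(gap_no_cv), 1))          # 3.9
print(round(drop_medbk, 1))               # 10.7
print(drop_ig)                            # about 15.55, reported as 15.5
```

The difference between the two drops, about 4.85 percentage points, corresponds to the 4.9% lower loss of accuracy for MEDBK reported in Section 7.3.4.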
7.3.2. Comprehensibility of DTs 
The results of comprehensibility are clearly favorable to MEDBK, whose average comprehensibility is 43.7% better. Thyroid is a clear example in which comprehensibility is more than 60% better with respect to IG trees, for all the tests performed. In all four domains, the results show that the order of the questions in the DTs produced with MEDBK is more coherent from a medical point of view.
7.3.3. Correctness of DTs 
The strong relation between accuracy (i.e., percentage of correct decisions) and correctness (i.e., quality of the decisions) means that the results obtained by IG in terms of mean correctness are often good. Nevertheless, when comparing IG and MEDBK DTs we find cases where IG DTs are better in accuracy but worse in correctness. This means that MEDBK makes more mistakes than IG (1.4% on average) but these mistakes are less important. This happens in several cases, for example in the DTs for screening of post-operative patients with pruning. According to accuracy, IG obtains a better DT than MEDBK (with respective accuracies 82.2% and 75.6%), but medical correctness indicates that the errors of the DTs induced with MEDBK are less critical from a medical or clinical point of view (this is represented with the respective correctness values 89.7% and 94.7%).
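A small numerical illustration may help to see how a tree can be less accurate and yet make less serious mistakes. The sketch below is not the Dec function of the paper: it assumes a hypothetical cost table in which missing a sick patient (type II error) costs 1.0 and a false alarm (type I error) costs 0.2, and it scores two invented sets of decisions over the same ten patients.

```python
def accuracy(pairs):
    """Fraction of (actual, decided) pairs in which the decision is right."""
    return sum(actual == decided for actual, decided in pairs) / len(pairs)


def mean_error_cost(pairs, cost):
    """Average cost of the decisions; correct decisions cost nothing."""
    return sum(cost.get(pair, 0.0) for pair in pairs) / len(pairs)


# Assumed costs, keyed by (actual, decided).
cost = {("sick", "healthy"): 1.0,   # type II error: a sick patient is missed
        ("healthy", "sick"): 0.2}   # type I error: a false alarm

# Both trees decide on the same population: 5 sick and 5 healthy patients.
tree_a = [("sick", "sick")] * 3 + [("healthy", "healthy")] * 5 + [("sick", "healthy")] * 2
tree_b = [("sick", "sick")] * 5 + [("healthy", "healthy")] * 2 + [("healthy", "sick")] * 3

print(accuracy(tree_a), accuracy(tree_b))   # 0.8 0.7
print(mean_error_cost(tree_a, cost))        # 0.2
print(mean_error_cost(tree_b, cost))        # about 0.06
```

Tree A is more accurate, but its two errors are missed diagnoses; tree B makes three errors that are all false alarms, so its average error cost is lower. This is the pattern reported above for the post-operative domain.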
7.3.4. Robustness of DTs 
The results in Table 3 suggest that MEDBK DTs are better at making decisions over new patients. With cross-validation, the average loss of accuracy is 4.9% lower with MEDBK than with IG, with respect to the DTs generated without cross-validation. The differences in the loss of comprehensibility and correctness are less relevant but also favorable to MEDBK (1.6% and 2.5%, respectively). This means that the DTs generated with MEDBK are more robust than the trees generated with IG.
7.3.5. Overfitting of DTs 
Pruning is a satisfactory procedure because it obtains smaller DTs which reduce overfitting without a significant loss of accuracy, correctness and comprehensibility. Both MEDBK and IG obtain DTs with a similar average loss of accuracy and correctness when applying pruning (always below 3.5%). As far as comprehensibility is concerned, DTs of MEDBK are medically better after pruning (1.8% on average), while those of IG are significantly worse (6.1% on average).
8. Conclusions 
Information gain based algorithms to induce decision trees in complex domains cannot always guarantee acceptable results from an expert point of view. Specifically, in the medical domain, these algorithms do not consider health-care criteria and therefore important aspects such as the risks of the clinical procedures or patient discomfort can be left out of their decision processes. Moreover, medical errors in the final decisions can be critical and therefore their recommendations cannot be taken as medically correct. For the same dataset, these algorithms always produce the same DT regardless of its final medical purpose or intentionality. This is not correct because, for example, a good DT for diagnosing is not necessarily a good DT for other medical decision processes like screening or disease treatment.
Here, we have proposed an algorithm to induce medical DTs that uses a combination of some relevant health-care criteria. The chosen criteria and their respective priorities and relevances allow the algorithm to produce DTs oriented to different medical purposes.
The tests performed in the medical domains of diabetes, heart disease, post-operative and thyroid malfunctioning for the purposes of screening and diagnosing indicate that the medical DTs generated with the new algorithm are medically comprehensible and correct, while their accuracy is not significantly worse than the one obtained with information gain based algorithms and is more robust to new data. The sequences of questions of the trees in these domains are medically comprehensible and do not imply unnecessary risky, uncomfortable or expensive medical tests. With respect to correctness, the presence of critically wrong decisions is avoided. Cross-validation and pruning tests indicate that the DTs obtained by our algorithm are robust and resistant to overfitting.
In the future, this work will be continued along three lines. The first line is the exploitation of health-care databases about different medical decision processes like prevention, screening, diagnosing and patient treatment, in order to automatically adjust the relevances that produce the most accurate, comprehensible and correct DTs with respect to the medical decisions contained in the data. Our aim is to consider all the criteria and let the optimization algorithm determine the relevances, which will approach zero for those criteria that are not used in each concrete decision process. At the end, we expect to have a family of criterion-relevance pairs describing each medical process and we will use them to compare the way of working of different medical centres.
The second line will adapt the current induction of DTs to the induction of clinical algorithms (Bohada, Riaño, & López-Vallverdú, 2012; Riaño, López-Vallverdú, & Tu, 2008). A clinical algorithm (CA) is a flow diagram consisting of branching-logic pathways which represent sequences of clinical decisions, for teaching clinical decision making, and for guiding patient care. These branching-logic pathways can be represented with DTs, therefore they can be induced with the algorithm in Section 5. Considering this, we will aim to induce medically comprehensible and correct CAs from hospital databases by including medical background knowledge.
The third line will face the induction of medical DTs following a different approach. We can accept that medical criteria are found implicit in the data available about medical decisions. Starting with databases containing decision processes as $(q_{i_1}(p), q_{i_2}(p), \ldots, q_{i_k}(p), d_p)$, we will study the possibilities of generating accurate, comprehensible and correct DTs without considering an explicit representation of medical criteria (Torres, López-Vallverdú, & Riaño, 2011a).
Acknowledgements
We would like to thank Dr. Collado and Dr. Alonso for their continuous support leading the groups of health-care professionals from the SAGESSA Health Care Group (Spain) and the Clinical Hospital in Barcelona (Spain), respectively.
References
Arsene, O., Dumitrache, I., & Mihu, I. (2011). Medicine expert system dynamic Bayesian network and ontology based. Expert Systems with Applications, 38, 15253–15261.
Bohada, J. A., Riaño, D., & López-Vallverdú, J. A. (2012). Automatic generation of clinical algorithms within the state-decision-action model. Expert Systems with Applications<>.
Candell Riera, J. (2003). Estratificación pronóstica tras infarto agudo de miocardio. Revista Española de Cardiología, 56(3), 303–313.
Chai, X., Deng, L., Yang, Q., & Ling, C. X. (2004). Test-cost sensitive Naïve Bayesian classification. In Proceedings 4th IEEE international conference on data mining.
Chapman, G. B., & Sonnenberg, F. A. (Eds.). (2003). Decision making in health care: Theory, psychology and applications. Cambridge series on judgement and decision making. Cambridge University Press.
Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261–283.
Fauci, A. S., Braunwald, E., Kasper, D. L., Hauser, S. L., Longo, D. L., & Jameson, J. L., et al. (Eds.). (2009). Featuring the complete contents of Harrison’s principles of internal medicine (17th ed.). McGraw Hill. Harrison’s Online.
Horning, K. K., Hoehns, J. D., & Doucette, W. R. (2007). Adherence to clinical practice guidelines for 7 chronic conditions in long-term-care patients who received pharmacist disease management services versus traditional drug regimen review. Journal of Managed Care Pharmacy, 13(1), 28–36.
Husmeier, D., Dybowski, R., & Roberts, S. (Eds.). (2004). Probabilistic modelling in bioinformatics and medical informatics. Springer.
Lang, T. A., & Secic, M. (2006). How to report statistics in medicine (2nd ed.). American College of Physicians.
Ling, C. X., Yang, Q., Wang, J., & Zhang, S. (2004). Decision trees with minimal costs. In Proceedings 21st international conference on machine learning.
Ling, C. X., Sheng, V. S., & Yang, Q. (2006). Test strategies for cost-sensitive decision trees. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1055–1067.
López-Vallverdú, J. A., & Riaño, D. (2011a). Cost functions and partial orders as medical background knowledge: formalization and operations. Research report DEIM-RR-11-003. Spain: Universitat Rovira i Virgili. < reports/DEIM-RR-11-003.pdf> Accessed March 2012.
López-Vallverdú, J. A., & Riaño, D. (2011b). Repository of background knowledge. <> Accessed March 2012.
López-Vallverdú, J. A., & Riaño, D. (2012a). Decision criteria in health-care and their representation. Research report DEIM-RR-12-001. Spain: Universitat Rovira i Virgili. <> Accessed March 2012.
López-Vallverdú, J. A., Riaño, D., & Collado, A. (2007). Increasing acceptability of decision trees with domain attributes partial orders. In Proceedings of the 20th IEEE international symposium on computer-based medical systems, Maribor, Slovenia.
Lucas, P., van der Gaag, L., & Abu-Hanna, A. (2004). Bayesian networks in biomedicine and health-care. Artificial Intelligence in Medicine, 30(3), 201–214.
Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. <http://>.
Podgorelec, V., Kokol, P., Stiglic, B., & Rozman, I. (2002). Decision trees: An overview and their use in medicine. Journal of Medical Systems, 26(5), 445–463.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA, USA: Morgan Kaufmann.
Quinlan, J. R. (2003). C5.0 Online tutorial. <> Accessed March 2012.
Riaño, D., López-Vallverdú, J. A., & Tu, S. (2008). Mining hospital data to learn SDA* clinical algorithms. LNAI (Vol. 4924, pp. 46–61).
Shiffman, R. N. (1997). Representation of clinical practice guidelines in conventional and augmented decision tables. Journal of the American Medical Informatics Association, 4, 382–393.
Torres, P., López-Vallverdú, J. A., & Riaño, D. (2011). Inducing decision trees from medical decision processes. LNAI (Vol. 6512, pp. 40–55).
Turney, P. D. (2000). Types of cost in inductive concept learning. In Workshop on cost-sensitive learning at the 7th international conference on machine learning. California: Stanford University.
Velikova, M., de Carvalho Ferreira, N., & Lucas, P. (2007). Bayesian network decomposition for modeling breast cancer detection. In Artificial intelligence in medicine, AIME 2007, Amsterdam, The Netherlands. LNAI (Vol. 4594, pp. 346–350). Springer.
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). Morgan Kaufmann.
Yeh, D., Cheng, C., & Chen, Y. (2011). A predictive model for cerebrovascular disease using data mining. Expert Systems with Applications, 38(7), 8970–8977.