Analogical Prediction

Stephen Muggleton
Michael Bain (current address: Department of Artificial Intelligence, School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia)
Abstract

Inductive Logic Programming (ILP) involves constructing an hypothesis H on the basis of background knowledge B and training examples E. An independent test set is used to evaluate the accuracy of H. This paper concerns an alternative approach called Analogical Prediction (AP). AP takes B, E and then for each test example ⟨x, y⟩ forms an hypothesis Hx from B, E, x. Evaluation of AP is based on estimating the probability that Hx(x) = y for a randomly chosen ⟨x, y⟩. AP has been implemented within CProgol4.4. Experiments in the paper show that on English past tense data AP has significantly higher predictive accuracy than both previously reported results and CProgol in inductive mode. However, on KRK illegal AP does not outperform CProgol in inductive mode. We conjecture that AP has advantages for domains in which a large proportion of the examples must be treated as exceptions with respect to the hypothesis vocabulary. The relationship of AP to analogy and instance-based learning is discussed. Limitations of the given implementation of AP are discussed and improvements suggested.
1 Introduction
1.1
Suppose that you are trying to make taxonomic predictions about animals. You might already have seen various animals and know some of their properties. Now you meet a platypus. You could try to predict whether the platypus was a mammal, fish, reptile or bird by forming analogies between the platypus and other animals for which you already know the classifications. Thus you could reason that a platypus is like other mammals since it suckles its young. In doing so you are making an assumption which could be represented as the following clause.
class(A,mammal) :- has_milk(A).
It might be difficult to find a consistent assumption similar to the above which allowed a platypus to be predicted as being a fish or a reptile. However, you could reason that a platypus is similar to various birds you have encountered since it is warm-blooded and lays eggs. Again this would be represented as follows.
class(A,bird) :- homeothermic(A), has_eggs(A).
Note that the hypotheses above are related to a particular test instance, the platypus, for which the class value (mammal, bird, etc.) is to be predicted. We will call this form of reasoning Analogical Prediction (AP).
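To make the example concrete, the reasoning above can be reproduced directly in Prolog. This is a minimal sketch: the unit clauses encoding the platypus's properties are our own illustrative assumptions, not part of the original text.

% Background knowledge B: assumed properties of the test instance.
has_milk(platypus).        % suckles its young
homeothermic(platypus).    % warm-blooded
has_eggs(platypus).        % lays eggs

% The two competing hypotheses formed by analogy.
class(A,mammal) :- has_milk(A).
class(A,bird)   :- homeothermic(A), has_eggs(A).

% ?- class(platypus, C).
% C = mammal ;
% C = bird.

The query shows that both analogies apply to the platypus, which is precisely what makes it an instructive test instance.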
1.2

In the above, AP is given a test instance x, a training set E and background knowledge B. It then constructs an hypothesis Hx which not only covers some of the training set but also predicts the class y of x.
[Figure 1 panels: AP, ILP and IBL shown on the same 7 x 8 grid of instances; the AP panel yields prediction `+' for the test instance, while the ILP and IBL panels yield prediction `-'.]
Figure 1: Comparison of AP, ILP and IBL. Instances x are pairs ⟨u, v⟩ from {1,..,7} × {1,..,8}. Each x can have a classification y ∈ {+, -}. The test instance x to be classified is denoted by `?'. The rounded box in each case defines the extension of the hypothesis H. Below each box the corresponding prediction for x is given.
This can be contrasted with the normal semantics of ILP [10], in which hypotheses H are constructed on the basis of B and E alone. In this case x is presented as part of the test procedure after H has been constructed.
AP is in some ways more similar to Instance-Based Learning (IBL) (see for example [3]), in which the class y would be attributed to x on the basis of its proximity to various elements of E. However, in the case of IBL, instead of constructing H, a similarity measure is used to determine proximity.
Figure 1 illustrates the differences in prediction between AP, standard ILP and IBL on a 2D binary classification problem. The test concept, `insign', is actually a picture of a symbol made out of +'s. AP predicts x to be positive based on the following hypothesis.

insign(U,V) :- 3=<U=<6, 4=<V=<5.
Assuming the use of the closed world assumption for prediction, normal ILP will predict x to be negative based on the following hypothesis (note the exception at ⟨5, 4⟩).

insign(U,V) :- 3=<U=<4, 2=<V=<7.
Finally, IBL will predict x to be negative based on the fact that 5/6 of the surrounding instances are negative. Note that IBL's implicit hypothesis in this case could be denoted by the following denial.

:- insign(U,V), near(U,V,6,5).
The background predicate near/4 in the above encodes the notion of `nearness' used in a k-nearest-neighbour type algorithm.
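The paper does not define near/4. The following is a minimal sketch of one plausible reading, assuming `nearness' means lying within Chebyshev distance 1 of the test instance, i.e. within the 8 surrounding grid cells.

near(U,V,U0,V0) :-                  % assumed reading: <U,V> is near <U0,V0>
    DU is abs(U - U0), DU =< 1,     % when both coordinates differ
    DV is abs(V - V0), DV =< 1.     % by at most 1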
1.3 Motivation
AP can be viewed as a halfway house between IBL and ILP. IBL has a number of advantages over ILP. These include ease of updating the knowledge base and the fact that theory revision is unnecessary after the addition of new examples. AP shares both these advantages with IBL. On the other hand, IBL has a number of disadvantages with respect to ILP. Notably, IBL predictions lack explanation, and there is a need to define a metric to describe similarity between instances. Generally, similarity metrics are hard to justify, even when they can be shown to have desirable properties (e.g. [5, 11, 12]). In comparison, AP predictions are directly associated with an explicit hypothesis, which provides explanation. Also, AP does not require a similarity measure since predictions are made on the basis of the hypothesis.
This paper has the following structure. A formal framework for AP is provided in Section 2. Section 3 describes an implementation of AP within CProgol4.4 (ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/progol4.4).
Aleave(B, E)
  Let AP = Ap = aP = ap = 0
  For each e = ⟨x, y⟩ in E
    (a) Construct ⊥B,x
    E' := E \ e
    (b) Using E' find the most compressive Hx subsuming ⊥B,x
    if y = Hx(x) then
      if y = True then AP := AP + 1
      else ap := ap + 1
    else
      if y = True then Ap := Ap + 1
      else aP := aP + 1
  PrintContingencyTable(AP, Ap, aP, ap)

Figure 2: Aleave algorithm. Algorithms from CProgol4.1 are used to (a) construct the bottom clause and (b) search the refinement graph.
This implementation is restricted to the special case of binary classification. Experiments using this implementation of AP are described in Section 4. On the standard English past tense data set [7] AP has higher predictive accuracy than FOIDL, FOIL and CProgol in inductive mode. By contrast, on KRK illegal AP performs slightly worse than CProgol in inductive mode. In the discussion (Section 5) we conjecture that AP has advantages for domains in which a large proportion of the examples must be treated as exceptions with respect to the hypothesis vocabulary. We also compare AP to analogical reasoning. The results are summarised in Section 6, and further improvements in the existing implementation are suggested.
2 Definitions

We assume denumerable sets X, Y representing the instance and prediction spaces respectively, and a probability distribution D on X. The target theory is a function f : X → Y. An AP learning algorithm L takes background knowledge B together with a set of training examples E ⊆ {⟨x, f(x)⟩ : x ∈ X}. For any given B, E and test instance x ∈ X the output of L is an hypothesised function Hx. Error is now defined as follows.
error(L, B, E) = Pr_{x ∈ D} (Hx(x) ≠ f(x))    (1)
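Since D is not directly available, the aleave procedure of Section 3 estimates (1) by leave-one-out over E. Written out as a formula (the paper describes this estimator only procedurally; the explicit form below is ours), with E = {⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩} and each Hx_i constructed from B and E \ {⟨x_i, y_i⟩}:

\[
\widehat{\mathrm{error}}(L, B, E) \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[ H_{x_i}(x_i) \neq y_i \right]
\]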
3 Implementation

AP has been implemented as a built-in predicate aleave in CProgol4.4 (available from ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/progol4.4). The algorithm, shown in Figure 2, carries out a leave-one-out procedure which estimates AP error as defined in Equation (1). In terms of Section 2, each left-out example e is viewed as a pair ⟨x, y⟩ where x is a ground atom, and y = True if e is positive and y = False if e is negative.
Errors are computed using the counters AP, Ap, aP and ap (`a' and `p' stand for actual and predicted, and capitalisation/non-capitalisation stands for the value being True/False). For each example e = ⟨x, y⟩ left out, a bottom clause ⊥B,x is constructed which predicts y := True. A refinement-graph search of the type described in [8] is carried out to find a maximally compressive single clause Hx which subsumes ⊥B,x. In doing so, compression is computed relative to E \ e. If no compressive clause is found then the prediction is False. Otherwise it is True.
The procedure PrintContingencyTable(AP, Ap, aP, ap) prints a two-by-two table of the 4 values together with the accuracy estimate, standard error of estimation and χ² probability.
English past tense
past([w,o,r,r,y],[w,o,r,r,i,e,d]).
past([c,l,u,t,c,h],[c,l,u,t,c,h,e,d]).
past([w,h,i,z],[w,h,i,z,z,e,d]).
past([g,r,i,n,d],[g,r,o,u,n,d]).

KRK illegality
illegal(3,5,6,7,6,2).
illegal(3,6,7,6,7,4).
:- illegal(2,5,5,2,4,1).
:- illegal(5,7,1,2,0,0).

Figure 3: Forms of examples in the two domains.
KRK illegality
illegal(A,B,A,B,_,_).
illegal(A,B,_,_,C,D) :- adj(A,C), adj(B,D).
illegal(A,_,B,_,B,_) :- not(A=B).
illegal(_,A,B,C,B,D) :- A<C, A<D.

Figure 4: Forms of hypothesised clauses (KRK illegality).
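The second clause relies on a background predicate adj/2 that is not shown in the paper. A minimal sketch, assuming adjacency means the two file or rank coordinates differ by at most one:

adj(X,Y) :- D is abs(X - Y), D =< 1.   % assumed reading of adjacency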
4 Experiments

The experiments were aimed at determining whether AP could provide increased predictive accuracy over other ILP algorithms. Two standard ILP data sets were chosen for comparison (described in Section 4.2 below).
4.1 Experimental hypotheses

The following null hypotheses were tested in the first and second experiments respectively.

Null hypothesis 1. AP does not have higher predictive accuracy than any other ILP system on any standard data set.

Null hypothesis 2. AP has higher predictive accuracy than any other ILP system on all standard data sets.

Note that hypothesis 1 is not the negation of hypothesis 2. If both are rejected then it means simply that AP is better for some domains but not others.
4.2 Materials

The following data sets were used for testing the experimental hypotheses.

English past tense. This is described in [15, 6, 7]. The available example set Epast has size 1390.

KRK illegality. This was originally described in [9]. The total instance space size is 8^6 = 262144.
For both domains the forms of examples are shown in Figure 3 and the forms of hypothesised clauses in Figure 4. Note that in the KRK illegality domain negative examples are those preceded by `:-', while the English past tense domain has no negative examples. The absence of negative examples in the English past tense domain is compensated for by a constraint on hypothesised clauses which Mooney and Califf [7] call output completeness. In the experiments output completeness is enforced in CProgol4.4 by including the following user-defined constraint, which requires that past/2 is a function.
:- hypothesis(past(X,Y),Body,_),
   clause(past(X,Z),true), Body, not(Y==Z).
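To see the constraint at work, the functionality check it performs can be isolated as a predicate of its own. This is our own hypothetical reformulation for illustration, not part of the CProgol input.

% A candidate output Y for instance X violates output completeness
% when a stored training example maps X to a different past form Z.
violates_functionality(X, Y) :-
    clause(past(X, Z), true),   % stored training example for X
    not(Y == Z).                % candidate output disagrees with it

For example, if E contains past([w,o,r,r,y],[w,o,r,r,i,e,d]) and a candidate hypothesis derives a different past form for [w,o,r,r,y], the check succeeds and the candidate is pruned.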
4.3 Method

4.3.1 English past tense
Mooney and Califf [7] compared the predictive accuracy of FOIDL, IFOIL and FOIL on the alphabetic English past tense task. We interpret the description of their training regime as follows. Training sets of sizes 25,
[Figure 5: English past tense learning curves. Predictive accuracy (vertical axis, 30-90) against training set size (horizontal axis, 50-500) for AP, CProgol4.4 (induction), FOIDL, FOIL, IFOIL and Default rules.]
4.3.2 KRK illegality
Predictive accuracies were compared for CProgol4.4 using AP with leave-one-out (aleave) against induction with leave-one-out (leave). Training sets of sizes 5, 10, 20, 30, 40, 60, 80, 100 and 200 were randomly sampled with replacement from the total example space, with 10 sets of each size. For each of the training sets both aleave and leave were run. The resulting predictive accuracies were averaged for each training set size.
4.4 Results

4.4.1 English past tense
The five learning curves are shown in Figure 5. The horizontal line labelled "Default rules" represents the following simple Prolog program, which adds a `d' to verbs ending in `e' and otherwise adds `ed'.

past(A,B) :- split(A,_,[e]), split(B,A,[d]), !.
past(A,B) :- split(B,A,[e,d]).
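split/3 belongs to the background knowledge and is not defined in the paper. Its use above is consistent with split(X,F,B) holding when list X is the concatenation of F and B, which gives the following one-line sketch (an assumption on our part):

split(X, Front, Back) :- append(Front, Back, X).   % X = Front ++ Back

Under this reading the first default rule checks that A ends in `e' and appends `d', while the second appends `ed'.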
The differences between AP and all other systems are significant at the 0.0001 level with 250 and 500 examples. Thus null hypothesis 1 is clearly rejected.
[Figure 6: KRK illegality learning curves. Predictive accuracy (vertical axis, 65-95) against training set size (horizontal axis, 20-200) for induction (leave), AP, and the Majority class line.]
4.4.2 KRK illegality
The two learning curves are shown in Figure 6. The horizontal line labelled "Majority class" shows the percentage of negative examples in the domain. Only the accuracy difference between induction and AP for 200 examples is significant at the 0.05 level, though taken together the differences are significant at the 0.0001 level. Thus null hypothesis 2 can be rejected.
5 Discussion

Figure 7: IQ test problem of the type studied by Evans.
5.1 Relationship of AP to analogy

Evans' [4] early studies of analogy concentrated on IQ tests of the form shown in Figure 7. Such problems have the general form "A is to B as C is to ?".
Analogical reasoning is often viewed as having a close relationship to induction. For instance, both Peirce and Polya suggested that analogical conclusions could be deduced via an inductive hypothesis [13, 14]. Similar views are expressed in the Artificial Intelligence literature [2]. For instance, Arima [1] formalises the problem of analogy as involving a comparison of a base B and target T. When B and T are found to share a similarity property S, analogical reasoning predicts that they will also share a projected property P. This is formalised in the following analogical inference rule.
S(T) ∧ S(B)    P(B)
-------------------
       P(T)
The rule above can be viewed as involving the construction of the following inductive hypothesis.

∀X. S(X) → P(X)
From this, P(T) can be inferred deductively. Note that S and P can obviously be extended to take more than one argument. For instance, given the Evans-type problem in Figure 7 we might formulate the following hypothesis as a Prolog clause.

is_to(X,Y) :- invertall(X,Y).
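The paper leaves invertall/2 to the background knowledge. A minimal sketch, assuming figures are encoded as lists of polarity-tagged shapes (the encoding and shape names are our invention for illustration):

% invertall/2: Y is X with the polarity of every shape flipped.
invertall([], []).
invertall([black(S)|Xs], [white(S)|Ys]) :- invertall(Xs, Ys).
invertall([white(S)|Xs], [black(S)|Ys]) :- invertall(Xs, Ys).

% ?- invertall([black(square), white(circle)], Y).
% Y = [white(square), black(circle)].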
In this way we can view analogical reasoning as a special case of AP, in which the example set contains a single base example and the test instance relates to the target. According to Arima the following issues are seen as being central in the discussion of analogy.
1. What object (or case) should be selected as a base with respect to a target,
2. which property is significant in analogy among properties shared by two objects, and
3. what property is to be projected w.r.t. a certain similarity.
These three issues are handled in the CProgol4.4 AP implementation as follows.

1. A set of base cases is selected from the example set on the basis of maximising compression over the hypothesis space,
2. relevant properties are found by constructing the bottom clause relative to the test instance, and
3. the relevant projection properties are decided on the basis of modeh declarations given to CProgol.
Acknowledgements

The first author would like to thank Thirza and Clare for their good-natured support during the writing of this paper. This work was supported partly by the Esprit Long Term Research Action ILP II (project 20237) and EPSRC grant GR/K57985 on Experiments with Distribution-based Machine Learning.
References

[1] J. Arima. Logical Foundations of Induction and Analogy. PhD thesis, Kyoto University, 1998.

[2] T. Davies and S.J. Russell. A logical approach to reasoning by analogy. In IJCAI-87, pages 264-270. Morgan Kaufmann, 1987.

[3] W. Emde and D. Wettschereck. Relational instance-based learning. In L. Saitta, editor, Proceedings of the 13th International Machine Learning Conference, pages 122-130, Los Altos, 1996. Morgan Kaufmann.

[4] T.G. Evans. A program for the solution of a class of geometric analogy intelligence test questions. In M. Minsky, editor, Semantic Information Processing. MIT Press, Cambridge, MA, 1968.

[5] A. Hutchinson. Metrics on terms and clauses. In M. Someren and G. Widmer, editors, Proceedings of the Ninth European Conference on Machine Learning, pages 138-145, Berlin, 1997. Springer.

[6] C.X. Ling. Learning the past tense of English verbs: the symbolic pattern associators vs. connectionist models. Journal of Artificial Intelligence Research, 1:209-229, 1994.

[7] R.J. Mooney and M.E. Califf. Induction of first-order decision lists: Results on learning the past tense of English verbs. Journal of Artificial Intelligence Research, 3:1-24, 1995.

[8] S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245-286, 1995.

[9] S. Muggleton, M.E. Bain, J. Hayes-Michie, and D. Michie. An experimental comparison of human and machine learning formalisms. In Proceedings of the Sixth International Workshop on Machine Learning, Los Altos, CA, 1989. Kaufmann.

[10] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19,20:629-679, 1994.

[11] S.H. Nienhuys-Cheng. Distance between Herbrand interpretations: a measure for approximations to a target concept. In N. Lavrač and S. Džeroski, editors, Proceedings of the Seventh International Workshop on Inductive Logic Programming (ILP97), pages 321-226, Berlin, 1997. Springer-Verlag. LNAI 1297.

[12] S.H. Nienhuys-Cheng. Distances and limits on Herbrand interpretations. In C.D. Page, editor, Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP98), pages 250-260, Berlin, 1998. Springer. LNAI 1446.

[13] C.S. Peirce. Elements of logic. In C. Hartshorne and P. Weiss, editors, Collected Papers of Charles Sanders Peirce, volume 2. Harvard University Press, Cambridge, MA, 1932.

[14] G. Polya. Induction and analogy in mathematics. In Mathematics and Plausible Reasoning, volume 1. Princeton University Press, Princeton, 1954.

[15] D.E. Rumelhart and J.L. McClelland. On learning the past tense of English verbs. In Explorations in the Micro-Structure of Cognition Vol. II, pages 216-271. MIT Press, Cambridge, MA, 1986.