
Analogical Prediction
Stephen Muggleton, Michael Bain,
Department of Computer Science,
University of York,
YO10 5DD,
United Kingdom.

Abstract

Inductive Logic Programming (ILP) involves constructing an hypothesis H on the basis of background
knowledge B and training examples E. An independent test set is used to evaluate the accuracy of H.
This paper concerns an alternative approach called Analogical Prediction (AP). AP takes B and E and
then for each test example $\langle x, y \rangle$ forms an hypothesis $H_x$ from B, E and x. Evaluation of AP is based on
estimating the probability that $H_x(x) = y$ for a randomly chosen $\langle x, y \rangle$. AP has been implemented within
CProgol4.4. Experiments in the paper show that on English past tense data AP has significantly higher
predictive accuracy than both previously reported results and CProgol in inductive mode.
However, on KRK illegal AP does not outperform CProgol in inductive mode. We conjecture that AP
has advantages for domains in which a large proportion of the examples must be treated as exceptions
with respect to the hypothesis vocabulary. The relationship of AP to analogy and instance-based learning
is discussed. Limitations of the given implementation of AP are discussed and improvements suggested.

1 Introduction

1.1 Analogical prediction (AP)

Suppose that you are trying to make taxonomic predictions about animals. You might already have seen
various animals and know some of their properties. Now you meet a platypus. You could try and predict
whether the platypus was a mammal, fish, reptile or bird by forming analogies between the platypus and
other animals for which you already know the classifications. Thus you could reason that a platypus is
like other mammals since it suckles its young. In doing so you are making an assumption which could be
represented as the following clause.
class(A,mammal) :- has_milk(A).

It might be difficult to find a consistent assumption similar to the above which allowed a platypus to be
predicted as being a fish or a reptile. However, you could reason that a platypus is similar to various birds
you have encountered since it is both warm blooded and lays eggs. Again this would be represented as
follows.
class(A,bird) :- homeothermic(A), has_eggs(A).

Note that the hypotheses above are related to a particular test instance, the platypus, for which the class
value (mammal, bird, etc.) is to be predicted. We will call this form of reasoning Analogical Prediction
(AP).
1.2 AP, induction and instance-based learning

In the above AP is given a test instance x, a training set E and background knowledge B. It then constructs
an hypothesis $H_x$ which not only covers some of the training set but also predicts the class y of x. This can
be contrasted with the normal semantics of ILP [10], in which hypotheses H are constructed on the basis of
B and E alone. In this case x is presented as part of the test procedure after H has been constructed.
AP is in some ways more similar to Instance-Based Learning (IBL) (see for example [3]), in which the
class y would be attributed to x on the basis of its proximity to various elements of E. However, in the case
of IBL, instead of constructing H, a similarity measure is used to determine proximity.

* Current address: Department of Artificial Intelligence, School of Computer Science and Engineering, University of New
South Wales, Sydney 2052, Australia.

[Figure 1: three panels labelled AP, ILP and IBL, each plotting instances on axes u (1 to 7) and v (1 to 8). Prediction for the test instance: AP `+', ILP `-', IBL `-'.]

Figure 1: Comparison of AP, ILP and IBL. Instances x are pairs $\langle u, v \rangle$ from $\{1, \ldots, 7\} \times \{1, \ldots, 8\}$. Each x can
have a classification $y \in \{+, -\}$. The test instance x to be classified is denoted by `?'. The rounded box in
each case defines the extension of the hypothesis H. Below each box the corresponding prediction for x is
given.
Figure 1 illustrates the differences in prediction between AP, standard ILP and IBL on a 2D binary
classification problem. The test concept, `insign', is actually a picture of the symbol `∈' made out of +'s.
AP predicts x to be positive based on the following hypothesis.
insign(U,V) :- 3=<U=<6, 4=<V=<5.

Assuming the use of the closed world assumption for prediction, normal ILP will predict x to be negative
based on the following hypothesis (note the exception at $\langle 5, 4 \rangle$).
insign(U,V) :- 3=<U=<4, 2=<V=<7.

Finally IBL will predict x to be negative based on the fact that 5/6 of the surrounding instances are negative.
Note that IBL's implicit hypothesis in this case could be denoted by the following denial.
:- insign(U,V), near(U,V,6,5).

The background predicate near/4 in the above encodes the notion of `nearness' used in a k-nearest neighbour
type algorithm.
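
The three predictions above can be checked with a small sketch. It is written in Python rather than Prolog so that it runs standalone. The two interval clauses are transcribed from the text; the test instance (6, 5) is an assumption read off the arguments of near(U,V,6,5), and the neighbour labels only encode the stated 5/6 negative proportion, not the actual grid of Figure 1.

```python
from collections import Counter

def ap_hypothesis(u, v):
    # insign(U,V) :- 3=<U=<6, 4=<V=<5.
    return 3 <= u <= 6 and 4 <= v <= 5

def ilp_hypothesis(u, v):
    # insign(U,V) :- 3=<U=<4, 2=<V=<7.
    return 3 <= u <= 4 and 2 <= v <= 7

def predict(hypothesis, instance):
    # Closed world assumption: an instance not covered by the clause is negative.
    return "+" if hypothesis(*instance) else "-"

def ibl_predict(neighbour_labels):
    # Majority vote over the labels of the surrounding instances.
    return Counter(neighbour_labels).most_common(1)[0][0]

x = (6, 5)  # assumed test instance, taken from near(U,V,6,5)
ap_prediction = predict(ap_hypothesis, x)
ilp_prediction = predict(ilp_hypothesis, x)
ibl_prediction = ibl_predict(["-"] * 5 + ["+"])  # 5 of 6 neighbours negative

print(ap_prediction, ilp_prediction, ibl_prediction)
```

Under these assumptions AP covers the test instance while the ILP clause does not, which reproduces the disagreement described in the text.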
1.3 Motivation

AP can be viewed as a half-way house between IBL and ILP. IBL has a number of advantages over ILP.
These include ease of updating the knowledge-base and the fact that theory revision is unnecessary after
the addition of new examples. AP shares both these advantages with IBL. On the other hand IBL has a
number of disadvantages with respect to ILP. Notably, IBL predictions lack explanation, and there is a need
to define a metric to describe similarity between instances. Generally, similarity metrics are hard to justify,
even when they can be shown to have desirable properties (e.g. [5, 11, 12]). In comparison AP predictions
are directly associated with an explicit hypothesis, which provides explanation. Also AP does not require a
similarity measure since predictions are made on the basis of the hypothesis.
This paper has the following structure. A formal framework for AP is provided in Section 2. Section 3
describes an implementation of AP within CProgol4.4 (ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/progol4.4).

Aleave(B, E)
    Let AP = Ap = aP = ap = 0
    For each e = $\langle x, y \rangle$ in E
        (a) Construct $\bot_{B,x}$
            $E' := E \setminus e$
        (b) Using $E'$ find most compressive $H_x$ which subsumes $\bot_{B,x}$
        if y = $H_x(x)$ then
            if y = True then AP := AP + 1
            else ap := ap + 1
        else
            if y = True then Ap := Ap + 1
            else aP := aP + 1
    Print-contingency-table(AP, Ap, aP, ap)

Figure 2: Aleave algorithm. Algorithms from CProgol4.1 are used to (a) construct the bottom clause and
(b) search the refinement graph.
This implementation is restricted to the special case of binary classification. Experiments using this implementation of AP are described in Section 4. On the standard English past tense data set [7] AP has higher
predictive accuracy than FOIDL, FOIL and CProgol in inductive mode. By contrast, on KRK illegal AP
performs slightly worse than CProgol in inductive mode. In the discussion (Section 5) we conjecture that
AP has advantages for domains in which a large proportion of the examples must be treated as exceptions
with respect to the hypothesis vocabulary. We also compare AP to analogical reasoning. The results are
summarised in Section 6, and further improvements in the existing implementation are suggested.

2 Definitions
We assume denumerable sets X, Y representing the instance and prediction spaces respectively and a probability distribution D on X. The target theory is a function $f : X \to Y$. An AP learning algorithm L takes
background knowledge B together with a set of training examples $E \subseteq \{\langle x', f(x') \rangle : x' \in X\}$. For any given
B, E and test instance $x \in X$ the output of L is an hypothesised function $H_x$. Error is now defined as
follows.

\[ \mathrm{error}(L, B, E) = \Pr_{x \in D} \left[ H_x(x) \neq f(x) \right] \qquad (1) \]

3 Implementation
AP has been implemented as a built-in predicate aleave in CProgol4.4 (available from
ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/progol4.4). The algorithm, shown in Figure 2, carries out a leave-one-out procedure which estimates AP error as defined in Equation (1). In terms of Section 2 each left out
example e is viewed as a pair $\langle x, y \rangle$ where x is a ground atom and y = True if e is positive and y = False if
e is negative.
Errors are computed using the counters AP, Ap, aP and ap (`a' and `p' stand for actual and predicted,
and capitalisation/non-capitalisation stands for the value being True/False). For each example e = $\langle x, y \rangle$
left out, a bottom clause $\bot_{B,x}$ is constructed which predicts y := True. A refinement graph search of the
type described in [8] is carried out to find a maximally compressive single clause $H_x$ which subsumes $\bot_{B,x}$.
In doing so compression is computed relative to $E \setminus e$. If no compressive clause is found then the prediction
is False. Otherwise it is True.
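
The leave-one-out loop described above can be sketched as follows. This is an illustrative Python outline, not the CProgol4.4 code: the function `find_clause` is a hypothetical stand-in for bottom clause construction and refinement graph search, and `toy_learner` with its integer examples is invented purely so that the sketch runs.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

Example = Tuple[Hashable, bool]      # (x, y): instance and its truth value
Clause = Callable[[Hashable], bool]  # a clause viewed as a coverage test
Learner = Callable[[Hashable, Sequence[Example]], Optional[Clause]]

def aleave(examples: Sequence[Example], find_clause: Learner) -> Dict[str, int]:
    counts = {"AP": 0, "Ap": 0, "aP": 0, "ap": 0}
    for i, (x, y) in enumerate(examples):
        rest = list(examples[:i]) + list(examples[i + 1:])  # E' = E \ e
        clause = find_clause(x, rest)   # stands in for steps (a) and (b)
        predicted = clause is not None  # no compressive clause: predict False
        # 'A'/'a' records the actual value, 'P'/'p' the predicted value.
        key = ("A" if y else "a") + ("P" if predicted else "p")
        counts[key] += 1
    return counts

def toy_learner(x, rest):
    # Hypothetical learner: propose "same parity as x" and accept it only
    # if it covers more positives than negatives among the other examples.
    clause = lambda z: z % 2 == x % 2
    pos = sum(1 for z, y in rest if y and clause(z))
    neg = sum(1 for z, y in rest if not y and clause(z))
    return clause if pos > neg else None

examples = [(2, True), (4, True), (6, True), (1, False), (3, False), (5, False)]
table = aleave(examples, toy_learner)
print(table)
```

Note that no hypothesis survives an iteration of the loop: each clause is built for one left-out instance and then discarded, which is the property that distinguishes AP from induction.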
The procedure Print-contingency-table(AP,Ap,aP,ap) prints a two-by-two table of the 4 values together
with the accuracy estimate, standard error of estimation and $\chi^2$ probability.
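
The text does not give the formulas behind these three figures. A minimal sketch, assuming the usual textbook choices (binomial standard error and the Pearson $\chi^2$ statistic for a 2x2 table with one degree of freedom), is shown below. The function name and the example counts are hypothetical, and CProgol4.4 may compute these quantities differently.

```python
import math

def contingency_summary(AP, Ap, aP, ap):
    n = AP + Ap + aP + ap
    accuracy = (AP + ap) / n
    # Binomial standard error of the accuracy estimate (assumed form).
    std_error = math.sqrt(accuracy * (1 - accuracy) / n)
    # Pearson chi-square statistic for a 2x2 table.
    actual_t, actual_f = AP + Ap, aP + ap
    pred_t, pred_f = AP + aP, Ap + ap
    denom = actual_t * actual_f * pred_t * pred_f
    chi2 = n * (AP * ap - Ap * aP) ** 2 / denom if denom else 0.0
    # With one degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return accuracy, std_error, chi2, p_value

# Hypothetical counts, for illustration only.
accuracy, std_error, chi2, p_value = contingency_summary(40, 10, 5, 45)
print(accuracy, std_error, chi2, p_value)
```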

English past tense

past([w,o,r,r,y],[w,o,r,r,i,e,d]).
past([c,l,u,t,c,h],[c,l,u,t,c,h,e,d]).
past([w,h,i,z],[w,h,i,z,z,e,d]).
past([g,r,i,n,d],[g,r,o,u,n,d]).

KRK illegality

illegal(3,5,6,7,6,2).
illegal(3,6,7,6,7,4).
:- illegal(2,5,5,2,4,1).
:- illegal(5,7,1,2,0,0).

Figure 3: Form of examples for both domains


English past tense

past(A,B) :- split(A,C,[r,r,y]), split(B,C,[r,r,i,e,d]).

KRK illegality

illegal(A,B,A,B,_,_).
illegal(A,B,_,_,C,D) :- adj(A,C), adj(B,D).
illegal(A,_,B,_,B,_) :- not A=B.
illegal(_,A,B,C,B,D) :- A<C, A<D.

Figure 4: Form of hypothesised clauses

4 Experiments
The experiments were aimed at determining whether AP could provide increased predictive accuracy over
other ILP algorithms. Two standard ILP data sets were chosen for comparison (described in Section 4.2
below).

4.1 Experimental hypotheses

The following null hypotheses were tested in the first and second experiments respectively.

Null hypothesis 1. AP does not have higher predictive accuracy than any other ILP system on any standard data set.

Null hypothesis 2. AP has higher predictive accuracy than any other ILP system on all standard data sets.

Note that hypothesis 1 is not the negation of hypothesis 2. If both are rejected then it means simply that
AP is better for some domains but not others.
4.2 Materials

The following data sets were used for testing the experimental hypotheses.

English past tense. This is described in [15, 6, 7]. The available example set $E_{past}$ has size 1390.

KRK illegality. This was originally described in [9]. The total instance space size is $8^6 = 262144$.

For both domains the form of examples is shown in Figure 3 and the form of hypothesised clauses in Figure
4. Note that in the KRK illegality domain negative examples are those preceded by a `:-', while the English
past tense domain has no negative examples. The absence of negative examples in the English past tense
domain is compensated for by a constraint on hypothesised clauses which Mooney and Califf [7] call output
completeness. In the experiments output completeness is enforced in CProgol4.4 by including the following
user defined constraint which requires that past/2 is a function.
:- hypothesis(past(X,Y),Body,_), clause(past(X,Z),true), Body, not(Y==Z).
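
The effect of this constraint can be illustrated outside Prolog. In the Python sketch below a candidate clause is modelled as a function from an input word to an output word (or None when its body fails), and the clause is rejected as soon as it produces an output that differs from a recorded one. The helper names are hypothetical; the four verb pairs are those of Figure 3, written as strings rather than character lists.

```python
def violates_output_completeness(rule, examples):
    # Mirrors the constraint: some known past(X,Z) exists for which the
    # candidate clause derives past(X,Y) with Y different from Z.
    for word, known_past in examples:
        produced = rule(word)
        if produced is not None and produced != known_past:
            return True
    return False

def rry_rule(word):
    # past(A,B) :- split(A,C,[r,r,y]), split(B,C,[r,r,i,e,d]).
    return word[:-3] + "rried" if word.endswith("rry") else None

def add_ed_rule(word):
    # An overgeneral candidate: always append 'ed'.
    return word + "ed"

examples = [("worry", "worried"), ("clutch", "clutched"),
            ("whiz", "whizzed"), ("grind", "ground")]

rry_rejected = violates_output_completeness(rry_rule, examples)
add_ed_rejected = violates_output_completeness(add_ed_rule, examples)
print(rry_rejected, add_ed_rejected)
```

The overgeneral rule is rejected because it maps `worry' to `worryed', so positive examples alone are enough to rule it out.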
4.3 Method

4.3.1 English past tense

Mooney and Califf [7] compared the predictive accuracy of FOIDL, IFOIL and FOIL on the alphabetic English
past tense task. We interpret the description of their training regime as follows. Training sets of sizes 25,
50, 100, 250 and 500 were randomly sampled without replacement from $E_{past}$, with 10 sets of each size. For
each training set E a test set of size 500 was randomly sampled without replacement from $E_{past} \setminus E$. Each
learning system was applied to each training set and the predictive accuracy assessed on the corresponding
test set. Results were averaged for each training set size and reported as a learning curve.
For the purposes of comparison we followed the above training regime for CProgol4.4 in inductive mode.
We also ran AP with the aleave predicate built-in to CProgol4.4 (Section 3) on each of the training sets and
then for each training set size averaged the results over the 10 sets.

[Figure 5 plot: predictive accuracy (%) against training set size (50 to 500) for AP, CProgol4.4 (induction), FOIDL, FOIL, IFOIL and Default rules.]

Figure 5: Learning curves for alphabetic English past tense.
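
The sampling regime of Section 4.3.1 can be sketched as below. This is only one reading of the description: the function name, the fixed seed and the use of integer identifiers in place of the 1390 verb pairs are assumptions made so that the sketch is reproducible and self-contained.

```python
import random

def make_splits(pool, train_sizes=(25, 50, 100, 250, 500),
                sets_per_size=10, test_size=500, seed=0):
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    splits = []
    for size in train_sizes:
        for _ in range(sets_per_size):
            train = rng.sample(pool, size)  # sampled without replacement
            train_set = set(train)
            remainder = [e for e in pool if e not in train_set]
            test = rng.sample(remainder, test_size)  # drawn from E_past \ E
            splits.append((size, train, test))
    return splits

pool = list(range(1390))  # placeholder identifiers standing in for E_past
splits = make_splits(pool)
print(len(splits))
```

Each of the 50 resulting splits pairs a training set with a disjoint test set of 500 examples, so per-size accuracies can be averaged over the 10 sets as described.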
4.3.2 KRK illegality

Predictive accuracies were compared for CProgol4.4 using AP with leave-one-out (aleave) against induction
with leave-one-out (leave). Training sets of sizes 5, 10, 20, 30, 40, 60, 80, 100, 200 were randomly sampled
with replacement from the total example space, with 10 sets of each size. For each of the training sets both
aleave and leave were run. The resulting predictive accuracies were averaged for each training set size.
4.4 Results

4.4.1 English past tense

The five learning curves are shown in Figure 5. The horizontal line labelled "Default rules" represents the
following simple Prolog program which adds a `d' to verbs ending in `e' and otherwise adds `ed'.
% Verbs ending in `e' take `d'.
past(A,B) :- split(A,_,[e]), split(B,A,[d]), !.
% All other verbs take `ed'.
past(A,B) :- split(B,A,[e,d]).

The differences between AP and all other systems are significant at the 0.0001 level with 250 and 500
examples. Thus null hypothesis 1 is clearly rejected.
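
For readers less familiar with Prolog, the "Default rules" baseline of Section 4.4.1 amounts to the following Python function. The function name and the verb `bake' are illustrative; the other two verbs come from Figure 3.

```python
def default_past(verb: str) -> str:
    # Add 'd' to verbs ending in 'e', otherwise add 'ed'.
    return verb + "d" if verb.endswith("e") else verb + "ed"

baked = default_past("bake")
clutched = default_past("clutch")
worryed = default_past("worry")  # wrong: the correct form is 'worried'
print(baked, clutched, worryed)
```

The last call shows the kind of case the default rules cannot handle, which is why they form a flat baseline in Figure 5.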

[Figure 6 plot: predictive accuracy (%) against training set size (20 to 200) for Induction - leave, AP and Majority class.]

Figure 6: Learning curves for KRK illegality.


4.4.2 KRK illegality

The two learning curves are shown in Figure 6. The horizontal line labelled "Majority class" shows the
percentage of negative examples in the domain. Only the accuracy difference between induction and AP
for 200 examples is significant at the 0.05 level, though taken together the differences are significant at the
0.0001 level. Thus null hypothesis 2 can be rejected.

5 Discussion and related work

The strong rejection of the two null hypotheses indicates that the advantages of AP relative to induction are
domain dependent. The authors believe that AP has advantages for domains, like the English past tense,
in which a large proportion of the examples must be treated as exceptions with respect to the hypothesis
vocabulary. Note that KRK illegal contains exceptions, though they fall into a relatively small number of
classes, and have relatively low frequency (a 2 clause approximation of KRK illegal has over 90% accuracy).
By contrast, around 20% of the verbs in the past tense data are irregular.
It should be noted that our implementation of AP has a tendency to overgeneralise. This stems from the
asymmetry in constructing only clauses which make positive predictions in the aleave algorithm (Section 3).
The tendency to overgeneralise decreases accuracy in the KRK illegal domain but increases accuracy in the
past tense domain, due to the lack of negative examples. Even when negative examples are added to the
past tense training set, predictive accuracy is unaffected due to the output completeness constraint.
The AP accuracies on English past tense data shown in Figure 5 are the highest on this data set in the
literature. However, it is interesting to note that CProgol's induction mode results are as good as FOIDL.
This contradicts Mooney and Califf's claim that FOIDL's decision list representation gives FOIDL strong
advantages in this domain.

Figure 7: IQ test problem of the type studied by Evans.
5.1 Relationship of AP to analogy

Evans' [4] early studies of analogy concentrated on IQ tests of the form shown in Figure 7. Such problems
have the general form "A is to B as C is to ?".
Analogical reasoning is often viewed as having a close relationship to induction. For instance, both Peirce
and Polya suggested that analogical conclusions could be deduced via an inductive hypothesis [13, 14]. Similar
views are expressed in the Artificial Intelligence literature [2]. For instance, Arima [1] formalises the problem
of analogy as involving a comparison of a base B and target T. When B and T are found to share a similarity
property S analogical reasoning predicts that they will also share a projected property P. This is formalised
in the following analogical inference rule.

\[ \frac{P(B) \qquad S(T) \wedge S(B)}{P(T)} \]

The rule above can be viewed as involving the construction of the following inductive hypothesis.

\[ \forall x.\; S(x) \rightarrow P(x) \]

From this P(T) can be inferred deductively. Note that S and P can obviously be extended to take more than
one argument. For instance, given the Evans' type problem in Figure 7 we might formulate the following
hypothesis as a Prolog clause.
is_to(X,Y) :- invertall(X,Y).

In this way we can view analogical reasoning as a special case of AP, in which the example set contains a
single base example and the test instance relates to the target. According to Arima the following issues are
seen as being central in the discussion of analogy.
1. What object (or case) should be selected as a base with respect to a target,
2. which property is significant in analogy among properties shared by two objects and
3. what property is to be projected w.r.t. a certain similarity.
These three issues are handled in the CProgol4.4 AP implementation as follows.
1. A set of base cases is used from the example set based on maximising compression over the hypothesis
space,
2. relevant properties are found by constructing the bottom clause relative to the test instance and
3. the relevant projection properties are decided on the basis of modeh declarations given to CProgol.

6 Conclusions and further work

In this paper we have introduced the notion of AP as a half-way house between induction and instance-based
learning. An implementation of AP has been incorporated into CProgol4.4
(ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/progol4.4). In experiments AP produced the best predictive accuracy results to date on the English past tense data, outstripping FOIDL by around 10% after 500 examples.
However, on KRK illegal AP performs consistently worse than CProgol4.4 in inductive mode. We believe
that AP works best with domains in which a large proportion of the examples must be treated as exceptions
with respect to the hypothesis vocabulary.
The present implementation of AP is limited in a number of ways. For instance, for any test instance x
predictions must be binary, i.e. $y \in \{True, False\}$. Also, because no constructed hypotheses are ever stored,
AP cannot deal with recursion. It is envisaged that a strategy which mixed both induction and AP might
work better than either. Thus some, or all, of the AP hypotheses could be stored for later use. However, it
is not yet clear to the authors which strategy would operate best.

Acknowledgements
The first author would like to thank Thirza and Clare for their good-natured support during the writing
of this paper. This work was supported partly by the Esprit Long Term Research Action ILP II (project
20237), EPSRC grant GR/K57985 on Experiments with Distribution-based Machine Learning.

References
[1] J. Arima. Logical Foundations of Induction and Analogy. PhD thesis, Kyoto University, 1998.
[2] T. Davies and S.J. Russell. A logical approach to reasoning by analogy. In IJCAI-87, pages 264-270.
Morgan Kaufmann, 1987.
[3] W. Emde and D. Wettschereck. Relational Instance-Based Learning. In L. Saitta, editor, Proceedings of the 13th International Machine Learning Conference, pages 122-130, Los Altos, 1996. Morgan
Kaufmann.
[4] T.G. Evans. A program for the solution of a class of geometric analogy intelligence test questions. In
M. Minsky, editor, Semantic Information Processing. MIT Press, Cambridge, MA, 1968.
[5] A. Hutchinson. Metrics on terms and clauses. In M. Someren and G. Widmer, editors, Proceedings of
the Ninth European Conference on Machine Learning, pages 138-145, Berlin, 1997. Springer.
[6] C.X. Ling. Learning the past tense of english verbs: the symbolic pattern associators vs. connectionist
models. Journal of Artificial Intelligence Research, 1:209-229, 1994.
[7] R.J. Mooney and M.E. Califf. Induction of first-order decision lists: Results on learning the past tense
of english verbs. Journal of Artificial Intelligence Research, 3:1-24, 1995.
[8] S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245-286, 1995.
[9] S. Muggleton, M.E. Bain, J. Hayes-Michie, and D. Michie. An experimental comparison of human and
machine learning formalisms. In Proceedings of the Sixth International Workshop on Machine Learning,
Los Altos, CA, 1989. Kaufmann.
[10] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic
Programming, 19,20:629-679, 1994.
[11] S.H. Nienhuys-Cheng. Distance between Herbrand interpretations: a measure for approximations to a
target concept. In N. Lavrač and S. Džeroski, editors, Proceedings of the Seventh International Workshop
on Inductive Logic Programming (ILP97), pages 321-226, Berlin, 1997. Springer-Verlag. LNAI 1297.
[12] S.H. Nienhuys-Cheng. Distances and limits on Herbrand interpretations. In C.D. Page, editor, Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP98), pages 250-260,
Berlin, 1998. Springer. LNAI 1446.
[13] C.S. Peirce. Elements of logic. In C. Hartshorne and P. Weiss, editors, Collected Papers of Charles
Sanders Peirce, volume 2. Harvard University Press, Cambridge, MA, 1932.
[14] G. Polya. Induction and analogy in mathematics. In Mathematics and Plausible Reasoning, volume 1.
Princeton University Press, Princeton, 1954.
[15] D.E. Rumelhart and J.L. McClelland. On learning the past tense of english verbs. In Explorations in
the Micro-Structure of Cognition Vol. II, pages 216-271. MIT Press, Cambridge, MA, 1986.