
ARTIFICIAL INTELLIGENCE

BCA, 5th Semester


CONTENTS
Chapter 1 Introduction
1.1 Definition of Artificial Intelligence
1.2 The History of AI
1.3 Applications of AI
1.4 Branches of AI
1.5 Some More Concepts
1.6 AI Research
1.7 Pros and Cons of Artificial Intelligence
1.8 Disadvantages of Artificial Intelligence
1.9 AI Techniques
1.10 Practical Impact of AI
1.11 Solving Problems by Searching
1.12 Basic Techniques for Artificial Intelligence
1.13 Robot
Check Your Progress
Chapter 2 Problem Solving
2.1 Problem Solving
2.2 State Space Search
2.3 Production System
2.4 Control Strategies
2.5 Depth-First and Breadth-First Search
2.6 Depth First Search Algorithm
2.7 Breadth First Search Algorithm
2.8 Depth First Search vs. Breadth First Search
2.9 Depth First with Iterative Deepening
2.10 Using State Space to Represent Reasoning in Logic
2.11 Heuristics and the 8-tile puzzle
2.12 A* Search Algorithm
2.13 Consistent Heuristic
2.14 Consequences of Monotonicity
2.15 Travelling Salesman Problem
2.16 Tic-tac-toe
2.17 Water Jug Problem
2.18 Minimax Search Procedure
Check Your Progress
Chapter 3 Knowledge Representation
3.1 Knowledge Representation
3.2 Characteristics of Knowledge
3.3 Knowledge Representation Technologies
3.4 First Order Logic
3.5 Deductive Systems
3.6 Backward Chaining Method of Conclusion Deduction
3.7 Modus Ponens
3.8 Unification
3.9 Skolem Normal Form
3.10 Clausal Normal Form
3.11 Resolution
Check Your Progress
Chapter 4 Rule-Based System
4.1 Rule-Based Systems
4.2 Components of Rule Based Systems
4.3 Technique of Rule-Based Systems
4.4 Methods of Rule-Based Systems
4.5 Forward vs Backwards Chaining
4.6 Rule-Based Deduction Systems
Check Your Progress
Chapter 5 Structural Knowledge Representation
5.1 Semantic Networks
5.2 Partitioned Networks
5.3 Inheritance
5.4 Multiple Inheritance
5.5 Problems with Semantic Nets
5.6 Knowledge Representation Formalisms
5.7 Well-Formed Formula
5.8 Predicate Logic
5.9 Frame
5.10 Class
5.11 Metaclass
5.12 Is-a Relationship
5.13 Inheritance
Check Your Progress
Chapter 6 Expert System
6.1 Expert Systems and Artificial Intelligence
6.2 Justification of Expert System
6.3 Chaining
6.4 Expert System Architecture
6.5 Expert Systems versus Problem-Solving Systems
6.6 Inference Rule
6.7 Applications of Expert Systems
6.8 Advantages of Expert Systems
6.9 Disadvantages of Expert Systems
6.10 Knowledge Acquisition
6.11 Case Study
Check Your Progress
Chapter 1 Introduction
1.1 Definition of Artificial Intelligence
Artificial Intelligence is a branch of computer science which aims to create intelligent machines.
Artificial Intelligence (AI) is the science and engineering of making intelligent machines, especially
intelligent computer programs.
Artificial Intelligence (AI) is a relatively new field. Even though some groundwork had been
laid earlier, AI began in earnest with the emergence of the modern computer
during the 1940s and 1950s.
The term Artificial Intelligence was coined by John McCarthy in 1956. He defined it as "the
science and engineering of making intelligent machines”.
Intelligence
Now, the question arises, what is intelligence? Intelligence is the computational part of the ability
to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many
animals and some machines too.
Intelligence involves many mechanisms; it is not a single thing about which one can ask a yes-or-no
question, “Is this machine intelligent or not?” If doing a task requires only mechanisms that are well
understood today, computer programs can give very impressive performances on these tasks. Such
programs should be considered “somewhat intelligent”.
AI sometimes simulates human intelligence, but not always or even usually. On the one hand,
we can learn something about how to make machines solve problems by observing other people or
just by observing our own methods. On the other hand, most work in AI involves studying the
problems the world presents to intelligence rather than studying people or animals. AI researchers
are free to use methods that are not observed in people or that involve much more computing than
people can do.
Now during this all discussion, one more question may arise, do computer programs have IQs?
No, IQ is based on the rates at which intelligence develops in children. It is the ratio of the age at
which a child normally makes a certain score to the child's age. The scale is extended to adults in a
suitable way. IQ correlates well with various measures of success or failure in life, but making
computers that can score high on IQ tests would be weakly correlated with their usefulness. For
example, the ability of a child to repeat back a long sequence of digits correlates well with other
intellectual abilities, perhaps because it measures how much information the child can compute
with at once. However, “digit span” is trivial for even extremely limited computers.
Arthur R. Jensen, a leading researcher in human intelligence, suggests “as a heuristic hypothesis” that
all normal humans have the same intellectual mechanisms and that differences in intelligence are
related to “quantitative biochemical and physiological conditions”. These conditions can be taken to
be speed, short-term memory, and the ability to form accurate and retrievable long-term memories.
Whether or not Jensen is right about human intelligence, the situation in AI today is the reverse.
Computer programs have plenty of speed and memory but their abilities correspond to the
intellectual mechanisms that program designers understand well enough to put in programs. Some
abilities that children normally don't develop until they are teenagers may already be present in
programs, while some abilities possessed by two-year-olds are still missing. The matter is further
complicated by the fact that the
cognitive sciences still have not succeeded in determining exactly what the human abilities are.
Whenever people do better than computers on some task or computers use a lot of computation to
do as well as people, this demonstrates that the program designers lack understanding of the
intellectual mechanisms required to do the task efficiently.
AI research started after World War II, when a number of people independently started to work on
intelligent machines. The English mathematician Alan Turing may have been the first. He gave a
lecture on it in 1947. He also may have been the first to decide that AI was best researched by
programming computers rather than by building machines. By the late 1950s, there were many
researchers on AI, and most of them were basing their work on programming computers.
1.2 The History of AI
The name “Artificial Intelligence” dates only to the 1950s, but its roots stretch back thousands of
years, into the earliest studies of the nature of knowledge and reasoning. Intelligent artifacts appear
in Greek mythology; the idea of developing ways to perform reasoning automatically, and efforts to
build automata to perform tasks such as game-playing, date back hundreds of years. Psychologists
have long studied human cognition, helping to build up knowledge about the nature of human
intelligence. Philosophers have analyzed the nature of knowledge, have studied the mind-body
problem of how mental states relate to physical processes, and have explored formal frameworks
for deriving conclusions.
The advent of electronic computers, however, provided a revolutionary advance in the ability to
study intelligence by actually building intelligent artifacts, systems to perform complex reasoning
tasks, and observing and experimenting with their behavior to identify fundamental principles. In
1950, a landmark paper by Alan Turing argued for the possibility of building intelligent computing
systems. That paper proposed an operational test for comparing the intellectual ability of humans
and AI systems, now generally called the “Turing Test”.
Turing Test
Alan Turing (1950) in his article Computing Machinery and Intelligence discussed conditions for
considering a machine to be intelligent. He argued that if the machine could successfully pretend to
be human to a knowledgeable observer then we certainly should consider it intelligent. This test
would satisfy most people but not all philosophers. The observer could interact with the machine
and a human by teletype (to avoid requiring that the machine imitate the appearance or voice of the
person), and the human would try to persuade the observer that it was human and the machine
would try to fool the observer.
The Turing test is a one-sided test. A machine that passes the test should certainly be considered
intelligent, but a machine could still be considered intelligent without knowing enough about
humans to imitate a human.
Daniel Dennett’s book Brainchildren has an excellent discussion of the Turing test and the various
partial Turing tests that have been implemented, i.e. with restrictions on the observer's knowledge
of AI and the subject matter of questioning. It turns out that some people are easily led into
believing that a rather dumb program is intelligent.
AI aims at human-level intelligence. The ultimate effort is to make computer programs that can
solve problems and achieve goals in the world as well as humans. However, many people involved
in particular research areas are much less ambitious.
Computers can be programmed to simulate any kind of machine with intelligence.
Many researchers invented non-computer machines, hoping that they would be intelligent in
different ways than the computer programs could be. However, they usually simulate their invented
machines on a computer and come to doubt that the new machine is worth building. Because many
billions of dollars have been spent on making computers faster and faster, another kind of
machine would have to be very fast indeed to perform better than a program on a computer simulating
that machine.
The significance of the Turing Test has been controversial. Some have believed that building a
system to pass the Turing Test should be the goal of AI. Others, however, reject the goal of
developing systems to imitate human behavior.
Early AI research rapidly developed systems to perform a wide range of tasks often associated with
intelligence in people, including theorem-proving in geometry, symbolic integration, solving
equations, and even solving analogical reasoning problems of the types sometimes found on human
intelligence tests.
1.3 Applications of AI
1. Game Playing
You can buy machines that can play master level chess. There is some Artificial Intelligence
in them, but they play well against people mainly through brute force computation--looking
at hundreds of thousands of positions. To beat a world champion by brute force and known
reliable heuristics requires being able to look at 200 million positions per second.
2. Natural Language Processing
Just getting a sequence of words into a computer is not enough. Parsing sentences is not
enough either. The computer has to be provided with an understanding of the domain the
text is about, and this is presently possible only for very limited domains.
3. Vision Processing
The world is composed of three-dimensional objects, but the inputs to the human eye and
computers' TV cameras are two dimensional. Some useful programs can work solely in two
dimensions, but full computer vision requires partial three-dimensional information that is
not just a set of two-dimensional views. At present there are only limited ways of
representing three-dimensional information directly, and they are not as good as what
humans evidently use.
4. Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited purposes.
Thus United Airlines has replaced its keyboard tree for flight information by a system using
speech recognition of flight numbers and city names. It is quite convenient. On the other
hand, while it is possible to instruct some computers using speech, most users have gone
back to the keyboard and the mouse as still more convenient.
5. Expert Systems
A knowledge engineer interviews experts in a certain domain and tries to embody their
knowledge in a computer program for carrying out some task. How well this works depends
on whether the intellectual mechanisms required for the task are within the present state of
AI. When this turned out not to be so, there were many disappointing results. One of the
first expert systems was MYCIN in 1974, which diagnosed bacterial infections of the blood
and suggested treatments. It did better than medical students or practicing doctors, provided
its limitations were observed. Namely, its ontology included bacteria, symptoms, and
treatments and did not include patients, doctors, hospitals, death, recovery, and events
occurring in time. Its interactions depended on a single patient being considered. Since the
experts consulted by the knowledge engineers knew about patients, doctors, death, recovery,
etc., it is clear that the knowledge engineers forced what the experts told them into a
predetermined framework. In the present state of AI, this has to be true. The usefulness of
current expert systems depends on their users having common sense.
6. Heuristic Classification
One of the most feasible kinds of expert system given the present knowledge of AI is to put
some information in one of a fixed set of categories using several sources of information.
An example is advising whether to accept a proposed credit card purchase. Information is
available about the owner of the credit card, his record of payment and also about the item
he is buying and about the establishment from which he is buying it (e.g., about whether
there have been previous credit card frauds at this establishment).
1.4 Branches of AI
1. Logical AI
What a program knows about the world in general, the facts of the specific situation in
which it must act, and its goals are all represented by sentences of some mathematical
logical language. The program decides what to do by inferring that certain actions are
appropriate for achieving its goals.
Logic is also used in weaker ways in AI, databases, logic programming, hardware design
and other parts of computer science. Many AI systems represent facts by a limited subset of
logic and use non-logical programs as well as logical inference to make inferences.
Databases often use only ground formulas. Logic programming restricts its representation to
Horn clauses. Hardware design usually involves only propositional logic. These restrictions
are almost always justified by considerations of computational efficiency.
2. Search
AI programs often examine large numbers of possibilities, e.g. moves in a chess game or
inferences by a theorem proving program. Discoveries are continually made about how to
do this more efficiently in various domains.
3. Pattern Recognition
When a program makes observations of some kind, it is often programmed to compare what
it sees with a pattern. For example, a vision program may try to match a pattern of eyes and
a nose in a scene in order to find a face. More complex patterns, e.g. in a natural language
text, in a chess position, or in the history of some event are also studied. These more
complex patterns require quite different methods than do the simple patterns that have been
studied the most.
4. Representation
Facts about the world have to be represented in some way. Usually languages of
mathematical logic are used.
5. Inference
From some facts, others can be inferred. Mathematical logical deduction is adequate for
some purposes, but new methods of non-monotonic inference have been added to logic
since the 1970s. The simplest kind of non-monotonic reasoning is default reasoning in
which a conclusion is to be inferred by default, but the conclusion can be withdrawn if there
is evidence to the contrary. For example, when we hear of a bird, we may infer that it can
fly, but this conclusion can be reversed when we hear that it is a penguin. It is the possibility
that a conclusion may have to be withdrawn that constitutes the non-monotonic character of
the reasoning. Ordinary logical reasoning is monotonic in that the set of conclusions that
can be drawn from a set of premises is a monotonically increasing function of the premises.
Circumscription is another form of non-monotonic reasoning.
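
As a small illustration, default reasoning of this kind can be mimicked in a few lines of
Python (a sketch only, with invented predicates, not a general non-monotonic logic):

    # Default reasoning sketch: "birds fly" is a default that specific
    # evidence (being a penguin) overrides.
    def can_fly(facts):
        if "penguin" in facts:        # contrary evidence defeats the default
            return False
        return "bird" in facts        # default: a bird is assumed to fly

    facts = {"bird"}
    print(can_fly(facts))             # True  -- inferred by default
    facts.add("penguin")              # a new fact is added
    print(can_fly(facts))             # False -- the conclusion is withdrawn

Adding a fact removes a conclusion, which is exactly the non-monotonic behavior
described above.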
6. Common Sense Knowledge and Reasoning
This is the area in which AI is farthest from human-level, in spite of the fact that it has been
an active research area since the 1950s. While there has been considerable progress, e.g. in
developing systems of non-monotonic reasoning and theories of action, yet more new ideas
are needed. The system contains a large but spotty collection of common sense facts.
7. Learning from Experience
Programs can learn from experience. The approaches to AI based on connectionism and neural nets specialize
in that. There is also learning of laws expressed in logic. Programs can only learn what facts
or behaviors their formalisms can represent, and unfortunately learning systems are almost
all based on very limited abilities to represent information.
8. Planning
Planning programs start with general facts about the world (especially facts about the effects
of actions), facts about the particular situation and a statement of a goal. From these, they
generate a strategy for achieving the goal. In the most common cases, the strategy is just a
sequence of actions.
9. Epistemology
This is a study of the kinds of knowledge that are required for solving problems in the
world. In philosophy, epistemology is the study of knowledge, its form and limitations. This will
do pretty well for AI also, provided we include in the study common sense knowledge of
the world and scientific knowledge. Both of these offer difficulties philosophers haven't
studied, e.g. they haven't studied in detail what people or machines can know about the
shape of an object in the field of view, remembered from previously being in the field of view,
remembered from a description or remembered from having been felt with the hands.
10. Ontology
Ontology is the study of the kinds of things that exist. In AI, the programs and sentences
deal with various kinds of objects, and we study what these kinds are and what their basic
properties are. Emphasis on ontology began in the 1990s.
11. Heuristics
A heuristic is a way of trying to discover something, or an idea embedded in a program. The
term is used variously in AI. Heuristic functions are used in some approaches to search to
measure how far a node in a search tree seems to be from a goal. Heuristic predicates that
compare two nodes in a search tree to see if one is better than the other, i.e. constitutes an
advance toward the goal, may be more useful.
Most AI work has concerned heuristics, i.e. the algorithms that solve problems, usually
taking for granted a particular epistemology of a particular domain, e.g. the representation
of chess positions.
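
For illustration, a heuristic function and a heuristic predicate of the kind just described
might look like this in Python for the 8-puzzle (a sketch assuming boards are tuples of nine
entries read row by row, with 0 for the blank; the goal layout is an assumption):

    # Heuristic sketch for the 8-puzzle.
    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

    def h(board):
        # Heuristic function: number of tiles not in their goal position.
        return sum(t != 0 and t != g for t, g in zip(board, GOAL))

    def better(node_a, node_b):
        # Heuristic predicate: does node_a look closer to the goal?
        return h(node_a) < h(node_b)

    start = (1, 2, 3, 4, 5, 6, 0, 7, 8)
    print(h(start), h(GOAL))      # 2 0
    print(better(GOAL, start))    # True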
12. Genetic Programming
Genetic programming is a technique for getting programs to solve a task by mating random
Lisp programs and selecting the fittest over millions of generations.
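
A toy sketch of the idea in Python (purely illustrative: Lisp programs are replaced by small
arithmetic expression trees, and the target function x*x + x and all parameters are arbitrary
choices):

    import random

    OPS, TERMS = ("+", "*"), ("x", 1.0)

    def rand_expr(depth=3):
        # Random program: a terminal, or an operator with two subtrees.
        if depth == 0 or random.random() < 0.3:
            return random.choice(TERMS)
        return (random.choice(OPS), rand_expr(depth - 1), rand_expr(depth - 1))

    def evaluate(e, x):
        if e == "x":
            return x
        if isinstance(e, float):
            return e
        op, a, b = e
        return evaluate(a, x) + evaluate(b, x) if op == "+" else evaluate(a, x) * evaluate(b, x)

    def fitness(e):
        # Squared error against the target on sample points (lower is better).
        return sum((evaluate(e, x) - (x * x + x)) ** 2 for x in range(-3, 4))

    def random_subtree(e):
        while isinstance(e, tuple) and random.random() < 0.5:
            e = random.choice(e[1:])
        return e

    def crossover(a, b):
        # "Mating": graft a random subtree of b somewhere into a.
        if not isinstance(a, tuple) or random.random() < 0.3:
            return random_subtree(b)
        op, l, r = a
        return (op, crossover(l, b), r) if random.random() < 0.5 else (op, l, crossover(r, b))

    pop = [rand_expr() for _ in range(200)]
    for gen in range(50):
        pop.sort(key=fitness)
        fittest = pop[:50]                                 # selection
        pop = fittest + [crossover(random.choice(fittest), random.choice(fittest))
                         for _ in range(150)]
    pop.sort(key=fitness)
    print(pop[0], "error:", fitness(pop[0]))

Real genetic programming systems also mutate programs and run for far more generations,
but the select-and-mate loop is the essential mechanism.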

1.5 Some More Concepts


1. Bounded Informatic Situation
Formal theories in the physical sciences deal with a bounded informatic situation. Scientists
decide informally in advance what phenomena to take into account. For example, much
celestial mechanics is done within the Newtonian gravitational theory and does not take into
account possible additional effects such as outgassing from a comet or electromagnetic
forces exerted by the solar wind. If more phenomena are to be considered, scientists must
make new theories--and of course they do.
Most AI formalisms also work only in a bounded informatic situation. What phenomena to
take into account is decided by a person before the formal theory is constructed. With such
restrictions, much of the reasoning can be monotonic, but such systems cannot reach human
level ability. For that, the machine will have to decide for itself what information is
relevant, and that reasoning will inevitably be partly nonmonotonic.
One example is the “blocks world” where the position of a block x is entirely characterized
by a sentence At(x,l) or On(x,y), where l is a location or y is another block.
Another example is the MYCIN expert system, in which the ontology (objects considered)
includes diseases, symptoms, and drugs, but not patients (there is only one), doctors or
events occurring in time.
2. Common Sense Knowledge of the World
Humans have a lot of knowledge of the world which cannot be put in the form of precise
theories. Though the information is imprecise, we believe it can still be put in logical form.
3. Common Sense Informatic Situation
In general a thinking human is in what we call the common sense informatic situation, as
distinct from the bounded informatic situation. The known facts are necessarily incomplete.
We live in a world of middle-sized objects which can only be partly observed. We only
partly know how the objects that can be observed are built from elementary particles in
general, and our information is even more incomplete about the structure of particular
objects. These limitations apply to any buildable machines, so the problem is not just one of
human limitations.
In many actual situations, there is no a priori limitation on what facts are relevant. It may
not even be clear in advance what phenomena should be taken into account. The
consequences of actions cannot be fully determined. The common sense informatic situation
necessitates the use of approximate concepts that cannot be fully defined and the use of
approximate theories involving them. It also requires non-monotonic reasoning in reaching
conclusions. Many AI texts assume that the informatic situation is bounded, without even
mentioning the assumption explicitly.
4. Epistemologically Adequate Languages
A logical language for use in the common sense informatic situation must be capable of
expressing directly the information actually available to agents. For example, giving the
density and temperature of air and its velocity field and the Navier-Stokes equations does
not practically allow expressing what a person or robot actually can know about the wind
that is blowing. We and robots can talk about its direction, strength and gustiness
approximately, and can give a few of these quantities numerical values with the aid of
instruments if instruments are available, but we have to deal with the phenomena even when
no numbers can be obtained.
5. Robot
We can generalize the notion of a robot as a system with a variant of the physical
capabilities of a person, including the ability to move around, manipulate objects and
perceive scenes, all controlled by a computer program. More generally, a robot is a
computer-controlled system that can explore and manipulate an environment that is not part
of the robot itself and is, in some important sense, larger than the robot. A robot should
maintain a continued existence and not reset itself to a standard state after each task. From
this point of view, we can have a robot that explores and manipulates the Internet without it
needing legs, hands and eyes.
6. Qualitative Reasoning
This concerns reasoning about physical processes when the numerical relations required for
applying the formulas of physics are not known. Most of the work in the area assumes that
information about what processes to take into account is provided by the user. Systems
that must be given this information often won't do human level qualitative reasoning.
7. Common Sense Physics
Corresponds to people's ability to make decisions involving physical phenomena in daily
life, e.g. deciding that the spill of a cup of hot coffee is likely to burn Mr. A, but Mr. B is far
enough away to be safe. It differs from qualitative physics, as studied by most researchers in
qualitative reasoning, in that the system doing the reasoning must itself use common sense
knowledge to decide what phenomena are relevant in the particular case.
8. Expert Systems
These are designed by people, i.e. not by computer programs, to take a limited set of
phenomena into account. Many of them do their reasoning using logic, and others use
formalisms amounting to subsets of first order logic. Many require very little common sense
knowledge and reasoning ability. Restricting expressiveness of the representation of facts is
often done to increase computational efficiency.
9. Elaboration Tolerance
A set of facts described as a logical theory needs to be modifiable by adding sentences
rather than only by going back to natural language and starting over. For example, we can
modify the missionaries and cannibals problem by saying that there is an oar on each bank
of the river and that the boat can be propelled with one oar carrying one person but needs
two oars to carry two people. Some formalizations require complete rewriting to
accommodate this elaboration. Others share with natural language the ability to allow the
elaboration by an addition to what was previously said.
There are degrees of elaboration tolerance. A state space formalization of the missionaries
and cannibals problem in which a state is represented by a triplet (m,c,b) of the numbers of
missionaries, cannibals and boats on the initial bank is less elaboration tolerant than a
situation calculus formalism in which the set of objects present in a situation is not specified
in advance. In particular, the former representation needs surgery to add the oars, whereas
the latter can handle it by adjoining more sentences; as can a person. The realization of
elaboration tolerance requires nonmonotonic reasoning.
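
For illustration, the triplet-based formalization can be sketched in Python (a minimal version
with three missionaries and three cannibals; note that adding the oars would force a change to
the state representation itself, which is precisely its lack of elaboration tolerance):

    # State-space sketch of missionaries and cannibals: a state is the
    # triplet (m, c, b) of missionaries, cannibals and boats on the
    # initial bank.
    from collections import deque

    def safe(m, c):
        # Missionaries may not be outnumbered on either bank.
        return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

    def moves(state):
        m, c, b = state
        d = -1 if b == 1 else 1                    # direction of the crossing
        for dm, dc in ((1, 0), (2, 0), (0, 1), (0, 2), (1, 1)):
            nm, nc = m + d * dm, c + d * dc
            if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
                yield (nm, nc, 1 - b)

    def solve(start=(3, 3, 1), goal=(0, 0, 0)):
        frontier, seen = deque([[start]]), {start}
        while frontier:                            # breadth-first search
            path = frontier.popleft()
            if path[-1] == goal:
                return path
            for nxt in moves(path[-1]):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(path + [nxt])

    print(solve())   # shortest sequence of states from (3,3,1) to (0,0,0)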
10. Robotic Free Will
Robots need to consider their choices and decide which of them leads to the most favorable
situation. In doing this, the robot considers a system in which its own outputs are regarded
as free variables, i.e. it doesn't consider the process by which it is deciding what to do. The
perception of having choices is also what humans consider as free will.
11. Approximate Concepts
Common sense thinking cannot avoid concepts without clear definitions. Consider the
welfare of an animal. Over a period of minutes, the welfare is fairly well defined, but asking
what will benefit a newly hatched chick over the next year is ill defined. The exact snow,
ice and rock that constitutes Mount Everest is ill defined. The key fact about approximate
concepts is that while they are not well defined, sentences involving them may be quite well
defined. For example, the proposition that Mount Everest was first climbed in 1953 is
definite, and its definiteness is not compromised by the ill-definedness of the exact
boundaries of the mountain.
There are two ways of regarding approximate concepts. The first is to suppose that there is a
precise concept, but it is incompletely known. Thus we may suppose that there is a truth of
the matter as to which rocks and ice constitute Mount Everest. If this approach is taken, we
simply need weak axioms telling what we do know but not defining the concept completely.
The second approach is to regard the concept as intrinsically approximate. There is no truth
of the matter. One practical difference is that we would not expect two geographers
independently researching Mount Everest to define the same boundary. They would have to
interact, because the boundaries of Mount Everest are yet to be defined.
12. Approximate Theories
Any theory involving approximate concepts is an approximate theory. We can have a theory
of the welfare of chickens. However, its notions don't make sense if pushed too far. For
example, animal rights people assign some rights to chickens but cannot define them
precisely. It is not presently apparent whether the expression of approximate theories in
mathematical logical languages will require any innovations in mathematical logic.
13. Ambiguity Tolerance
Assertions often turn out to be ambiguous with the ambiguity only being discovered many
years after the assertion was enunciated. For example, it is a priori ambiguous whether the
phrase “conspiring to assault a Federal official” covers the case when the criminals
mistakenly believe their intended victim is a Federal official. An ambiguity in a law does
not invalidate it in the cases where it can be considered unambiguous. Even where it is
formally ambiguous, it is subject to judicial interpretation. AI systems will also require
means of isolating ambiguities and also contradictions. The default rule is that the concept is
not ambiguous in the particular case. The ambiguous theories are a kind of approximate
theory.
14. Causal Reasoning
A major concern of logical AI has been treating the consequences of actions and other
events. The epistemological problem concerns what can be known about the laws that
determine the results of events. A theory of causality is pretty sure to be approximate.
15. Situation Calculus
Situation calculus is the most studied formalism for doing causal reasoning. A situation is in
principle a snapshot of the world at an instant. One never knows a situation--one only
knows facts about a situation. Events occur in situations and give rise to new situations.
There are many variants of situation calculus, and none of them has come to dominate.
16. Fluents
Fluents are functions of situations in situation calculus. The simplest fluents are
propositional and have truth values. There are also fluents with values in numerical or
symbolic domains. Situational fluents take on situations as values.
17. Frame Problem
This is the problem of how to express the facts about the effects of actions and other events
in such a way that it is not necessary to explicitly state for every event, the fluents it does
not affect.
18. Qualification Problem
This concerns how to express the preconditions for actions and other events. That it is
necessary to have a ticket to fly on a commercial airplane is rather unproblematical to
express. That it is necessary to be wearing clothes needs to be kept inexplicit unless it
somehow comes up.
19. Ramification Problem
Events often have other effects than those we are immediately inclined to put in the axioms
concerned with the particular kind of event.
20. Projection
Given information about a situation, and axioms about the effects of actions and other
events, the projection problem is to determine facts about future situations. It is assumed
that no facts are available about future situations other than what can be inferred from the
“known laws of motion” and what is known about the initial situation.
21. Planning
The largest single domain for logical AI has been planning, usually the restricted problem of
finding a finite sequence of actions that will achieve a goal. Planning is somewhat the
inverse problem to projection.
22. Narrative
A narrative tells what happened, but any narrative can only tell a certain amount. A
narrative will usually give facts about the future of a situation that are not just consequences
of projection from an initial situation. While we may suppose that the future is entirely
determined by the initial situation, our knowledge does not permit inferring all the facts
about it by projection. Therefore, narratives give facts about the future beyond what follows
by projection.
23. Understanding
A rather demanding notion is most useful. In particular, fish do not understand swimming,
because they cannot use knowledge to improve their swimming, to wish for better fins, or
to teach other fish. Maybe fish do learn to improve their swimming, but this presumably
consists primarily of the adjustment of parameters and isn't usefully called understanding. The
term is best applied only to systems that can do hypothetical reasoning:
if p were true, then q would be true. Thus Fortran compilers don't understand Fortran.
24. Consciousness, awareness and introspection
Human level AI systems will require these qualities in order to do tasks we assign them. In
order to decide how well it is doing, a robot will need to be able to examine its goal
structure and the structure of its beliefs from the outside.
25. Mental situation calculus
The idea is that there are mental situations, mental fluents and mental events that give rise to
new mental situations. The mental events include observations and inferences but also the
results of observing the mental situation up to the current time. This allows drawing the
conclusion that there isn't yet information needed to solve a certain problem, and therefore
more information must be sought outside the robot or organism.
26. Discrete processes
Causal reasoning is simplest when applied to processes in which discrete events occur and
have definite results. In situation calculus, the formula s' = result(e,s) gives the new
situation s' that results when the event e occurs in situation s. Many continuous processes
that occur in human or robot activity can have approximate theories that are discrete.
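
A tiny Python sketch of the discrete case (with invented blocks-world fluents and events)
shows result(e,s) producing new situations; notice that fluents not mentioned by an event's
effects simply persist here, which is the issue the frame problem (item 17 above) addresses:

    # Situation-calculus sketch: a situation is a frozen set of fluents,
    # and result(e, s) gives the new situation after event e occurs in s.
    EFFECTS = {
        "pickup(A)":  ({"holding(A)"}, {"ontable(A)", "handempty"}),
        "putdown(A)": ({"ontable(A)", "handempty"}, {"holding(A)"}),
    }

    def result(event, situation):
        added, removed = EFFECTS[event]
        return frozenset((situation - removed) | added)

    s0 = frozenset({"ontable(A)", "handempty"})
    s1 = result("pickup(A)", s0)
    s2 = result("putdown(A)", s1)
    print(sorted(s1))      # ['holding(A)']
    print(s2 == s0)        # True -- putting the block down restores s0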
27. Continuous Processes
Humans approximate continuous processes with representations that are as discrete as
possible. For example, “Junior read a book while on the airplane from Glasgow to London”.
Continuous processes can be treated in the situation calculus, but the theory is so far less
successful than in discrete cases. We also sometimes approximate discrete processes by
continuous ones.
28. Non-deterministic events
Situation calculus and other causal formalisms are harder to use when the effects of an
action are indefinite. Often result(e,s) is not usefully axiomatizable and something like
occurs(e,s) must be used.
29. Conjunctivity
It often happens that two phenomena are independent. In that case, we may form a
description of their combination by taking the conjunction of the descriptions of the separate
phenomena. The description language satisfies conjunctivity if the conclusions we can draw
about one of the phenomena from the combined description are the same as the
conclusions we could draw from the single description. For example, we may have
separate descriptions of the assassination of Abraham Lincoln and of Mendel's
contemporaneous experiments with peas. What we can infer about Mendel's experiments
from the conjunction should ordinarily be the same as what we can infer from just the
description of Mendel's experiments. Many formalisms for concurrent events don't have this
property, but conjunctivity itself is applicable to more than concurrent events.
To use logician's language, the conjunction of the two theories should be a conservative
extension of each of the theories. Actually, we may settle for less. We only require that the
inferrable sentences about Mendel (or about Lincoln) in the conjunction are the same. The
combined theory may admit inferring other sentences in the language of the separate theory
that weren't inferrable in the separate theories.
30. Learning
Making computers learn presents two problems--epistemological and heuristic. The
epistemological problem is to define the space of concepts that the program can learn. The
heuristic problem is the actual learning algorithm. The heuristic problem of algorithms for
learning has been much studied, and the epistemological problem mostly ignored. The designer of the
learning system makes the program operate with a fixed and limited set of concepts.
Learning programs will never reach human level of generality as long as this approach is
followed. To learn many important concepts, a program must be able to represent more than a set of weights.
31. Discrimination, Recognition and Description
Discrimination is deciding which category a stimulus belongs to among a fixed set of
categories, e.g. decide which letter of the alphabet is depicted in an image. Recognition
involves deciding whether a stimulus belongs to the same set, i.e. represents the same
object, e.g. a person, as a previously seen stimulus. Description involves describing an
object in detail appropriate to performing some action with it, e.g. picking it up by the
handle or some other designated part. Description is the most ambitious of these operations
and has been the forte of logic-based approaches.
32. Logic Programming
Logic programming isolates a subdomain of first order logic that has nice computational
properties. When the facts are described as a logic program, problems can often be solved
by a standard program, e.g. a Prolog interpreter, using these facts as a program.
Unfortunately, in general the facts about a domain and the problems we would like
computers to solve have that form only in special cases.
33. Rich and Poor Entities
A rich entity is one about which a person or machine can never learn all the facts. The state
of the reader's body is a rich entity. The actual history of a person's trip home in the evening is a rich
entity, e.g. it includes the exact position of the body on foot and in the car at each
moment. While a system can never fully describe a rich entity, it can learn facts about it and
represent them by logical sentences.
Poor entities occur in plans and formal theories and in accounts of situations and events and
can be fully prescribed. For example, a plan for going home this evening is a poor entity,
since it does not contain more than a small, fixed amount of detail. Rich entities are often
approximated by poor entities. Indeed some rich entities may be regarded as inverse limits
of trees of poor entities. (The mathematical notion of inverse limit may or may not turn out
to be useful here, though it is probably not worth studying just for its
possible AI applications.)
34. Non-monotonic Reasoning
Both humans and machines must draw conclusions that are true in the best models of the
facts being taken into account. Several concepts of best are used in different systems. Many
are based on minimizing something. When new facts are added, some of the previous
conclusions may no longer hold. This is why the reasoning that reached these conclusions is
called nonmonotonic.
35. Probabilistic Reasoning
Probabilistic reasoning is a kind of non-monotonic reasoning. If the probability of one
sentence is changed, say given the value 1, other sentences that previously had high
probability may now have low or even 0 probability. Setting up the probabilistic models, i.e.
defining the sample space of ‘events’ to which probabilities are to be given, often involves
more general nonmonotonic reasoning, but this is conventionally done by a person
informally rather than by a computer.
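
A small numerical illustration in Python (the joint distribution is invented for the example):
once one sentence is given probability 1 by conditioning, another sentence that was previously
probable can drop to 0.

    # Probabilistic reasoning sketch over an invented joint distribution
    # P(kind of bird, whether it flies).
    joint = {
        ("sparrow", True):  0.80,
        ("sparrow", False): 0.05,
        ("penguin", True):  0.00,
        ("penguin", False): 0.15,
    }

    def prob(pred, given=lambda w: True):
        num = sum(p for w, p in joint.items() if pred(w) and given(w))
        den = sum(p for w, p in joint.items() if given(w))
        return num / den

    print(prob(lambda w: w[1]))                            # P(flies) = 0.8
    print(prob(lambda w: w[1],
               given=lambda w: w[0] == "penguin"))         # P(flies | penguin) = 0.0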
36. Creativity
Humans are sometimes creative--perhaps only rarely in the life of an individual, and rarely
among people generally. What is creativity? We consider creativity as an aspect of the solution
to a problem rather than as an attribute of a person (or computer program).
A creative solution to a problem contains a concept not present in the functions and
predicates in terms of which the problem is posed.
1.6 AI Research
AI research has both theoretical and experimental sides. The experimental side has both basic and
applied aspects.
There are two main lines of research. One is biological, based on the idea that since humans are
intelligent, AI should study humans and imitate their psychology or physiology. The other is
phenomenal, based on studying and formalizing common sense facts about the world and the
problems that the world presents to the achievement of goals. The two approaches interact to some
extent, and both should eventually succeed. It is a race, but both racers seem to be walking.
AI is also related to logic programming, which provides useful programming
languages (mainly Prolog).
Beyond that, sometimes a theory useful in AI can be expressed as a collection of Horn clauses, and a
goal to be achieved can be expressed as that of finding values of variables satisfying an expression.
The problem can then sometimes be solved by running the Prolog program consisting of those Horn
clauses together with the goal.
There are two possible obstacles to regarding AI as logic programming. First, Horn theories do not
exhaust first order logic. Second, the Prolog program expressing the theory may be extremely
inefficient. More elaborate control than just executing the program that expresses the theory is often
needed. Map coloring provides examples.
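
For illustration, here is a minimal propositional sketch in Python of proving a goal from a
collection of Horn clauses by backward chaining, in the spirit of a Prolog interpreter (the
rules are invented, and there is no loop detection or unification):

    # Horn-clause sketch: each rule is (head, [body...]); a fact is a
    # rule with an empty body. A goal is proved by backward chaining.
    RULES = [
        ("mortal(socrates)", ["man(socrates)"]),
        ("man(socrates)", []),
    ]

    def prove(goal):
        return any(all(prove(g) for g in body)
                   for head, body in RULES if head == goal)

    print(prove("mortal(socrates)"))   # True
    print(prove("mortal(zeus)"))       # False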
The following are some organizations and publications concerned with AI research:
The American Association for Artificial Intelligence (AAAI)
The European Coordinating Committee for Artificial Intelligence (ECCAI)
The Society for Artificial Intelligence and Simulation of Behavior (AISB) are scientific societies
concerned with AI research.
The Association for Computing Machinery (ACM) has a special interest group on artificial
intelligence SIGART.
The International Joint Conference on AI (IJCAI) is the main international conference. The AAAI
runs a US National Conference on AI. Electronic Transactions on Artificial Intelligence, Artificial
Intelligence, and Journal of Artificial Intelligence Research, and IEEE Transactions on Pattern
Analysis and Machine Intelligence are four of the main journals publishing AI research papers.
1.7 Pros and Cons of Artificial Intelligence
Artificial intelligence (AI) is the intelligence of machines. It is about designing machines that can
think. Researchers aim at introducing an emotional aspect into machines. How can it affect our
lives?
How fast is technology changing? According to many experts, faster than the majority of us think
or are prepared for. According to Ray Kurzweil, a futurist, “we will have both the hardware and the
software to achieve human level artificial intelligence with the broad suppleness of human
intelligence including our emotional intelligence by 2029.” Mr. Kurzweil says not to worry; such
super machines will also have morals and respect us as their creators. He also believes that humans
themselves will be smarter, healthier, and more capable in the near future by merging with our
technology. For example, tiny robots implanted in our brains will work directly with our neurons to
make us smarter.
Will such a technological revolution take place? Some would argue that it is inevitable, or that it is
already happening. It is hard to deny the tremendous changes that most of us have seen in our own
lifetimes. Even people in their twenties probably remember a time before cell phones and the
internet. Seventy years ago there was no television, much less satellites and cable. People listened
to phonographs or the radio, if they had electricity. A little over a hundred years ago there were no
cars. If you wanted to go to town, you saddled up your horse, or hitched him to a wagon.
Some futurists, like Mr. Kurzweil, believe that technological progress is an exponential progression,
rather than a linear one. In other words, the changes are coming more rapidly all of the time. They
see this as leading inevitably to what has been described as the technological singularity. As the
term is used by some, this is a hypothesized point in the future that will be characterized by the
development of self-improving machines. The idea is that if machines can be made capable of
improving themselves, they will build even smarter machines, which in turn will build smarter
machines, and so forth, rapidly outpacing us. The mathematician and novelist Vernor Vinge put it
as, "When greater-than-human intelligence drives progress, that progress will be much more rapid.
In fact, there seems no reason why progress itself would not involve the creation of still more
intelligent entities — on a still-shorter time scale." He is not as positive as Ray Kurzweil about
what this will mean for human civilization. When first writing about the subject he made a
statement that is often quoted, "Within thirty years, we will have the technological means to create
superhuman intelligence. Shortly thereafter, the human era will be ended." Other experts likewise
feel that the creation of such super machines will eventually result in the annihilation of the human
race, either deliberately or by accident.
Technology is neither good nor bad. It never has been. What man does with it is another story
entirely. Technological changes are certainly coming. They are already taking place. They are
constant and ubiquitous. Many believe that they are accelerating. They are probably also
unstoppable. Just as with the scientific knowledge that went into making the atomic bomb, once it
is possible to do something, someone will eventually do it.
The question then is how soon the next big breakthrough will come, and what we will do with it--or
what it will do to us.
1.8 Disadvantages of Artificial Intelligence
Artificial intelligence has not proved purely advantageous; there are some disadvantages as well.
Artificial intelligence is the science and engineering of designing intelligent machines. The
introduction of an artificial intelligence into machines will enable them to think. It might make
them capable of learning and reasoning. There are bright chances that intelligent machines will be
able to perform critical tasks. We might be able to employ them for dangerous missions, thus
minimizing the risk to human life. But there is another side to artificial intelligence. Let’s look at it.
If robots start replacing human resources in every field, we will have to deal with serious issues like
unemployment, in turn leading to depression, poverty and crime in society. Human
beings deprived of their work life may not find any means to channelize their energies and harness
their expertise. Human beings will be left with empty time.
Secondly, intelligent machines may not be the right choice for customer service. Replacing human
beings with robots in every field may not be a right decision to make. There are many jobs that
require the human touch. Intelligent machines will surely not be able to substitute for the caring
behavior of hospital nurses or the reassuring voice of a doctor.
One of the major disadvantages of intelligent machines is that they cannot be ‘human’. We might
be able to make them think. But will we be able to make them feel? Intelligent machines will
definitely be able to work for long hours. But will they do it with dedication? Will they work with
devotion? How will intelligent machines work wholeheartedly when they don’t have a heart?
Apart from these concerns, there is a chance that intelligent machines could overpower human beings.
Machines may enslave human beings and start ruling the world. Imagine artificial intelligence
taking over human intellect! The picture is definitely horrible.
Some thinkers consider it ethically wrong to create artificial intelligent machines. According to
them, intelligence is God’s gift to mankind. It is not correct to even try to recreate intelligence. It is
against ethics to create replicas of human beings. Don’t you also think so?
1.9 AI Techniques
1.9.1 Search
Search is a problem-solving technique that systematically explores a space of problem states, i.e.,
successive and alternative stages in the problem-solving process. Examples of problem states might
include the different board configurations in a game or intermediate steps in a reasoning process.
This space of alternative solutions is then searched to find an answer. Newell and Simon (1976)
have argued that this is the essential basis of human problem solving. Indeed, when a chess player
examines the effects of different moves or a doctor considers a number of alternative diagnoses,
they are searching among alternatives.
In 1976, Newell and Simon proposed that intelligent behavior arises from the manipulation of
symbols, entities that represent other entities, and that the process by which intelligence arises is
heuristic search. Search is a process of formulating and examining alternatives. It starts with an
initial state, a set of candidate actions, and criteria for identifying the goal state. It is often guided
by heuristics, or “rules of thumb”, which are generally useful, but not guaranteed to make the best
choices. Starting from the initial state, the search process selects actions to transform that state into
new states, which themselves are transformed into more new states, until a goal state is generated.
For example, consider a search program to solve the “8-puzzle”. A child solves the puzzle by
sliding the numbered tiles (without lifting them) to reach a configuration in which the tiles are all in
numerical order. When the 8 puzzle is seen as a search problem, the initial state is a starting board
position, each action is a possible move of one tile up, down, left, or right (when the position it will
move to is blank), and the goal state is the second state in the following figure. Here a heuristic
function might suggest candidate moves by comparing their results to the goal, in order to favor
those moves that appear to be making progress towards the solution.
(Figure: an initial state and the goal state of the 8-puzzle.)
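
To make the search concrete, here is a minimal greedy best-first search for the 8-puzzle in
Python (a sketch: boards are tuples of nine entries read row by row with 0 for the blank, the
goal layout is an assumption, and greedy best-first search is not guaranteed to find a shortest
solution):

    # Best-first search sketch for the 8-puzzle, guided by the
    # Manhattan-distance heuristic.
    import heapq

    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

    def h(board):
        # Total grid distance of every tile from its goal position.
        total = 0
        for i, tile in enumerate(board):
            if tile:
                g = GOAL.index(tile)
                total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
        return total

    def neighbors(board):
        i = board.index(0)                       # position of the blank
        r, c = divmod(i, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 3 and 0 <= nc < 3:
                j = 3 * nr + nc
                b = list(board)
                b[i], b[j] = b[j], b[i]          # slide a tile into the blank
                yield tuple(b)

    def best_first(start):
        frontier, seen = [(h(start), start, [start])], {start}
        while frontier:
            _, board, path = heapq.heappop(frontier)
            if board == GOAL:
                return path
            for nxt in neighbors(board):
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (h(nxt), nxt, path + [nxt]))

    solution = best_first((1, 2, 3, 4, 5, 6, 0, 7, 8))
    print(len(solution) - 1, "moves")            # 2 moves for this easy start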
1.9.2 Knowledge Capture and Representation
In order to guide search or even to describe problems, actions, and solutions, the relevant domain
knowledge must be encoded in a form that can be effectively manipulated by a program. More
generally, the usefulness of any reasoning process depends not only on the reasoning process itself,
but also on having the right knowledge and representing it in a form the program can use.
In the logical approach to knowledge representation and reasoning, information is encoded as
assertions in logic, and the system draws conclusions by deduction from those assertions. Other
research studies non-deductive forms of reasoning, such as reasoning by analogy and abductive
inference--the process of inferring the best explanation for a set of facts. Abductive inference does
not guarantee sound conclusions, but is enormously useful for tasks such as medical diagnosis, in
which a reasoner must hypothesize causes for a set of symptoms.
Capturing the knowledge needed by AI systems is a challenging task. The knowledge in rule-based
expert systems is represented in the form of rules listing conditions to check for, and conclusions to
be drawn if those conditions are satisfied. For example, a rule might state that IF certain conditions
hold (e.g., the patient has certain symptoms), THEN certain conclusions should be drawn (e.g., that
the patient has a particular condition or disease). A natural way to generate these rules is to
interview experts. Unfortunately, the experts may not be able to adequately explain their decisions
in a rule-based way, resulting in a “knowledge-acquisition bottleneck” impeding system
development.
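
As an illustration of the IF-THEN format just described, here is a minimal forward-chaining
sketch in Python (the rules are invented examples, not taken from any real expert system):

    # Rule-based sketch: each rule lists conditions to check for and a
    # conclusion to draw. Forward chaining fires rules until nothing
    # new can be concluded.
    RULES = [
        ({"fever", "cough"}, "flu-suspected"),
        ({"flu-suspected", "short-of-breath"}, "refer-to-doctor"),
    ]

    def forward_chain(facts):
        facts, changed = set(facts), True
        while changed:
            changed = False
            for conditions, conclusion in RULES:
                if conditions <= facts and conclusion not in facts:
                    facts.add(conclusion)    # IF conditions THEN conclusion
                    changed = True
        return facts

    print(forward_chain({"fever", "cough", "short-of-breath"}))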
1.9.3 Planning, Vision, and Robotics
The conclusions of the reasoning process can determine goals to be achieved. Planning addresses
the question of how to determine a sequence of actions to achieve those goals. The resulting action
sequences may be designed to be applied in many ways, such as by robots in the world, by
intelligent agents on the Internet, or even by humans. Planning systems may use a number of
techniques to make the planning process practical, such as hierarchical planning, reasoning first at
higher levels of abstraction and then elaborating details within the high-level framework (e.g., as a
person might do when first outlining general plans for a trip, and then considering fine-grained
details such as how to get to the airport), and partial-order planning, enabling actions to be inserted
in the plan in any order, rather than chronologically, and sub plans to be merged. Dean and
Kambhampati (1997) provide an extensive survey of this area.
In real-world situations, it is seldom possible to generate a complete plan in advance and then
execute it without changes. The state of the world may be imperfectly-known, the effects of actions
may be uncertain, the world may change while the plan is being generated or executed, and the plan
may require the coordination of multiple cooperating agents, or counter planning to neutralize the
interference of agents with opposing goals. Determining the state of the world and guiding action
requires the ability to gather information about the world, through sensors such as sonar or cameras,
and to interpret that information to draw conclusions. In addition, carrying out actions in a messy
and changing world may require rapid responses to important events (e.g., for a robot-guided
vehicle to correct a skid), or an ongoing process of rapidly selecting actions based on the current
context (for example, when a basketball player must avoid an opponent). Such problems have led to
research on reactive planning, as well as on how to integrate reactive methods with the deliberative
methods providing long-term guidance.
1.9.4 Natural Language Processing
Achieving natural interactions between humans and machines requires machines to understand and
generate language. Likewise, understanding human communication requires the understanding of
how language is processed by people. The nature of human language raises many challenging
issues for language processing systems: natural language is elliptical, leaving much unstated, and its
meaning is context-dependent (“Mary took aspirin” will have a different meaning when explaining
how she recovered from her headache). Some natural language processing approaches investigate
algorithms for syntactic parsing, to determine the grammatical structure of textual passages; others
take a cognitively-inspired view, studying the knowledge structures underlying human
understanding and modeling the process by which they are applied, or even attempting to directly
apply expectations from memory to the parsing process. Other systems apply statistical methods to
tasks such as information extraction from newspaper articles. Machine translation systems, though
still far from replacing human translators for literature, can now generate useful translations.
1.10 Practical Impact of AI
AI technology has had broad impact. AI components are embedded in numerous devices, such as
copy machines that combine case-based reasoning and fuzzy reasoning to automatically adjust the
copier to maintain copy quality. AI systems are also in everyday use for tasks such as identifying
credit card fraud, configuring products, aiding complex planning tasks, and advising physicians. AI
is also playing an increasing role in corporate knowledge management, facilitating the capture and
reuse of expert knowledge. Intelligent tutoring systems make it possible to provide students with
more personalized attention and even for the computer to listen to what children say and respond to
it. Cognitive models developed by AI can also suggest principles for effective support for human
learning, guiding the design of educational systems.
AI technology is being used in autonomous agents that independently monitor their surroundings,
make decisions and act to achieve their goals without human intervention. For example, in space
exploration, the lag times for communications between earth and probes make it essential for
robotic space probes to be able to perform their own decision-making--depending on the relative
locations of the earth and Mars, one-way communication can take over 20 minutes. In a 1999
experiment, an AI system was given primary control of a spacecraft, NASA's Deep Space 1,
60,000,000 miles from earth, as a step towards autonomous robotic exploration of space. Methods
from autonomous systems also promise to provide important technologies to aid humans. For
example, in a 1996 experiment called “No Hands Across America”, the RALPH system, a vision-based
adaptive system to learn road features, was used to drive a vehicle for 98 percent of a trip
from Washington, D.C., to San Diego, maintaining an average speed of 63 mph in daytime, dusk
and night driving conditions. Such systems could be used not only for autonomous vehicles, but
also for safety systems to warn drivers if their vehicles deviate from a safe path.
In electronic commerce, AI is providing methods for determining which products buyers want and
configuring them to suit buyers' needs. The explosive growth of the internet has also led to growing
interest in internet agents to monitor users' tasks, seek needed information, and learn which
information is most useful. For example, the Watson system monitors users as they perform tasks
using standard software tools such as word processors, and uses the task context to focus search for
useful information to provide to them as they work.
Continuing investigation of fundamental aspects of intelligence promises broad impact as well. For
example, researchers are studying the nature of creativity and how to achieve creative computer
systems, providing strong arguments that creativity can be realized by artificial systems. Numerous
programs have been developed for tasks that would be considered creative in humans, such as
discovering interesting mathematical concepts, in the program AM, making paintings, in Aaron,
and performing creative explanation, in SWALE. The task of AM, for example, was not to prove
mathematical theorems, but to discover interesting concepts. The program was provided only with
basic background knowledge from set theory (e.g., the definition of sets), and with heuristics
for revising existing concepts and selecting promising concepts to explore. Starting from this
knowledge, it discovered fundamental concepts such as addition, multiplication, and prime
numbers. It even rediscovered a famous mathematical conjecture that was not known to its
programmer: Goldbach's conjecture, the conjecture that every even integer greater than 2 can be
written as the sum of two primes. Buchanan surveys some significant projects in machine creativity
and argues for its potential impact on the future of artificial intelligence.
In addition, throughout the history of AI, AI research has provided a wellspring of contributions to
computer science in general. For example, the computer language Lisp, developed by John
McCarthy in 1958, provided a tool for developing early AI systems using symbolic computation,
but has remained in use to the present day, both within and outside AI, and has had significant
influence on the area of programming languages. Later AI research also gave rise to the computer
language, Prolog, used for logic programming. A key idea of logic programming is that the
programmer should specify only the problem to be solved and constraints on its solution, leaving
the system itself to determine the details of how the solution should be obtained.
1.11 Solving Problems by Searching
Problem-Solving Agents
A problem-solving agent decides what to do by finding sequences of actions that lead to desirable
states.
How can an agent formulate an appropriate view of the problem it faces?
Intelligent agents are supposed to act in such a way that the environment goes through a sequence
of states that maximizes the performance measure.
A search algorithm takes a problem as input and returns a solution in the form of an action
sequence. Once a solution is found, the actions it recommends can be carried out. This is called the
execution phase.
Formulating Problems
Formulating problems is an art. First, we look at the different amounts of knowledge that an agent
can have concerning its actions and the state that it is in. This depends on how the agent is
connected to its environment.
There are four essentially different types of problems.
single state
multiple state
contingency
exploration
Knowledge and Problem Types
Let us consider the vacuum world - we need to clean the world using a vacuum cleaner. For the
moment we will simplify it even further and suppose that the world has just two locations. In this
case there are eight possible world states. There are three possible actions: left, right, and suck. The
goal is to clean up all the dirt, i.e., the goal is equivalent to the set of states {7, 8}.
First, suppose that the agent's sensors give it enough information to tell exactly which state it is in
(i.e., the world is completely accessible) and suppose it knows exactly what each of its actions
does.
Then it can calculate exactly what state it will be in after any sequence of actions. For example, if
the initial state is 5, then it can calculate the result of the actions sequence {right, suck}.
This simplest case is called a single-state problem.
Now suppose the agent knows all of the effects of its actions, but world access is limited. For
example, in the extreme case, the agent has no sensors so it knows only that its initial state is one of
the set {1, 2, 3, 4, 5, 6, 7, 8}. In this simple world, the agent can succeed even though it has no
sensors. It knows that the action {right} will cause it to be in one of the states {2, 4, 6, 8}. In fact, it
can discover that the sequence {right, suck, left, suck} is guaranteed to reach a goal state no matter
what the initial state is.
In this case, when the world is not fully accessible, the agent must reason about sets of states that it
might get to, rather than single states. This is called the multiple-state problem.
The case of ignorance about the effects of actions can be treated similarly.
Example Suppose the suck action sometimes deposits dirt when there is none. Then if the agent is
in state 4, sucking will place it in one of {2, 4}. There is still a sequence of actions that is
guaranteed to reach the goal state.
However, sometimes ignorance prevents the agent from finding a guaranteed solution sequence.
Suppose that the agent has the nondeterministic suck action as above and that it has a position
sensor and a local dirt sensor. Suppose the agent is in one of the states {1, 3}. The agent might
formulate the action sequence {suck, right, suck}. The first action would change the state to one of
{5, 7}; moving right would then change the state to {6, 8}. If the agent is, in fact, in state 6, the
plan will succeed, otherwise it will fail. It turns out there is no fixed action sequence that guarantees
a solution to this problem.
The agent can solve the problem if it can perform sensing actions during execution. For example,
starting from one of {1, 3}: first suck dirt, then move right, then suck only if there is dirt there. In
this case the agent must calculate a whole tree of actions rather than a single sequence, i.e., plans
now have conditionals in them that are based on the results of sensing actions.
For this reason, we call this a “contingency problem”. Many problems in the real world are
contingency problems. This is why most people keep their eyes open when walking and driving
around.
Single-state and multiple-state problems can be handled by similar search techniques. Contingency
problems require more complex algorithms. They also lend themselves to an agent design in which
planning and execution are interleaved.
Well-defined problems and solutions
We have seen that the basic elements of a problem definition are the states and actions. To capture
these ideas more precisely, we need the following:
The initial state that the agent knows itself to be in.
The set of possible actions available to the agent. The term operator is used to denote the
description of an action in terms of which state will be reached by carrying out the action in
a particular state.
Together these define the state space of the problem: the set of all states reachable from the initial
state by any sequence of actions.
A path in the state space is simply any sequence of actions leading from one state to another.
The goal test, which the agent can apply to a single state to determine if it is a goal state.
Sometimes there is an explicit set of goal states and the test is to see if a state is in this set.
Sometimes a goal is specified by an abstract property, e.g., "checkmate".
It may also be the case that one solution is preferable to another, even though they both reach the
goal. To capture this idea, we use the notion of path cost.
A path cost function is a function that assigns a cost to a path. We will take the cost of a
path to be the sum of the costs of its individual steps. The path cost function is denoted g.
Together the initial state, operator set, goal test, and path cost function define a problem.
To deal with multiple-state problems, a problem consists of an initial state set; a set of operators
specifying for each action the set of states reached from any given state; and a goal test and path
cost function. The state space is replaced by the state set space.
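These four elements can be bundled into a small data structure. The sketch below is one minimal way to do it in Python; the class and method names are our own illustrative choices, not a standard API.

class Problem:
    # A minimal sketch of a problem definition. States are assumed hashable;
    # operators(state) returns (action, next_state, step_cost) triples.
    def __init__(self, initial_state, goal_states):
        self.initial_state = initial_state
        self.goal_states = set(goal_states)

    def operators(self, state):
        # The actions applicable in `state`, each paired with the state it
        # reaches and its step cost; filled in per problem.
        raise NotImplementedError

    def goal_test(self, state):
        # Explicit goal set; an abstract test such as "checkmate" would
        # override this method instead.
        return state in self.goal_states

    def path_cost(self, path):
        # g(path): the sum of the costs of the individual steps.
        return sum(cost for (_action, _state, cost) in path)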
Measuring Problem-Solving Performance
The effectiveness of a search can be measured in at least three ways:
Does it find a solution?
Is it a good solution (low cost)?
What is the time and memory required to find a solution (search cost)?
The total cost of a search is the sum of the path cost and search cost.
What makes problem solving an art is deciding what goes into the description of states and
operators and what should be left out. The process of removing detail is called abstraction. Without
it, intelligent agents would be swamped.
Example Problems
We will take both; the toy problems and real-world problems. Toy problems are used to illustrate
and exercise various techniques. Real-world problems are usually much harder and we usually care
about the solutions.
Toy problems: The 8-puzzle problem
One important trick is to notice that rather than use operators such as "move the 3 tile into the blank
space," it is more sensible to have operators such as "the blank space changes places with the tile to
its left." This is because there are fewer of the latter kind of operator. This leads to the following
formulation:
States: a state description specifies the location of each of the eight tiles in one of the nine
squares. For efficiency it is also useful to include a location for the blank.
Operators: blank moves left, right, up or down.
Goal test: as in figure
Path cost: length of path
Searching for Solutions
The idea behind most search techniques is to maintain and extend a set of partial solution paths.
To solve the route planning example problem, the first step is to test if the current state is the goal
state. If not, we must consider some other states. This is done by applying the operators to the
current state, thereby generating a new set of states. This process is called expanding the state.
Whenever there are multiple possibilities, a choice must be made about which one to consider
further.
This is the essence of search - choosing one option and putting the others aside for later, in case the
first choice does not lead to a solution. We continue choosing, testing, and expanding until a
solution is found or until there are no more states that can be expanded. The choice of which state
to expand first is determined by the search strategy.
It is helpful to think of the process as building a search tree that is superimposed over the state
space. The root of the tree is a search node corresponding to the initial state. The leaves of the tree
are nodes that do not have successors either because they are goal states or because no operators
can be applied to them.
Search Strategies
We will evaluate search strategies in terms of four criteria:
Completeness: is the strategy guaranteed to find a solution when there is one?
Time complexity: how long does it take to find a solution?
Space Complexity: how much memory is required to perform the search?
Optimality: does the search strategy find the highest quality solution when there are
multiple solutions?
Uninformed (or blind) search strategies have no information about the number of steps or the path
cost from the current state to the goal.
In a route finding problem, given several choices of cities to go to next, uninformed search strategies
have no way to prefer any particular choices.
Informed (or heuristic) search strategies use considerations to prefer choices. For example, in the
route finding problem with a map, if a choice is in the direction of the goal city, prefer it.
Even though uninformed search is less effective than heuristic search, uninformed search is still
important because there are many problems for which information used to make informed choices
is not available.
We now present several uninformed search strategies:
Breadth-first search
In this strategy, the root node is expanded first, and then all of the nodes generated by the root are
expanded before any of their successors. Then these successors are all expanded before any of their
successors.
Breadth-first search is complete and optimal provided the path cost is a non-decreasing function of
the depth of the node.
Uniform cost search (Dijkstra's Algorithm)
This strategy modifies breadth-first search to account for general path cost. Now the lowest cost
node on the fringe is always the first one expanded (recall that the cost of a node is the cost of the
path to that node).
When certain conditions are met, the first solution found is guaranteed to be the cheapest.
We consider an example
Uniform cost search finds the cheapest solution provided the cost of a path never decreases as it
gets longer, i.e., g(SUCCESSOR(n)) >= g(n) for every node n.
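A minimal Python sketch of this strategy, assuming a successors(state) function that yields (next_state, step_cost) pairs; the function and variable names here are illustrative, not a fixed API.

import heapq
from itertools import count

def uniform_cost_search(start, goal_test, successors):
    # The fringe is a priority queue ordered by g(n), the path cost so far.
    # The tie-breaking counter keeps the heap from ever comparing states.
    tie = count()
    fringe = [(0, next(tie), start, [start])]
    explored = set()
    while fringe:
        g, _, state, path = heapq.heappop(fringe)   # lowest-cost node first
        if goal_test(state):
            return path, g                          # cheapest path and its cost
        if state in explored:
            continue
        explored.add(state)
        for next_state, step_cost in successors(state):
            if next_state not in explored:
                heapq.heappush(fringe, (g + step_cost, next(tie),
                                        next_state, path + [next_state]))
    return None                                     # no solution exists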
Depth-First Search
This search strategy always expands one node to the deepest level of the tree. Only when a dead-end
is encountered does the search back up and expand nodes at shallower levels.
For problems that have a lot of solutions, depth-first search is often faster than breadth-first search
because it has a good chance of finding a solution after exploring only a small portion of the search
space.
It is common to implement depth-first search with a recursive function that calls itself on each of its
children in turn. In this case, the queue is stored implicitly in the local state of each invocation on
the calling stack.
Iterative Deepening Search
Depth-limited search is depth-first search with a cutoff on the maximum depth of a path. The hard
part about depth-limited search is picking the right depth limit; most of the time we will not know
what to pick. Iterative deepening search is a strategy that side-steps the issue of choosing the
depth limit by trying all possible limits in increasing order.
In effect, this search strategy keeps the best features of both breadth-first and depth-first search. It
has the modest memory requirements of depth-first search but is optimal and complete like
breadth-first search. Also it does not get stuck in dead ends.
Iterative deepening may seem wasteful because many states are expanded multiple times. For most
problems, however, the overhead of this multiple expansion is actually rather small because a
majority of the nodes are at the bottom of the tree.
In terms of complexity numbers, iterative deepening has the same asymptotic time complexity as
breadth-first search, but has space complexity of only O(bd).
In general, iterative deepening is the preferred search method when there is a large search space and
the depth of the solution is unknown.
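A compact Python sketch of the idea, reusing the assumed successors(state) convention from the uniform-cost sketch above: a depth-limited search is simply restarted with limits 0, 1, 2, and so on.

def depth_limited_search(state, goal_test, successors, limit, path=None):
    # Recursive depth-first search that refuses to go deeper than `limit`.
    path = path or [state]
    if goal_test(state):
        return path
    if limit == 0:
        return None                        # cutoff reached
    for next_state, _cost in successors(state):
        if next_state not in path:         # avoid cycles along the current path
            found = depth_limited_search(next_state, goal_test, successors,
                                         limit - 1, path + [next_state])
            if found:
                return found
    return None

def iterative_deepening_search(start, goal_test, successors, max_depth=50):
    # Try every depth limit in increasing order; shallow nodes are re-expanded,
    # but the bulk of the work is always at the deepest level.
    for limit in range(max_depth + 1):
        result = depth_limited_search(start, goal_test, successors, limit)
        if result:
            return result
    return None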
Avoiding Repeated States
We have so far ignored one of the most important complications to the search process: the
possibility of wasting time by expanding states that have already been encountered. For many
problems, this is unavoidable. Any problem where operators are reversible has this problem. For
example in route planning problems, the search spaces are infinite, but if we prune some of the
repeated states, we can cut the tree down to a finite size.
Even when the tree is finite, avoiding repeated states can yield an exponential speedup. The classic
example is a state space that contains only m+1 states, whose search tree nevertheless contains
every possible path, 2^m branches in all.
There are three ways to deal with repeated states, in increasing order of effectiveness and
complexity:
Do not return to the state you just came from.
Do not create paths with cycles in them.
Do not generate any state that was ever generated before.
To implement this last option, search algorithms often make use of a hash table that stores all the
nodes that are generated. This makes checking for repeated nodes fairly efficient. Whether or not to
do this depends on how "loopy" the state space is.
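As an illustration of the third option, the fragment below keeps a Python set (a hash table) of every state generated so far; each newly generated child is passed through it before being placed on the open list. The names are our own.

def already_seen(state, generated):
    # `generated` is a set acting as the hash table of all states ever
    # created; membership testing is (amortized) constant time.
    if state in generated:
        return True
    generated.add(state)
    return False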
1.12 Basic Techniques for Artificial Intelligence
Artificial Intelligence (AI) could be defined as the ability of computer software and hardware to do
those things that we, as humans, recognize as intelligent behavior.
Artificial Intelligence is an umbrella term that covers:
A range of solutions for games and classic problems in which research has been successful
enough that well-established techniques are widely available. With a certain amount of
customization, and always a little craft, these let you build a convincing AI that is a
challenge for a human player.
A range of research directions for games and problems in which research is still active.
Taken individually, each direction is a step forward. However, these directions can be
mutually exclusive in their design choices, so it is impossible to combine them into a
complete, optimized solution for the game or problem concerned. In this unfavorable
situation, even the best possible technological compromise cannot put up much resistance
against a strong, experienced human player.
The word "technology" stems from the Greek word techne, which means "art" and "skill."
A sophisticated technology is then a cumulative building of learned and well-refined skills and
processes. In the AI area, these processes have manifested themselves in a number of well-recognized
and maturing areas including Neural Networks, Expert Systems, Automatic Speech
Recognition, Genetic Algorithms, Intelligent Agents, Natural Language Processing, Robotics,
Logic Programming, and Fuzzy Logic.
Each of these areas will be examined in some depth here, but it is first important to understand that
the importance of these individual areas has changed over the last two decades. These changes have
been based upon the progress in each area, and the needs that each area meets. For example in the
early 1980’s robotics was a large thrust in artificial intelligence. At that time benefits could be seen
in manufacturing applications. In the late 1990’s the blossoming of the Internet pushed the
importance of intelligent agents forward for performing routine tasks and complex searches. At the
same time, throughout the 1980s and 1990s, orders of magnitude advances in computer processing
power have allowed hurdles in speech recognition and image processing to be overcome.
The maturity of each of these technology areas also differs. Expert Systems and Automatic Speech
Recognition are among the most mature while Natural Language Processing and Intelligent Agents
remain in early stages of development. In the next few paragraphs the basis for each of these
technologies will be reviewed. In addition examples where the technologies have been effectively
utilized will be presented.
Expert Systems
These systems are usually built using large sets of “rules.” An expert, who has developed them
mentally after perhaps a decade or more of practice in a specialty area, establishes these rules. A
specialist, known as a knowledge engineer, extracts the rules from the expert and programs them
into a computer. An example of a small rule set is as follows:
IF the relief pressure valve is less than .25 open and the pressure setting is greater than 160 kg,
THEN pressure-category is high.
IF the temperature is less than 250 degrees centigrade,
THEN temperature-category is normal,
ELSE temperature-category is hot.
IF pressure-category is high and temperature-category is hot,
THEN send operator alert.
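As a rough illustration only (not how any particular expert-system shell encodes rules), the three rules above might be rendered and forward-chained in Python as follows:

def classify(valve_open, pressure_kg, temperature_c):
    facts = {}
    # Rule 1: pressure category
    if valve_open < 0.25 and pressure_kg > 160:
        facts["pressure-category"] = "high"
    # Rule 2: temperature category
    if temperature_c < 250:
        facts["temperature-category"] = "normal"
    else:
        facts["temperature-category"] = "hot"
    # Rule 3: fires only once its premises have been derived
    if (facts.get("pressure-category") == "high"
            and facts.get("temperature-category") == "hot"):
        facts["action"] = "send operator alert"
    return facts

print(classify(valve_open=0.2, pressure_kg=180, temperature_c=300))
# {'pressure-category': 'high', 'temperature-category': 'hot',
#  'action': 'send operator alert'}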
Expert Systems are established for processes where there is a need;
1) for a narrow area of expertise to be more widely known, or
2) to allow sophisticated processes to be run without human intervention.
A classic example of the former is a need for understanding and interpreting the rules of code and
regulations set forth by the U.S. Internal Revenue Service. To provide benefit to the average
citizen preparing their taxes, these rules are programmed into popular software packages such as
Tax Cut and TurboTax. A classic case of the second need is a back end system programmed by the
Credit Card Division of the American Express Company. This system uses sophisticated rules to
determine whether a credit transaction should be approved, denied, or be interrupted by human
intervention.
Automatic Speech Recognition (ASR)
This technology takes the sound waves produced by our speech and converts them into text content.
The process, made possible by lots of computer memory and fast processors, works like this:
First your continuous voice sound waves, captured by a microphone, are fed into a digital
converter.
This converter takes many samples (like capturing a snapshot) at a very high rate, e.g.
20,000 times per second.
These samples are compared against a large stored template of sounds which match specific
text. The computer then outputs the text which most closely matches the template.
For this process to work, the system must first be trained to recognize your specific voice. It does
this by asking you to speak a series of simple phonemes, or parts of speech. After about 30 minutes
of training, the system can then begin to derive complex speech patterns from the simple ones you
have provided. The methods used to derive and select from the large numbers of patterns are quite
sophisticated. They borrow from techniques such as Markov Processes, which use probabilities to
determine what the most likely next syllable or word may be, or from Neural Networks. The ASR
technology would likely be of use to anyone who needs to utilize a computer. However it is most
beneficial to those who find it difficult to use a keyboard, such as people suffering from Carpal
Tunnel Syndrome or those who need hands free access on a manufacturing line.
Intelligent Agents
Intelligent agents (IA), now often known as "bots", are software technology that performs difficult
or repetitive tasks for a user. Using direct commands or on a scheduled timetable, the Intelligent
Agents execute a provided list of instructions known as a script. The intelligent agent technology
typically “borrows” from capability inherent in other AI techniques, especially in the area of
search. The IA capability can then add the “ever diligent” capability provided to us by computer
processors that can stay awake and work 24 hours a day 7 days a week. A simple script follows that
checks for stories about the stock market and then notifies the user via e-mail when one of interest
appears.
When HOUR = (11:00 or 13:00 or 15:00 or 16:30)
Start ieexplorer.exe
Load URL = http://www.msn.com
Search site for text = "DJIA"
If present THEN
Start outlook.exe
Address = saunders@ndu.edu
Subject = "Story on Dow Jones Industrial Average"
Body = "At HOUR There was a story on the DJIA posted on the MSN Web site. Click here to
retrieve it"
Else END
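A present-day Python rendering of the same script might look like the hedged sketch below. The URL, address, subject, and hours come from the script above; the mail-server host and From address are assumptions, and real scheduling would be left to cron or a similar facility.

import smtplib
import urllib.request
from email.message import EmailMessage

def check_for_djia_story(hour):
    # Fetch the page and search it for the text of interest.
    page = urllib.request.urlopen("http://www.msn.com").read().decode("utf-8", "ignore")
    if "DJIA" in page:
        msg = EmailMessage()
        msg["From"] = "agent@localhost"        # assumed sender address
        msg["To"] = "saunders@ndu.edu"
        msg["Subject"] = "Story on Dow Jones Industrial Average"
        msg.set_content(f"At {hour} there was a story on the DJIA posted "
                        "on the MSN Web site.")
        with smtplib.SMTP("localhost") as server:   # assumed local mail server
            server.send_message(msg)

# The hours listed in the script; a scheduler would invoke these in turn.
for hour in ("11:00", "13:00", "15:00", "16:30"):
    check_for_djia_story(hour)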
Scripts that are well crafted can perform actions such as periodically:
Checking for a stock price to hit a certain level and then executing a buy or sell at that price.
Checking a web site to see if any new documents have been deposited.
Intelligent agents take many forms.
1. Chatter Bots
2. Commerce Bots
3. Data Mining Bots
4. E-Mail Bots
5. Fun Bots
6. Game Bots
7. Government Bots
8. Knowledge Bots
9. Miscellaneous Bots
10. News Bots
11. Newsgroup Bots
12. Search Bots
13. Shopping Bots
14. Software Bots
15. Stock Bots
1.13 Robot
The Oxford English Dictionary defines a robot as:
One of the mechanical men and women in Capek's play; hence, a machine (sometimes
resembling a human being in appearance) designed to function in place of a living agent,
esp. one which carries out a variety of tasks automatically or with a minimum of external
impulse.
A person whose work or activities are entirely mechanical; an automaton.
The International Organization for Standardization also has a definition. Under ISO 8373, a robot
is: "An automatically controlled, reprogrammable, multi-purpose manipulator programmable in
three or more axes, which may be either fixed in place or mobile for use in industrial automation
applications."
The Robot Institute of America (1979) has defined a robot as:
"A reprogrammable, multifunctional manipulator designed to move material, parts, tools, or
specialized devices through various programmed motions for the performance of a variety of
tasks".
According to another definition, a robot is:
"An automatic device that performs functions normally ascribed to humans or a machine in the
form of a human."
Three Laws of Robotics
Asimov has proposed his three "Laws of Robotics", and he later added a “zeroth law”.
1) Law Zero: A robot may not injure humanity, or, through inaction, allow humanity to come to
harm.
2) Law One: A robot may not injure a human being, or, through inaction, allow a human being
to come to harm, unless this would violate a higher order law.
3) Law Two: A robot must obey orders given it by human beings, except where such orders
would conflict with a higher order law.
4) Law Three: A robot must protect its own existence as long as such protection does not
conflict with a higher order law.
Benefits
Robots offer specific benefits to workers, industries and countries. If introduced correctly,
industrial robots can improve the quality of life by freeing workers from dirty, boring, dangerous
and heavy labor. It is true that robots can cause unemployment by replacing human workers but
robots also create jobs: robot technicians, salesmen, engineers, programmers and supervisors.
The benefits of robots to industry include improved management control and productivity and
consistently high quality products. Industrial robots can work tirelessly night and day on an
assembly line without a loss in performance.
Consequently, they can greatly reduce the costs of manufactured goods. As a result of these
industrial benefits, countries that effectively use robots in their industries will have an economic
advantage in the world market.
Check Your Progress:
1. Define the term “Artificial Intelligence”. Give its various applications.
2. Consider following statement:
“Surely the computers cannot be intelligent. They can do only what their programmers
tell them.”
Is the latter statement true? Justify your answer.
3. Do computer programs have IQ?
4. Write a short note on Turing Test.
5. Compare human and computer intelligence.
6. When did AI research start? How is AI research done?
7. Does AI aim at human-level intelligence?
8. How far is AI from reaching human-level intelligence? When will it happen?
9. Are computers fast enough to be intelligent?
10. Write short notes on:
a. Robotics
b. Expert Systems
c. AI Techniques
Chapter 2 Problem Solving
2.1 Problem Solving:
Problem solving is a process of generating solutions from observed data.
A “problem” is characterized by
a set of goals,
a set of objects, and
a set of operations
Problem solving has been one of the key areas of concern for Artificial Intelligence.
Problem solving is a process of generating solutions from observed or given data. It is however not
always possible to use direct methods (i.e. go directly from data to solution). Instead, problem
solving often needs to use indirect or model-based methods.
The General Problem Solver (GPS) was a computer program created in 1957 by Simon and Newell
to build a universal problem solver machine. GPS was based on Simon and Newell's theoretical
work on logic machines. GPS in principle can solve any formalized symbolic problem, such as
proving theorems, solving geometric problems, and playing chess.
GPS solved many simple problems such as the Towers of Hanoi, that could be sufficiently
formalized, but GPS could not solve any real-world problems.
To build a system to solve a particular problem, we need to:
1. Define the problem precisely – find the input situations as well as the final situations for an
acceptable solution to the problem.
2. Analyze the problem – find a few important features that may have an impact on the
appropriateness of various possible techniques for solving the problem.
3. Isolate and represent task knowledge necessary to solve the problem
4. Choose the best problem-solving technique(s) and apply to the particular problem.
Problem Definitions:
A problem is defined by its ‘elements’ and their ‘relations’.
To provide a formal description of a problem, we need to do following:
a. Define a state space that contains all the possible configurations of the relevant objects,
including some impossible ones.
b. Specify one or more states that describe possible situations, from which the problem-solving
process may start. These states are called initial states.
c. Specify one or more states that would be an acceptable solution to the problem. These
states are called goal states.
d. Specify a set of rules that describe the actions (operators) available.
The problem can then be solved by using the rules, in combination with an appropriate control
strategy, to move through the problem space until a path from an initial state to a goal state is
found.
This process is known as "search". Thus:
Search is fundamental to the problem-solving process.
Search is a general mechanism that can be used when a more direct method is not known.
Search provides the framework into which more direct methods for solving subparts of a
problem can be embedded.
A very large number of AI problems are formulated as search problems.
The term problem solving relates to analysis in AI. Problem solving may be characterized as a
systematic search through a range of possible actions to reach some predefined goal or
solution. Problem-solving methods are categorized as special purpose and general purpose.
A special-purpose method is tailor-made for a particular problem and often exploits very
specific features of the situation in which the problem is embedded.
A general-purpose method is applicable to a wide variety of problems. One general-purpose
technique used in AI is "means-end analysis": a step-by-step, or incremental,
reduction of the difference between the current state and the final goal.
Examples:
For a robot this might consist of PICKUP, PUTDOWN, MOVEFORWARD, MOVEBACK,
MOVELEFT and MOVERIGHT, applied until the goal is reached.
Puzzles and games have explicit rules: e.g., the "Tower of Hanoi" puzzle.
This puzzle may involve a set of rings of different sizes that can be placed on three different
pegs.
The puzzle starts with the rings arranged as shown in Fig. (a)
The goal of this puzzle is to move them all as to Fig. (b)
Condition: Only the top ring on a peg can be moved, and it may only be placed on a smaller
ring, or on an empty peg.
In this "Tower of Hanoi" puzzle:
Situations encountered while solving the problem are described as "states".
Set of all possible configurations of rings on the pegs is called "problem space"
States
A State is a representation of elements in a given moment.
A problem is defined by its ‘elements’ and their ‘relations’.
At each instant of a problem, the elements have specific descriptors and relations; the "descriptors"
tell how to select the elements.
Among all possible states, there are two special states called:
Initial state is the start point
Final state is the goal state
State Space
A state space is the set of all states reachable from the initial state.
A state space forms a graph (or map) in which the nodes are states and the arcs between
nodes are actions.
In state space, a path is a sequence of states connected by a sequence of actions.
The solution of a problem is part of the map formed by the state space.
Structure of a State Space
The structures of state space are trees and graphs.
A tree is a hierarchical structure in a graphical form; and a graph is a non-hierarchical structure.
A tree has only one path to a given node; i.e., a tree has one and only one path from any point to
any other point.
A graph consists of a set of nodes (vertices) and a set of edges (arcs). Arcs establish relationships
(connections) between the nodes; i.e., a graph has several paths to a given node.
The Operators are directed arcs between nodes.
A search process explores the state space. In the worst case, the search explores all possible paths
between the initial state and the goal state.
Problem Description
A problem consists of the description of:
the current state of the world
the actions that can transform one state of the world into another,
the desired state of the world.
State space is defined explicitly or implicitly
A state space should describe everything that is needed to solve a problem and nothing that is not
needed to solve the problem.
Initial state is the start state.
Goal state is the condition it has to fulfill:
A description of a desired state of the world;
The description may be complete or partial.
Operators are used to change state:
Operators do actions that can transform one state into another.
Operators consist of Preconditions and Instructions;
Preconditions provide a partial description of the state of the world that must be true in
order to perform the action, and
Instructions tell how to create the next state.
Operators should be as general as possible, so as to reduce their number.
Elements of the domain have relevance to the problem:
Knowledge of the starting point.
Problem solving is finding a solution:
Find an ordered sequence of operators that transforms the current (start) state into a goal
state.
Restrictions on solution quality: any, optimal, or all
o Finding the shortest sequence, or
o Finding the least expensive sequence (defining cost), or
o Finding any sequence as quickly as possible.
Problem Solution
In the state space, a solution is a path from the initial state to a goal state or, sometimes, just a goal
state.
A solution cost function assigns a numeric cost to each path. It also gives the cost of applying the operators to the
states.
A solution quality is measured by the path cost function; and an optimal solution has the lowest path cost among
all solutions.
The solutions can be any or optimal or all.
The importance of cost depends on the problem and the type of solution asked.
Example of Problem Definitions
8 – Puzzle Problem:
Initial State Goal State
State space : configurations of the 8 tiles on the board
Initial state : any configuration
Goal state : tiles in a specific order
Action : blank moves
Condition : the move is within the board
Transformation : blank moves Left, Right, Up, Down
Solution : optimal sequence of operators
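This definition translates almost directly into code. The sketch below assumes states are 9-tuples read row by row with 0 for the blank, and assumes one conventional goal ordering, since the goal figure is not reproduced here.

from typing import List, Tuple

State = Tuple[int, ...]
MOVES = {"Left": -1, "Right": +1, "Up": -3, "Down": +3}

def successors(state: State) -> List[Tuple[str, State]]:
    # Generate (action, next_state) pairs for all legal blank moves.
    b = state.index(0)                       # position of the blank
    row, col = divmod(b, 3)
    result = []
    for action, delta in MOVES.items():
        # Condition: the move must stay within the 3x3 board.
        if action == "Left" and col == 0:    continue
        if action == "Right" and col == 2:   continue
        if action == "Up" and row == 0:      continue
        if action == "Down" and row == 2:    continue
        t = b + delta
        nxt = list(state)
        nxt[b], nxt[t] = nxt[t], nxt[b]      # blank swaps with the neighbor tile
        result.append((action, tuple(nxt)))
    return result

GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)    # an assumed goal ordering

def goal_test(state: State) -> bool:
    return state == GOAL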
2.2 State Space Search
The concept of State Space Search is widely used in Artificial Intelligence. The idea is that a
problem can be solved by examining the steps which might be taken towards its solution. Each
action takes the solver to a new state.
The classic example is of the Farmer who needs to transport a Chicken, a Fox and some Grain
across a river one at a time. The Fox will eat the Chicken if left unsupervised. Likewise the
Chicken will eat the Grain.
In this case, the State is described by the positions of the Farmer, Chicken, Fox and Grain. The
solver can move between States by making a legal move (which does not result in something being
eaten). Non-legal moves are not worth examining.
The solution to such a problem is a list of linked States leading from the Initial State to the Goal
State. This may be found either by starting at the Initial State and working towards the Goal state or
vice-versa.
The required State can be worked towards by either:
1. Depth-First Search: Exploring each strand of a State Space in turn.
2. Breadth-First Search: Exploring every link encountered, examining the state space a level at a
time.
These techniques generally use lists of:
1. Closed States: States whose links have all been explored.
2. Open States: States which have been encountered, but have not been fully explored.
Ideally, these lists will also be used to prevent endless loops.
State space search is a process used in the field of artificial intelligence (AI) in which successive
configurations or states of an instance are considered, with the goal of finding a goal state with a
desired property.
In AI, problems are often modelled as a state space, a set of states that a problem can be in. The set
of states form a graph where two states are connected if there is an operation that can be performed
to transform the first state into the second.
State space search as used in AI differs from traditional computer science search methods because
the state space is implicit: the typical state space graph is much too large to generate and store in
memory. Instead, nodes are generated as they are explored, and typically discarded thereafter. A
solution to a combinatorial search instance may consist of the goal state itself, or of a path from
some initial state to the goal state.
A State-Space Search Algorithm (Depth First Search)
Here's a very sketchy, high-level depth-first state-space search algorithm:
State-Space (Initial-States, Goal-State, Operators)
1. Look at the first (leftmost) initial-state
2. if that state is the goal-state, then return success
3. if that state isn't the goal-state, then generate all possible new states from that state by
applying the set of operators to that state
4. if there aren't any new states generated by applying those operators, then return failure
5. Call state-space with this new list of states passed as the initial-states argument, and if that
succeeds then return success else...
6. call state-space with the old list of initial states that remained after you stripped off the first
initial-state in step 1, and if that succeeds then return success else...
7. return failure
In step 3, you would like to check all the new states to see if you have explored them before. You
do that by keeping track of the sequence of states that was generated in going from the very first
state to where you are now, and then comparing that list to the set of new states you just generated.
If there are any duplicates, be sure to eliminate them from the set of new states.
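A runnable Python rendering of this sketch, with the duplicate check of step 3 folded in; `operators` is assumed to be a list of functions that map a state to a new state, or to None when they do not apply.

def state_space(initial_states, goal_state, operators, visited=None):
    if visited is None:
        visited = set()
    if not initial_states:
        return False                       # steps 4 and 7: nothing left to try
    first, rest = initial_states[0], initial_states[1:]
    if first == goal_state:
        return True                        # step 2: success
    visited.add(first)                     # remember explored states
    # step 3: generate all new states by applying every operator,
    # eliminating states that have been explored before
    new_states = [op(first) for op in operators]
    new_states = [s for s in new_states if s is not None and s not in visited]
    # step 5: search deeper first; step 6: then try the remaining siblings
    return (state_space(new_states, goal_state, operators, visited)
            or state_space(rest, goal_state, operators, visited))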
Graph Theory Introduction
A graph consists of a set of nodes and a set of arcs or links connecting pairs of nodes. In the
domain of state space search, the nodes are interpreted to be states in a problem-solving process,
and the arcs are taken to be transitions between states. For example, to represent a game of chess,
each node would represent a state of the chessboard, and the arcs would represent legal moves from
one state to another.
Bridges of Konigsberg
Leonhard Euler invented graph theory to solve the following problem:
The city of Konigsberg occupied both banks and two islands of a river. The islands and the
riverbanks were connected by seven bridges, as shown in the following graph. Is there a path
through the city that crosses each bridge exactly once?
Euler's Proof
Define the degree of a node as the number of arcs that meet at it. A node can have either even or
odd degree. With the exception of the beginning and ending nodes, the path would have to leave
each node exactly as often as it entered it. Therefore there should be exactly 2 nodes with odd
degree (the beginning and ending nodes) or 0 nodes with odd degree (if the walk started and ended
at the same node). Since the Konigsberg graph has 4 nodes of odd degree, it is impossible to find an
Euler path for it. An alternative way to describe the Konigsberg problem (Predicate Calculus):
Connect (i1, i2, b1). Connect (i2, i1, b1).
Connect (rb1, i1, b2). Connect (i1, rb1, b2).
Connect (rb1, i1, b3). Connect (i1, rb1, b3).
Connect (rb1, i2, b4). Connect (i2, rb1, b4).
Connect (rb2, i1, b5). Connect (i1, rb2, b5).
Connect (rb2, i1, b6). Connect (i1, rb2, b6).
Connect (rb2, i2, b7). Connect (i2, rb2, b7).
This way does not lend itself to Euler's proof.
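The graph form, however, does: a few lines of Python, using the bridge facts above, recover Euler's degree argument directly.

from collections import Counter

# Each bridge listed once as the pair of land masses it connects.
bridges = [("i1", "i2"), ("rb1", "i1"), ("rb1", "i1"), ("rb1", "i2"),
           ("rb2", "i1"), ("rb2", "i1"), ("rb2", "i2")]

degree = Counter()
for a, b in bridges:
    degree[a] += 1
    degree[b] += 1

odd = [n for n, d in degree.items() if d % 2 == 1]
print(degree)    # i1: 5, i2: 3, rb1: 3, rb2: 3
print(len(odd))  # 4 odd-degree nodes -> no Euler path exists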
Graph Definitions
A graph consists of:
o A set of nodes N1, N2, N3, ..., Nn, ...
o A set of arcs that connect pairs of nodes. So arcs can be described as ordered pairs
(N1, N2).
A directed graph has an indicated direction for traversing each arc.
If a directed arc connects Nj and Nk, then Nj is called the parent of Nk, and Nk is called the
child of Nj. If the graph also contains an arc (Nj, Nl), then Nk and Nl are called siblings.
A rooted graph has a unique node Ns from which all paths originate.
A tip or leaf is a node that has no children.
An ordered sequence of nodes [N1, N2, N3, ..., Nn], where each pair Ni, Ni+1 in the sequence
represents an arc, is called a path of length n in the graph.
On a path in a rooted graph, a state is said to be an ancestor of all states positioned after it
(to its right) and a descendant of all states before it (to its left).
A path that contains any state more than once is said to contain a cycle or loop.
A tree is a graph in which there is a unique path between every pair of nodes. The edges in a
rooted tree are directed away from the root. Each node in a rooted tree has a unique parent.
Two states in a graph are said to be connected if a path exists that includes both.
State Space Representation of Problems
A state space is represented by a four- tuple [N, A, S, GD]
N is a set of nodes or states of the graph. These correspond to the states in a problem-solving
process.
A is the set of arcs between the nodes. These correspond to the steps or moves in a problem-solving
process.
S, a nonempty subset of N, contains the start state(s) of the problem.
GD, a nonempty subset of N, contains the goal state(s) of the problem. The states in GD are
described using either:
1. A measurable property of the states encountered in the search.
2. A property of the path developed in the search.
A solution path is a path through this graph from a node in S to a node in GD.
Example: Traveling Salesman Problem
Starting at A, find the shortest path through all the cities, visiting each city exactly once and
returning to A.
Reducing Search Complexity: For N cities, a brute-force solution to the traveling salesman problem
takes N! time. There are ways to reduce this complexity:
Branch and Bound Algorithm: Generate one path at a time, keeping track of the best circuit
so far. Use the best circuit so far as a bound on future branches of the search. This reduces
the complexity to 1.26^N.
Nearest Neighbor Heuristic: At each stage of the circuit, go to the nearest unvisited city.
This strategy reduces the complexity to N, so it is highly efficient, but it is not guaranteed to
find the shortest path, as the following example shows:
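The counterexample figure is not reproduced here, but the heuristic itself takes only a few lines. A sketch, assuming dist is a symmetric distance table indexed by city name (the names are illustrative):

def nearest_neighbor_tour(start, cities, dist):
    # Greedy tour: from the current city, always move to the nearest
    # city that has not been visited yet, then close the circuit.
    tour = [start]
    unvisited = set(cities) - {start}
    current = start
    while unvisited:
        nearest = min(unvisited, key=lambda c: dist[current][c])
        tour.append(nearest)
        unvisited.remove(nearest)
        current = nearest
    tour.append(start)        # return to the starting city
    return tour

dist = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2}, "C": {"A": 4, "B": 2}}
print(nearest_neighbor_tour("A", ["A", "B", "C"], dist))   # ['A', 'B', 'C', 'A']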
2.3 Production System
A production system (or production rule system) is a computer program typically used to provide
some form of artificial intelligence, which consists primarily of a set of rules about behavior. These
rules, termed productions, are a basic representation found useful in AI planning, expert systems
and action selection. A production system provides the mechanism necessary to execute
productions in order to achieve some goal for the system.
Productions consist of two parts: a sensory precondition (or "IF" statement) and an action (or
"THEN"). If a production's precondition matches the current state of the world, then the production
is said to be triggered. If a production's action is executed, it is said to have fired. A production
system also contains a database, sometimes called working memory, which maintains data about
current state or knowledge, and a rule interpreter. The rule interpreter must provide a mechanism
for prioritizing productions when more than one is triggered.
Basic Operation
Rule interpreters generally execute a forward chaining algorithm for selecting productions to
execute to meet current goals, which can include updating the system's data or beliefs. The
condition portion of each rule (left-hand side or LHS) is tested against the current state of the
working memory.
In idealized or data-oriented production systems, there is an assumption that any triggered
conditions should be executed: the consequent actions (right-hand side or RHS) will update the
agent's knowledge, removing or adding data to the working memory. The system stops processing
either when the user interrupts the forward chaining loop; when a given number of cycles has been
performed; when a "halt" RHS is executed, or when no rules have true LHSs.
Real-time and expert systems, in contrast, often have to choose between mutually exclusive
productions; since actions take time, only one action can be taken, or (in the case of an expert
system) recommended. In such systems, the rule interpreter, or inference engine, cycles through
two steps: matching production rules against the database, followed by selecting which of the
matched rules to apply and executing the selected actions.
Matching production rules against working memory
Production systems may vary on the expressive power of conditions in production rules.
Accordingly, the pattern matching algorithm which collects production rules with matched
conditions may range from the naive - trying all rules in sequence, stopping at the first match - to
the optimized, in which rules are "compiled" into a network of inter-related conditions.
The latter is illustrated by the RETE algorithm, designed by Charles L. Forgy in 1983, which is
used in a series of production systems, called OPS and originally developed at Carnegie Mellon
University culminating in OPS5 in the early eighties. OPS5 may be viewed as a full-fledged
programming language for production system programming.
Choosing which rules to evaluate
Production systems may also differ in the final selection of production rules to execute, or fire. The
collection of rules resulting from the previous matching algorithm is called the conflict set, and the
selection process is also called a conflict resolution strategy.
Here again, such strategies may vary from the simple -- use the order in which production rules
were written; assign weights or priorities to production rules and sort the conflict set accordingly --
to the complex -- sort the conflict set according to the times at which production rules were
previously fired; or according to the extent of the modifications induced by their RHSs. Whichever
conflict resolution strategy is implemented, the method is indeed crucial to the efficiency and
correctness of the production system.
Using Production Systems
The use of production systems varies from simple string rewriting rules to the modeling of human
cognitive processes, from term rewriting and reduction systems to expert systems.
A simple string rewriting production system example
This example shows a set of production rules for reversing a string from an alphabet that does not
contain the symbols "$" and "*" (which are used as marker symbols).
P1: $$ -> *
P2: *$ -> *
P3: *x -> x*
P4: * -> null & halt
P5: $xy -> y$x
P6: null -> $
In this example, production rules are chosen for testing according to their order in this production
list. For each rule, the input string is examined from left to right with a moving window to find a
match with the LHS of the production rule. When a match is found, the matched substring in the
input string is replaced with the RHS of the production rule. In this production system, x and y are
variables matching any character of the input string alphabet. Matching resumes with P1 once the
replacement has been made.
The string "ABC", for instance, undergoes the following sequence of transformations under these
production rules:
$ABC (P6)
B$AC (P5)
BC$A (P5)
$BC$A (P6)
C$B$A (P5)
$C$B$A (P6)
$$C$B$A (P6)
*C$B$A (P1)
C*$B$A (P3)
C*B$A (P2)
CB*$A (P3)
CB*A (P2)
CBA* (P3)
CBA (P4)
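These six rules are directly runnable. The Python sketch below encodes them as regular expressions (our own encoding, not part of any production-system tool) and reproduces the trace above.

import re

# Each production is (name, pattern, replacement builder). x and y match
# any alphabet character, i.e. anything that is not '$' or '*'.
RULES = [
    ("P1", re.compile(r"\$\$"),             lambda m: "*"),
    ("P2", re.compile(r"\*\$"),             lambda m: "*"),
    ("P3", re.compile(r"\*([^$*])"),        lambda m: m.group(1) + "*"),
    ("P4", re.compile(r"\*"),               lambda m: ""),     # null & halt
    ("P5", re.compile(r"\$([^$*])([^$*])"), lambda m: m.group(2) + "$" + m.group(1)),
    ("P6", re.compile(r"^"),                lambda m: "$"),    # null -> $ (prepend)
]

def run(s):
    while True:
        for name, pattern, build in RULES:   # rules tried in priority order
            m = pattern.search(s)            # leftmost match wins
            if m:
                s = s[:m.start()] + build(m) + s[m.end():]
                if name == "P4":             # P4's action includes 'halt'
                    return s
                break                        # matching resumes with P1

print(run("ABC"))   # prints CBA, following the trace above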
In such a simple system, the ordering of the production rules is crucial. Often, the lack of control
structure makes production systems difficult to design. It is, of course, possible to add control
structure to the production systems model, namely in the inference engine, or in the working
memory.
Production Systems and Artificial Intelligence
Quite apart from their cognitive plausibility, production systems have advantages over conventional
programming languages that make them ideal for task domains in which knowledge may change or
grow over time, and in which the initial problem state and final solution state may differ from user
to user: they are flexible, modular, and plausible.
Production Systems Are Flexible
Production systems use the same basic IF-THEN format to represent knowledge in very different
domains.
Production Systems Are Modular
In an ordinary computer program one procedure calls another, in such a way that a change to one
procedure may entail the modification of any others that call it. Simply removing a procedure may
well result in the collapse of the whole program. In contrast, the functional units of a production
system -- the set of rules in its rulebase -- are independent, self-contained chunks of knowledge,
any one of which can be altered or replaced without disabling the entire production system and
without requiring the modification of other rules. Such alterations might modify or restrict the
behaviour of the system, but will not cripple it. This is because the rules in a production system are
separate from the program that runs them: the rules do not interact with one another directly but
only through changes to the working memory. Modularity is especially important in large
production systems; as knowledge of a domain is modified or extended, new rules can be added in
a fairly straightforward manner. Failure of a system to perform some primitive action or draw some
inference can be remedied simply by writing a new rule whose head matches the relevant items in
the database and whose body executes the appropriate action.
The most important development in production systems has been in the building of expert systems:
computer programs that have the knowledge and expertise, in the form of hundreds or even
thousands of rules, that will enable them to operate at the level of the human expert in some
specialist domain. This makes them valuable as consultants in medicine, management, engineering,
computer-system configuration, and chemical analysis, to name but a few areas where expert
systems are in regular use. Such systems provide the support of expert knowledge that is relatively
cheap (a full-time human consultant commands a much higher salary!), reliable (humans do make
mistakes), portable (human experts are scarce, and sometimes too busy to come on call), and
untiring (human experts have to sleep sometimes; eventually they die); and, because of the
modularity of such systems, they can be extended to become more proficient than any human
expert whose knowledge has been “written into” the rulebase. Expert systems need not store
information uniquely in the form of production rules. For example, the PROSPECTOR system to
analyze geological data codes much of its knowledge in a semantic net; other systems have used
frames or a form of predicate logic.
Production Systems Are Plausible
Human experts do not simply apply their knowledge to problems; they can also explain exactly
why they have made a decision or reached a particular conclusion. This facility is also built into the
expert systems that simulate human expert performance: they can be interrogated at any moment
and asked, for example, to display the rule they have just used or to account for their reasons in
using that rule. That a system is able to explain its reasoning is in itself no guarantee that the human
user will understand the explanation: if the advice is to be of use, it is important that the system be
able to justify its reasoning process in a cognitively plausible manner, by working through a
problem in much the same way as a human expert would. Production systems, and the more
sophisticated expert systems, can be made to reason either forwards, from initial evidence towards
a conclusion, or backwards, from a hypothesis to the uncovering of the right kind of evidence that
would support that hypothesis, or by a combination of the two. One significant factor which will
determine whether a system will use forward or backward reasoning is the method used by the
human expert.
If all the necessary data are either pre-given or can be gathered, and if it is possible to state
precisely what is to be done in any particular set of circumstances, then it is more natural and likely
that a human being - and hence any machine modelling human performance - would work forward
from the data towards a solution. Medical diagnosis, by contrast, normally proceeds abductively from a patient's
symptoms back to the possible causes of the illness, and hence to an appropriate course of
treatment. This was a crucial factor in the design of, for example, MYCIN, a system that diagnoses
blood infections. MYCIN is a moderately large expert system, having around 450 rules, of which
the following is typical. (The rule is shown in its English form, which is used by MYCIN to
generate explanations to the user. For reasoning, the system calls on rules coded in an extension of
the LISP programming language.)
IF: 1) THE STAIN OF THE ORGANISM IS GRAMNEG AND
2) THE MORPHOLOGY OF THE ORGANISM IS ROD AND
3) THE AEROBICITY OF THE ORGANISM IS AEROBIC
THEN: THERE IS STRONGLY SUGGESTIVE EVIDENCE (0.8)
THAT THE CLASS OF THE ORGANISM IS
ENTEROBACTERIACEAE
2.4 Control Strategies
Search for a solution in a problem space requires control strategies to control the search process.
The search control strategies are of different types, and are realized by some specific type of control
structures.
Strategies for search
Some widely used control strategies for search are stated below:
Forward search: Here, the control strategy for exploring the search space proceeds forward from the
initial state towards a solution; the methods are called data-directed.
Backward search: Here, the control strategy proceeds backward from a goal or final state towards
either a soluble subproblem or the initial state; the methods are called goal-directed.
Both forward and backward search: Here, the control strategy is a mixture of both forward and
backward strategies.
Systematic search: Where search space is small, a systematic (but blind) method can be used to
explore the whole search space. One such search method is depth-first search and the other is
breadth-first search.
Heuristic Search:
Many searches depend on knowledge of the problem domain. They have some measure of the
relative merit of each choice to guide the search. Searches so guided are called heuristic searches,
and the methods used are called heuristics.
A heuristic search might not always find the best solution, but it aims to find a good solution in a
reasonable time.
Heuristic Search Algorithms:
First, generate a possible solution, which can either be a point in the problem space or a
path from the initial state.
Then, test to see if this possible solution is a real solution by comparing the state reached
with the set of goal states.
Lastly, if it is a real solution, return it; otherwise repeat from the first step again.
2.5 Depth-First and Breadth-First Search
In addition to the search direction (data- or goal-driven) a search is determined by the order in
which the nodes are examined: breadth first or depth first.
Consider the following graph:
Depth First Search examines the nodes in the following order:
A, B, E, K, S, L, T, F, M, C, G, N, H, O, P, U, D, I, Q, J, R
Breadth First Search examines the nodes in the following order:
A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U
2.6 Depth First Search Algorithm
This is an exhaustive search technique.
Here, the search systematically proceeds to some depth d, before another path is considered.
If, for example, the depth limit is three, then when this limit is reached and the solution has not
been found, the search backtracks to the previous level and explores any remaining
alternatives at this level, and so on.
It is this systematic backtracking procedure that guarantees that it will systematically and
exhaustively examine all of the possibilities.
If the tree is very deep and the maximum depth searched is less than the maximum depth of the
tree, then this procedure is "exhaustive modulo the depth" that has been set.
In depth-first search, when a state is examined, all of its children and their descendants are
examined before any of its siblings.
open := [Start]; // Initialize
closed := [];
while open != [] { // While states remain
remove the leftmost state from open, call it X;
if X is a goal
return success; // Success
else {
generate children of X;
put X on closed;
eliminate children of X on open or closed; // Avoid loops
put remaining children on LEFT END of open; // Stack
}
} return failure; // Failure
A Trace of Depth-first Algorithm: Search for U
AFTER ITERATION    open              closed
0 [A] []
1 [B C D] [A]
2 [E F C D] [B A]
3 [K L F C D] [E B A]
4 [S L F C D] [K E B A]
5 [L F C D] [S K E B A]
6 [T F C D] [L S K E B A]
7 [F C D] [T L S K E B A]
8 [M C D] [F T L S K E B A]
9 [C D] [M F T L S K E B A]
10 [G H D] [C M F T L S K E B A]
11 and so on until U is found or open = []
2.7 Breadth First Search Algorithm
In breadth-first search, when a state is examined, all of its siblings are examined before any of its
children. The space is searched level-by-level, proceeding all the way across one level before going
down to the next level.
In the following algorithm the open list is like NSL in the backtrack algorithm. It contains states
that have been generated but whose children have not been examined. The closed list is the union
of the DE and SL. It contains states that have already been examined.
open := [Start]; // Initialize
closed := [];
while open != [] { // While states remain
remove the leftmost state from open, call it X;
if X is a goal
return success; // Success
else {
generate children of X;
put X on closed;
eliminate children of X on open or closed; // Avoid loops
put remaining children on RIGHT END of open; // Queue
}
} return failure; // Failure
Note that, unlike backtrack, this algorithm does not store the solution path. If you want the solution
path, you should retain ancestor information on the closed list. For example, an ordered pair, (child,
parent), can be kept on the closed and open lists. This allows the solution path to be constructed
from the closed list.
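As a sketch of this (child, parent) bookkeeping, the Python version below keeps a parent map
alongside the queue and rebuilds the solution path by walking backwards from the goal; the graph
format and names are illustrative assumptions.

from collections import deque

def breadth_first_search(graph, start, goal):
    open_list = deque([start])
    parent = {start: None}                 # records the (child, parent) pairs
    while open_list:
        x = open_list.popleft()            # remove the leftmost state
        if x == goal:                      # rebuild the path from the parent map
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return list(reversed(path))
        for child in graph.get(x, []):
            if child not in parent:        # avoid loops
                parent[child] = x
                open_list.append(child)    # put children on the RIGHT END (queue)
    return None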
A Trace of the Breadth-first Algorithm: Search for U

After iteration   open              closed
0                 [A]               []
1                 [B C D]           [A]
2                 [C D E F]         [B A]
3                 [D E F G H]       [C B A]
4                 [E F G H I J]     [D C B A]
5                 [F G H I J K L]   [E D C B A]
6                 [G H I J K L M]   [F E D C B A]
7                 [H I J K L M N]   [G F E D C B A]
8                 and so on until U is found or open = []
Optimality: Breadth-first search is guaranteed to find a shortest path (one with the fewest steps) to
the goal, if a path to the goal exists.
2.8 Depth First Search vs. Breadth First Search
Breadth first is guaranteed to find the shortest path from start to goal.
Use breadth first when it is known that a simple (short) solution exists.
The space utilization of breadth first, measured in terms of the size of the open list, is B^n,
where B is the branching factor -- the average number of descendants per state -- and n is
the level.
Depth first is NOT guaranteed to find the shortest path but it gets deeply into the search
space.
Depth first is more efficient for spaces with a large branching factor, because it does not
have to keep all the nodes at a given level on the open list.
Space utilization of depth first is linear: B * n, since at each level the open list contains
only the children of a single node.
Use depth first if it is known that the solution path will be long.
Depth first can get "lost" on a deep branch that doesn't lead to a goal.
2.9 Depth First with Iterative Deepening
DFID is another kind of exhaustive search procedure which is a blend of depth first and breadth
first search.
Algorithm: Steps
1. First perform a Depth-first search (DFS) to depth one.
2. Then, discarding the nodes generated in the first search starts all over and does a DFS to
level two.
3. Then three ... . . . . . until the goal state is reached.
So at each iteration, a complete depth-first search is performed. It is guaranteed to find the shortest
path, and its space utilization at any level n is B * n.
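A minimal Python sketch of DFID, under the same adjacency-dictionary assumption as the earlier
sketches. This recursive version is written for tree-structured spaces (it performs no cycle
check), and depth_limited is a hypothetical helper name, not one from the text.

def depth_limited(graph, node, goal, limit):
    # Plain depth-first search cut off at the given depth limit.
    if node == goal:
        return True
    if limit == 0:
        return False
    return any(depth_limited(graph, child, goal, limit - 1)
               for child in graph.get(node, []))

def iterative_deepening(graph, start, goal, max_depth=50):
    # Repeat depth-limited DFS to depths 1, 2, 3, ..., discarding earlier work.
    for depth in range(1, max_depth + 1):
        if depth_limited(graph, start, goal, depth):
            return depth               # depth of the shallowest goal found
    return None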
Time Complexity
The order of magnitude of the time complexity for all three algorithms is O(B^n). That is, all three
algorithms require exponential time in the worst case. This is true for all uninformed (brute-force)
searches.
2.10 Using State Space to Represent Reasoning in Logic
State Space Search in Logic
Problems in predicate calculus, such as determining whether one expression follows from a set of
assertions, can be solved using graph search. Consider this example from the propositional
calculus:
And/Or Graph
An and/or graph lets us represent expressions of the form p ^ q => r.
Using the above AND/OR graph, answer each of the following questions:
Is h true?
Is h true if b is no longer true?
What is the shortest path to show that X is true?
How would you show that some proposition p is false?
And/Or Predicate Calculus Graph
Using the following AND/OR graph, where is fred?
2.11 Heuristics and the 8-Tile Puzzle
Let's look again at the 8-tile puzzle from the last lecture. There we described a dumb, exhaustive,
brute-force, depth-first search for finding the goal state. Could you do better? Probably yes. If you
could come up with a way to estimate how close any given arrangement of tiles was to the goal,
you could always choose to explore the state that was nearest the goal. To do this, you'd have to
figure out a way to codify the metrics for this evaluation in such a way that a computer could use
them. One heuristic might be to just count the number of tiles that are in the place they belong. So
if your goal state looks like this:
1 2 3
8   4
7 6 5

And your start state, followed by its three possible successor states, looks like this:

2 8 3
1 6 4
7   5

        /        |        \

2 8 3        2 8 3        2 8 3
1 6 4        1   4        1 6 4
  7 5        7 6 5        7 5

score: 3     score: 5     score: 3
Which of these next states is closer to the goal using our heuristic? The middle state has five tiles in
the right place, while the other two states have only three tiles in the right place. So for our next
step in the search, we'd choose to generate all the states possible from that middle state. Then we'd
apply our evaluation heuristic again, and so on. Of course, we could get more sophisticated with
our heuristic measures. For example, we could try to estimate how many moves it would take to get
all the tiles in their appropriate places instead of just counting how many were already in the right
place. That might give us a better measure of goodness, or it might just cause us to spend extra time
computing the goodness without any real return on the investment, or it might completely
mislead the search. We'd have to play with it for a while to see if it would help us.
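The tiles-in-place heuristic described above is a one-liner in code. The sketch below represents a
board as a tuple of nine entries with 0 for the blank; the representation is our own choice, not
prescribed by the text.

GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)                # 0 marks the blank cell

def tiles_in_place(state, goal=GOAL):
    # Count how many numbered tiles already sit in their goal position.
    return sum(1 for s, g in zip(state, goal) if s == g and s != 0)

start = (2, 8, 3,
         1, 6, 4,
         7, 0, 5)
# The three successors of this start state score 3, 5 and 3, matching the
# example above; e.g. the middle one (blank moved up):
middle = (2, 8, 3,
          1, 0, 4,
          7, 6, 5)
# tiles_in_place(middle) == 5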
8 – Puzzle Problem
The 8 puzzle is a simple game which consists of eight sliding tiles, numbered by digits from 1 to 8,
placed in a 3x3 squared board of nine cells. One of the cells is always empty, and any adjacent
(horizontally or vertically) tile can be moved into the empty cell. The objective of the game is to
start from a given initial configuration and end up in a configuration in which the tiles are placed
in ascending numerical order.
2.12 A* Search Algorithm
A* (pronounced "A star") is a best-first graph search algorithm that finds the least-cost path from a
given initial node to one goal node (out of one or more possible goals).
It uses a distance-plus-cost heuristic function (usually denoted f(x)) to determine the order in which
the search visits nodes in the tree. The distance-plus-cost heuristic is a sum of two functions:
the path-cost function, which is the cost from the starting node to the current node (usually
denoted g(x))
and an admissible "heuristic estimate" of the distance to the goal (usually denoted h(x)).
The h(x) part of the f(x) function must be an admissible heuristic; that is, it must not overestimate
the distance to the goal. Thus for an application like routing, h(x) might represent the straight-line
distance to the goal, since that is physically the smallest possible distance between any two points
(or nodes for that matter).
If the heuristic h satisfies the additional condition h(x) ≤ d(x, y) + h(y) for every edge (x, y) of the
graph (d denoting the length of that edge), then h is called monotone or consistent. In this case A*
can be implemented more efficiently; roughly speaking, no node needs to be processed more
than once, and in fact A* is equivalent to running Dijkstra's algorithm with the reduced cost
d'(x,y) = d(x,y) - h(x) + h(y).
The algorithm was first described in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael.
Conceptual Explanation
A* is a network searching algorithm that takes a "distance-to-goal + path-cost" score into
consideration. As it traverses the network searching all neighbors, it follows lowest score path
keeping a sorted "priority queue" of alternate path segments along the way. If at any point the path
being followed has a higher score than other encountered path segments, the higher score path is
abandoned and a lower score sub-path traversed instead. This continues until the goal is reached.
Algorithm Description
Like all informed search algorithms, it first searches the routes that appear to be most likely to lead
towards the goal. What sets A* apart from a greedy best-first search is that it also takes the distance
already traveled into account (the g(x) part of the heuristic is the cost from the start, and not simply
the local cost from the previously expanded node).
The algorithm traverses various paths from start to goal. For each node x traversed, it maintains 3
values:
g(x): the actual shortest distance traveled from initial node to current node
h(x): the estimated (or "heuristic") distance from current node to goal
f(x): the sum of g(x) and h(x)
Starting with the initial node, it maintains a priority queue of nodes to be traversed, known as the
open set (not to be confused with open sets in topology). The lower f(x) for a given node x, the
higher its priority. At each step of the algorithm, the node with the lowest f(x) value is removed
from the queue, the f and h values of its neighbors are updated accordingly, and these neighbors
are added to the queue. The algorithm continues until a goal node has a lower f value than any node
in the queue (or until the queue is empty). (Goal nodes may be passed over multiple times if there
remain other nodes with lower f values, as they may lead to a shorter path to a goal.) The f value of
the goal is then the length of the shortest path, since h at the goal is zero in an admissible heuristic.
If the actual shortest path is desired, the algorithm may also update each neighbor with its
immediate predecessor in the best path found so far; this information can then be used to
reconstruct the path by working backwards from the goal node. Additionally, if the heuristic is
monotonic (see below), a closed set of nodes already traversed may be used to make the search
more efficient.
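A compact Python sketch of the algorithm as just described, using the standard-library heapq
module for the priority queue. The graph is assumed to map each node to a dictionary of
{neighbour: edge cost}, and h is a caller-supplied admissible heuristic; all names are illustrative.

import heapq

def a_star(graph, start, goal, h):
    open_set = [(h(start), start)]        # priority queue ordered by f = g + h
    g = {start: 0}                        # best known path cost to each node
    parent = {start: None}
    while open_set:
        f, x = heapq.heappop(open_set)    # node with the lowest f value
        if x == goal:                     # reconstruct the path backwards
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return list(reversed(path))
        for y, cost in graph.get(x, {}).items():
            tentative = g[x] + cost
            if tentative < g.get(y, float("inf")):
                g[y] = tentative          # found a cheaper path to y
                parent[y] = x
                heapq.heappush(open_set, (tentative + h(y), y))
    return None                           # open set exhausted: no path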
Example
An example of the A* algorithm in action (nodes are cities connected with roads; h(x) is the
straight-line distance to the target point).
green - start,
blue - target,
orange - visited
Properties
Like breadth-first search, A* is complete in the sense that it will always find a solution if there is
one.
If the heuristic function h is admissible, meaning that it never overestimates the actual minimal cost
of reaching the goal, then A* is itself admissible (or optimal) if we do not use a closed set. If a
closed set is used, then h must also be monotonic (or consistent) for A* to be optimal. This means
that for any pair of adjacent nodes x and y, where d(x,y) denotes the length of the edge between
them, we must have:

h(x) ≤ d(x, y) + h(y)

This ensures that for any path X from the initial node to x:

L(X) + h(x) ≤ L(Y) + h(y)

where L(.) denotes the length of a path, and Y is the path X extended to include y. In other words, it
is impossible to decrease (total distance so far + estimated remaining distance) by extending a path
to include a neighboring node. (This is analogous to the restriction to nonnegative edge weights in
Dijkstra's algorithm.) Monotonicity implies admissibility when the heuristic estimate at any goal
node itself is zero, since (letting P = (f, v1, v2, …, vn, g) be a shortest path from any node f to the
nearest goal g):

h(f) ≤ d(f, v1) + h(v1) ≤ d(f, v1) + d(v1, v2) + h(v2) ≤ … ≤ L(P) + h(g) = L(P)
A* is also optimally efficient for any heuristic h, meaning that no algorithm employing the same
heuristic will expand fewer nodes than A*, except when there are multiple partial solutions where h
exactly predicts the cost of the optimal path. Even in this case, for each graph there exists some
order of breaking ties in the priority queue such that A* examines the fewest possible nodes.
Special cases
Generally speaking, depth-first search and breadth-first search are two special cases of the A*
algorithm. Dijkstra's algorithm, as another example of a best-first search algorithm, is the special
case of A* where h(x) = 0 for all x. For depth-first search, we may consider that there is a global
counter C initialized with a very big value. Every time we process a node we assign C to all of its
newly discovered neighbors. After each single assignment, we decrease the counter C by one. Thus
the earlier a node is discovered, the higher its h(x) value.
Why A* is admissible and computationally optimal
A* is both admissible and considers fewer nodes than any other admissible search algorithm with
the same heuristic, because A* works from an “optimistic” estimate of the cost of a path through
every node that it considers — optimistic in that the true cost of a path through that node to the goal
will be at least as great as the estimate. But, critically, as far as A* “knows”, that optimistic
estimate might be achievable.
When A* terminates its search, it has, by definition, found a path whose actual cost is lower than
the estimated cost of any path through any open node. But since those estimates are optimistic, A*
can safely ignore those nodes. In other words, A* will never overlook the possibility of a lower-cost
path and so is admissible.
Suppose now that some other search algorithm B terminates its search with a path whose actual
cost is not less than the estimated cost of a path through some open node. Algorithm B cannot rule
out the possibility, based on the heuristic information it has, that a path through that node might
have a lower cost. So while B might consider fewer nodes than A*, it cannot be admissible.
Accordingly, A* considers the fewest nodes of any admissible search algorithm that uses no more
accurate a heuristic estimate.
This is only true if:
A* uses an admissible heuristic. Otherwise, A* is not guaranteed to expand fewer nodes than
another search algorithm with the same heuristic. See (Generalized best-first search
strategies and the optimality of A*, Rina Dechter and Judea Pearl, 1985)
A* solves only one search problem rather than a series of similar search problems.
Otherwise, A* is not guaranteed to expand fewer nodes than incremental heuristic search
algorithms. See (Incremental heuristic search in artificial intelligence, Sven Koenig, Maxim
Likhachev, Yaxin Liu and David Furcy, 2004)
Complexity
The time complexity of A* depends on the heuristic. In the worst case, the number of nodes
expanded is exponential in the length of the solution (the shortest path), but it is polynomial when
the search space is a tree, there is a single goal state, and the heuristic function h meets the
following condition:
|h(x) - h*(x)| = O(log h*(x))

where h* is the optimal heuristic, i.e. the exact cost to get from x to the goal. In other words, the
error of h should not grow faster than the logarithm of the "perfect heuristic" h* that returns the
true distance from x to the goal (see Pearl 1984 and also Russell and Norvig 2003, p. 101).
More problematic than its time complexity is A*’s memory usage. In the worst case, it must also
remember an exponential number of nodes. Several variants of A* have been developed to cope
with this, including iterative deepening A* (IDA*), memory-bounded A* (MA*) and simplified
memory bounded A* (SMA*) and recursive best-first search (RBFS).
2.13 Consistent Heuristic
In computer science, a consistent (or monotone) heuristic function is a strategy for search that
approaches the solution in an incremental way without taking any step back. Formally, for every
node n and every successor p of n generated by any action a, the estimated cost of reaching the goal
from n is no greater than the step cost of getting to p plus the estimated cost of reaching the goal
from p. In other words:

h(N) ≤ c(N, P) + h(P)

and

h(G) = 0

where
h is the consistent heuristic function
N is any node in the graph
P is any child of N
G is any goal node.
A consistent heuristic is also admissible. This is proved by induction on m, the length of the best
path from node to goal. By assumption, h(Nm) ≤ h*(Nm), where h*(n) denotes the cost of the
shortest path from n to the goal. Therefore,

h(Nm+1) ≤ c(Nm+1, Nm) + h(Nm) ≤ c(Nm+1, Nm) + h*(Nm) = h*(Nm+1),

making it admissible. (Nm+1 is any node whose best path to the goal, of length m+1, goes through
some immediate child Nm whose best path to the goal is of length m.)
Note: not all admissible heuristics are consistent. However, an admissible heuristic h can be made
into a consistent heuristic h' through the following adjustment:

h'(P) = max(h(P), h(N) - c(N, P))

(Known as the pathmax equation.)
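As a sketch, the adjustment can be applied whenever a child node P is generated from its parent N
during the search (the function name is our own):

def pathmax(h_child, h_parent, step_cost):
    # h'(P) = max(h(P), h(N) - c(N, P)): the estimate is never allowed
    # to drop by more than the step cost along an edge.
    return max(h_child, h_parent - step_cost)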
2.14 Consequences of Monotonicity
Consistent heuristics are called monotone because the estimated final cost of a partial solution,
f(Nj) = g(Nj) + h(Nj), is monotonically non-decreasing along the best path to the goal, where
g(Nj) = c(N1, N2) + c(N2, N3) + … + c(Nj-1, Nj) is the cost of the best path from start node N1 to Nj.
It is necessary and sufficient for a heuristic to obey the triangle inequality in order to be consistent.
In the A* search algorithm, using a consistent heuristic means that once a node is expanded, the
cost by which it was reached is the lowest possible, under the same conditions that Dijkstra's
algorithm requires in solving the shortest path problem (no negative cost cycles). In fact, if the
search graph is given cost c'(N,P) = c(N,P) + h(P) - h(N) for a consistent h, then A* is
equivalent to best-first search on that graph using Dijkstra's algorithm. In the unusual event that
an admissible heuristic is not consistent, a node will need repeated expansion every time a new best
(so-far) cost is achieved for it.
2.15 Travelling Salesman Problem
The Travelling Salesman Problem (TSP) is a problem in combinatorial optimization studied in
operations research and theoretical computer science. Given a list of cities and their pairwise
distances, the task is to find a shortest possible tour that visits each city exactly once.
The problem was first formulated as a mathematical problem in 1930 and is one of the most
intensively studied problems in optimization. It is used as a benchmark for many optimization
methods. Even though the problem is computationally difficult, a large number of heuristics and
exact methods are known, so that some instances with tens of thousands of cities can be solved.
The TSP has several applications even in its purest formulation, such as planning, logistics, and the
manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as
genome sequencing. In these applications, the concept city represents, for example, customers,
soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or
a similarity measure between DNA fragments. In many applications, additional constraints such as
limited resources or time windows make the problem considerably harder.
In the theory of computational complexity, the decision version of TSP belongs to the class of
NP-complete problems. Thus, it is assumed that there is no efficient algorithm for solving TSPs. In
other words, it is likely that the worst case running time for any algorithm for TSP increases
exponentially with the number of cities, so even some instances with only hundreds of cities will
take many CPU years to solve exactly.
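To make this exponential growth concrete, here is a brute-force solver that tries all (n-1)! tours
over a distance matrix. It is our own illustration, practical only for a handful of cities.

from itertools import permutations

def brute_force_tsp(dist):
    # dist[i][j] is the distance from city i to city j; the tour starts at city 0.
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):     # fixing city 0 removes rotations
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len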
Description
As a graph problem
Symmetric TSP with four cities
The TSP can be modelled as a graph, such that cities are the graph’s vertices, paths are the graph’s
edges, and a path's distance is the edge's length. A TSP tour becomes a Hamiltonian cycle, and the
optimal TSP tour is the shortest Hamiltonian cycle. Often, the model is a complete graph (i.e. an
edge connects each pair of vertices). If no path exists between two cities, adding an arbitrarily long
edge will complete the graph without affecting the optimal tour.
Asymmetric and symmetric
In the symmetric TSP, the distance between two cities is the same in each direction, forming an
undirected graph. This symmetry halves the number of possible solutions. In the asymmetric TSP,
paths may not exist in both directions or the distances might be different, forming a directed graph.
Traffic collisions, one-way streets, and airfares for cities with different departure and arrival fees
are examples of how this symmetry could break down.
With metric distances
In the metric TSP the intercity distances satisfy the triangle inequality. This can be understood as
“no shortcuts”, in the sense that the direct connection from A to B is never longer than the detour
via C.
These edge lengths define a metric on the set of vertices. When the cities are viewed as points in
the plane, many natural distance functions are metrics.
In the Euclidean TSP the distance between two cities is the Euclidean distance between the
corresponding points.
In the Rectilinear TSP the distance between two cities is the sum of the differences of their
x- and y-coordinates. This metric is often called the Manhattan distance or city-block
metric.
In the maximum metric, the distance between two points is the maximum of the differences
of their x- and y-coordinates.
[Figure: a symmetric TSP instance on four cities A, B, C and D, with edge weights 12, 20, 30, 34, 35 and 42.]
The last two metrics appear for example in routing a machine that drills a given set of holes in a
printed circuit. The Manhattan metric corresponds to a machine that adjusts first one co-ordinate,
and then the other, so the time to move to a new point is the sum of both movements. The
maximum metric corresponds to a machine that adjusts both co-ordinates simultaneously, so the
time to move to a new point is the slower of the two movements.
Non-metric distances
Distance measures that do not satisfy the triangle inequality arise in many routing problems. For
example, one mode of transportation, such as travel by airplane, may be faster, even though it
covers a longer distance.
In its definition, the TSP does not allow cities to be visited twice, but many applications do not
need this constraint. In such cases, a symmetric, non-metric instance can be reduced to a metric
one. This replaces the original graph with a complete graph in which the inter-city distance cij is
replaced by the shortest path between i and j in the original graph.
2.16 Tic-tac-toe
Tic-tac-toe, also spelled tick tack toe, and alternatively called noughts and crosses, hugs and kisses,
and many other names, is a pencil-and-paper game for two players, O and X, who take turns
marking the spaces in a 3×3 grid, usually X going first. The player who succeeds in placing three
respective marks in a horizontal, vertical or diagonal row wins the game.
The following example game is won by the first player, X:
Players soon discover that best play from both parties leads to a draw. Hence, tic-tac-toe is most
often played by young children; when they have discovered an unbeatable strategy they move on to
more sophisticated games such as dots and boxes or 12-cell tic-tac-toe. This reputation for ease has
led to casinos offering gamblers the chance to play tic-tac-toe against trained chickens—though the
chicken is advised by a computer program.
The first two plies of the game tree for tic-tac-toe
The simplicity of tic-tac-toe makes it ideal as a pedagogical tool for teaching the concepts of
combinatorial game theory and the branch of artificial intelligence that deals with the searching of
game trees. It is straightforward to write a computer program to play tic-tac-toe perfectly, to
enumerate the 765 essentially different positions (the state space complexity), or the 26,830
possible games up to rotations and reflections (the game tree complexity) on this space.
The first known video game, OXO (or Noughts and Crosses, 1952) for the EDSAC computer
played perfect games of tic-tac-toe against a human opponent.
One example of a Tic-Tac-Toe playing computer is the Tinkertoy computer, developed by MIT
students, and made out of Tinker Toys. It only plays Tic-Tac-Toe and has never lost a game. It is
currently on display at the Museum of Science, Boston.
Number of possible games
Despite its apparent simplicity, it requires some complex mathematics to determine the number of
possible games. This is further complicated by the definitions used when setting the conditions.
Simplistically, there are 362,880 (i.e. 9!) ways of placing Xs and Os on the board, without regard to
winning combinations.
When winning combinations are considered, there are 255,168 possible games. Assuming that X
makes the first move every time:
131,184 finished games are won by (X)
1,440 are won by (X) after 5 moves
47,952 are won by (X) after 7 moves
81,792 are won by (X) after 9 moves
77,904 finished games are won by (O)
5,328 are won by (O) after 6 moves
72,576 are won by (O) after 8 moves
46,080 finished games are drawn
Ignoring the sequence of Xs and Os, and after eliminating symmetrical outcomes (i.e. rotations
and/or reflections of other outcomes), there are only 138 unique outcomes. Assuming once again
that X makes the first move every time:
91 unique outcomes are won by (X)
21 won by (X) after 5 moves
58 won by (X) after 7 moves
12 won by (X) after 9 moves
44 unique outcomes are won by (O)
21 won by (O) after 6 moves
23 won by (O) after 8 moves
3 unique outcomes are drawn
Strategy
A player can play perfect tic-tac-toe by choosing the first applicable move from the following priority list:
1. Win: If you have two in a row, play the third to get three in a row.
2. Block: If the opponent has two in a row, play the third to block them.
3. Fork: Create an opportunity where you can win in two ways.
4. Block Opponent's Fork:
Option 1: Create two in a row to force the opponent into defending, as long as it doesn't
result in them creating a fork or winning. For example, if "X" has a corner, "O" has the
center, and "X" has the opposite corner as well, "O" must not play a corner if it is to avoid
losing. (Playing a corner in this scenario creates a fork for "X" to win.)
Option 2: If there is a configuration where the opponent can fork, block that fork.
5. Center: Play the center.
6. Opposite Corner: If the opponent is in the corner, play the opposite corner.
7. Empty Corner: Play an empty corner.
8. Empty Side: Play an empty side.
The first player, whom we shall designate "X," has 3 possible positions to mark during the first
turn. Superficially, it might seem that there are 9 possible positions, corresponding to the 9 squares
in the grid. However, by rotating the board, we will find that in the first turn, every corner mark is
strategically equivalent to every other corner mark. The same is true of every edge mark. For
strategy purposes, there are therefore only three possible first marks: corner, edge, or center. Player
X can win or force a draw from any of these starting marks; however, playing the corner gives the
opponent the smallest choice of squares which must be played to avoid losing.
The second player, whom we shall designate "O," must respond to X's opening mark in such a way
as to avoid the forced win. Player O must always respond to a corner opening with a center mark,
and to a center opening with a corner mark. An edge opening must be answered either with a center
mark, a corner mark next to the X, or an edge mark opposite the X. Any other responses will allow
X to force the win. Once the opening is completed, O's task is to follow the above list of priorities
in order to force the draw, or else to gain a win if X makes a weak play.
2.17 Water Jug Problem
There is an eight-litre jug full of water and two empty jugs; one of them can hold five litres of
water and the other three litres. If the water is to be shared equally between two people, how can
this be done?
Solution:
Fill the five-litre jug from the eight-litre jug. From the five-litre jug, fill up the three-litre jug. Pour
the water from the three-litre into the eight-litre jug. Pour the remaining amount of water in the
five-litre jug (two litres) into the three-litre jug. Fill the five-litre jug from the eight-litre jug (which
previously held six litres and now holds one litre). Fill the three-litre jug using one litre from the
five-litre jug. Then add the contents of the three-litre jug to the eight-litre jug. Both the five-litre jug
and the eight-litre jug each now contain four litres.
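The same answer falls out of a blind breadth-first search over jug states. The sketch below, our
own illustration, encodes a state as a tuple (eight, five, three) of litres; the only operator is
pouring one jug into another until the source is empty or the destination is full.

from collections import deque

CAPACITY = (8, 5, 3)

def successors(state):
    # Pour from jug src into jug dst until src is empty or dst is full.
    for src in range(3):
        for dst in range(3):
            if src == dst or state[src] == 0:
                continue
            amount = min(state[src], CAPACITY[dst] - state[dst])
            if amount > 0:
                nxt = list(state)
                nxt[src] -= amount
                nxt[dst] += amount
                yield tuple(nxt)

def solve(start=(8, 0, 0), goal=(4, 4, 0)):
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:                  # rebuild the sequence of pours
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return list(reversed(path))
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)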
2.18 Minimax Search Procedure
1. It is a depth-first, depth limited search procedure.
2. The idea is to start at the current position and use the plausible-move generator to generate
the set of possible successor positions.
3. We can apply the static evaluation function to those positions and simply choose the best
one.
4. We can back that value up to the starting position to represent our evaluation of it.
5. Here we assume that the static evaluation function returns large values to indicate good
situations for us, so our goal is to maximize the value of static evaluation function of the
next board position.
1. Our goal is to maximize the value of the static evaluation function of the next board
position.
2. Opponent’s goal is to minimize the value of the evaluation function.
3. When we have a two-ply look-ahead, suppose we made move B; the opponent can then be
expected to choose move F.
4. The correct move for us is C, since there is nothing the opponent can do from there to
produce a value worse than -2.
5. The alternation of maximizing and minimizing at alternate ply when evaluations are being
pushed back up corresponds to the opposing strategies of the two players and gives this
method the name Minimax.
Auxiliary Procedures for Implementing Minimax
1. MOVEGEN (Position, Player): The plausible move generator which returns a list of nodes
representing the moves that can be made by player in position. We call the two players
PLAYER-ONE and PLAYER-TWO; in a chess program we might use the names BLACK
and WHITE.
2. STATIC (Position, Player): The static evaluation function, which returns a number
representing the goodness of position from the standpoint of the Player.
3. DEEP-ENOUGH (Position, Depth): returns TRUE if the search should be stopped at the
current level and FALSE otherwise.
Termination Conditions for Minimax Procedure (Recursion loop)
• Has one side won?
• How many ply have we already explored?
• How promising is this path?
• How much time is left?
• How stable is the configuration?
Algorithm MINIMAX (Position, Depth, Player)
1. If DEEP-ENOUGH( Position, Depth), then return the structure
VALUE = STATIC (Position, Player);
PATH = nil
This indicates that there is no path from this node and that its value is that determined by the
static evaluation function.
2. Otherwise, generate one more ply of the tree by calling the function MOVE-GEN (Position,
Player) and setting SUCCESSORS to the list it returns.
3. If SUCCESSORS is empty, then there are no moves to be made, so return the same
structure that would have been returned if DEEP-ENOUGH had returned true.
4. If SUCCESSORS is not empty, then examine each element in turn and keep track of the
best one. This is done as follows:
Initialize BEST-SCORE to the minimum value that STATIC can return. It will be updated to reflect
the best score that can be achieved by an element of SUCCESSORS.
For each element SUCC of SUCCESSORS, do the following:
a. Set RESULT-SUCC to
MINIMAX (SUCC, Depth+1, OPPOSITE (Player))
This recursive call to MINIMAX will actually carry out the exploration of SUCC
b. Set NEW-VALUE to –VALUE(RESULT-SUCC). This will cause it to reflect the merits of
the position from the opposite perspective to that of the next lower level.
c. If NEW-VALUE > BEST-SCORE, then we have found a successor that is better than any
that have been examined so far. Record this by doing the following:
i. Set BEST-SCORE to NEW-VALUE
ii. The best known path is now from CURRENT to SUCC and then on to the appropriate
path down from SUCC as determined by the recursive call to MINIMAX. So set
BEST-PATH to the result of attaching SUCC to the front of PATH (RESULT-SUCC).
5. Now that all the successors have been examined, we know the value of Position as well as
which path to take from it. So return the structure:
VALUE = BEST-SCORE
PATH = BEST-PATH
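The following Python sketch mirrors this negamax-style formulation step by step; movegen,
static_eval, deep_enough and opposite stand in for MOVEGEN, STATIC, DEEP-ENOUGH and
OPPOSITE and must be supplied by the particular game.

def minimax(position, depth, player, movegen, static_eval, deep_enough):
    # Returns (value, path), matching the structure built in steps 1-5.
    if deep_enough(position, depth):
        return static_eval(position, player), []
    successors = movegen(position, player)
    if not successors:                     # no moves: treat as a leaf (step 3)
        return static_eval(position, player), []
    best_score, best_path = float("-inf"), []
    for succ in successors:
        value, path = minimax(succ, depth + 1, opposite(player),
                              movegen, static_eval, deep_enough)
        new_value = -value                 # the opponent's gain is our loss
        if new_value > best_score:         # a better successor (step 4c)
            best_score, best_path = new_value, [succ] + path
    return best_score, best_path

def opposite(player):
    return "PLAYER-TWO" if player == "PLAYER-ONE" else "PLAYER-ONE"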
Alpha-Beta Pruning
1. Minimax is a depth-first approach. One path is explored as far as time allows.
2. Efficiency of the depth-first search can be improved by using branch-and-bound techniques
in which partial solutions that are clearly worse than known solutions can be abandoned
early.
3. It is necessary to modify the search procedure slightly to handle both maximizing and
minimizing players; it is also necessary to modify the branch-and-bound strategy to include
two bounds, one for each player.
4. This strategy is called alpha-beta pruning.
5. It requires the maintenance of two threshold values,
a. alpha representing a lower bound on the value that a maximizing node may
ultimately be assigned and
b. beta representing an upper bound on the value that a minimizing node may be
assigned.
6. MINIMAX-A-B( CURRENT, 0, PLAYER-ONE, maximum value STATIC can compute,
minimum value STATIC can compute )
Alpha-Beta Pruning Example
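The figure for this example is not reproduced here. As a substitute illustration, the sketch below
adds the two thresholds to the negamax procedure given earlier; swapping and negating the bounds
(-beta, -alpha) on the recursive call corresponds to viewing the position from the opponent's side.

def minimax_ab(position, depth, player, alpha, beta,
               movegen, static_eval, deep_enough):
    # Negamax with alpha-beta pruning; returns the backed-up value only.
    if deep_enough(position, depth):
        return static_eval(position, player)
    successors = movegen(position, player)
    if not successors:
        return static_eval(position, player)
    for succ in successors:
        value = -minimax_ab(succ, depth + 1, opposite(player), -beta, -alpha,
                            movegen, static_eval, deep_enough)
        alpha = max(alpha, value)          # raise the lower bound
        if alpha >= beta:                  # the opponent would never allow this line
            break                          # prune the remaining successors
    return alpha

# Initial call: minimax_ab(current, 0, "PLAYER-ONE", float("-inf"), float("inf"), ...)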
Check Your Progress:
1. What are problem solving, search and control strategies?
2. Discuss the “Tower of Hanoi” puzzle.
3. Illustrate with example the breadth-first search algorithm.
4. When would the best first search be worse than simple breadth first search? Give
example.
5. Write an algorithm to search a tree using depth-first strategy. Explain how this
algorithm works on the following tree
6. Consider the game of tic-tac-toe, where two players put a ‘X’ or ‘O’ alternatively on
the following nine places:
Draw the search space for the next move starting from,
7. How does the use of a heuristic function reduce the search space? Suggest a good
heuristic function for each of the following:
(i) Travelling Salesperson problem
(ii) 8 puzzle problem.
8. Explain the role of heuristic knowledge in solving problems. Illustrate your
explanation with an example.
9. Consider the following problem:
You are given two jugs of capacity 4 gallons and 3 gallons. Neither has any
measuring mark on it.
How can you get exactly 2 gallons of water into the 4 gallon jug? Write production
rules for the problem and obtain the solution of this problem.
10. Discuss the A* algorithm. Why is A* admissible and computationally optimal?
Chapter 3 Knowledge Representation
3.1 Knowledge Representation
The main objective of knowledge representations is to make knowledge explicit. Knowledge can be
shared less ambiguously in its explicit form and this became especially important when machines
started to be applied to facilitate knowledge management.
Nowadays, Knowledge Representation is a multidisciplinary field that applies theories and
techniques from:
Logic: provides the formal structure and rules of inference.
Ontology: defines the kinds of things that exist in the application domain.
Computation: supports the applications that distinguish knowledge representation from pure
philosophy.
Therefore, Knowledge Representation can be defined as the application of logic and ontology to the
task of constructing computable models of some domain. Logic and ontology provide the
formalization mechanisms required to make expressive models easily shareable and processable by
computers. However, computers play only the role of powerful processors of more or less rich
information sources. The final interpretation of the results is carried out by the agents that motivate
this processing, i.e. the users of knowledge management systems.
The applications of current Knowledge Representation techniques are enormous. Knowledge is
always more than the sum of its parts, and Knowledge Representation provides the tools needed to
manage accumulations of knowledge; the World Wide Web is becoming the biggest accumulation
of knowledge ever faced by humanity.
Knowledge Representation Principles
Knowledge Representation can be described by the five fundamental roles that it plays in artificial
intelligence; they are the Knowledge Representation principles:
1. A knowledge representation is a surrogate: symbols are used to represent external things
that cannot be stored in a computer, i.e. physical objects, events, and relationships. Symbols
are surrogates for the external things. Symbols and links between them form a model of the
external system that can be manipulated to simulate it or reason about it.
2. A knowledge representation is a set of ontological commitments: Ontology is the study
of existence. Thus, ontology determines the categories of things that exist or may exist in an
application domain. Those categories set the ontological commitments of the application
designer or knowledge engineer.
3. A knowledge representation is a fragmentary theory of intelligent reasoning: to support
reasoning about modeled things in a domain, a knowledge representation must describe
their behaviour and interactions. The description constitutes a theory of the application
domain. It can be stated, for instance, as explicit axioms or compiled into computable
programs.
4. A knowledge representation is a medium for efficient computation: besides representing
knowledge, an Artificial Intelligence System must encode knowledge in a form that can be
processed efficiently by the available computing equipment. Therefore, developments in
computer hardware and programming theory have a great influence on knowledge
representation.
5. A knowledge representation is a medium for human expression: a good knowledge
representation language should facilitate communication between the knowledge engineers
who manage knowledge tools and the domain experts who understand the application
domain. Domain experts should be able to read and verify the domain definitions and rules
written by knowledge engineers.
Levels of Representation
When applied in the computer domain, knowledge representations range from computer-oriented
forms to conceptual ones nearer to those present in our internal world models. Five knowledge
levels can be established using this criterion:
1. Implementational: this is the most computer-oriented level. It includes data structures such
as atoms, pointers, lists and other programming notations.
2. Logical: symbolic logic is inside this level. Thus, symbolic logic propositions, predicates,
variables, quantifiers and Boolean operations are included.
3. Epistemological: a level for defining concept types with subtypes, inheritance, and
structuring relations.
4. Conceptual: the level of semantic relations, linguistic roles, objects and actions.
5. Linguistic: the level most distant from computers; it deals with arbitrary concepts, words and
expressions of natural languages.
3.2 Characteristics of Knowledge
A. Paradigmatic Context of Research
1. Epistemology
What makes something true? What is it to understand something? What are considered
facts? Only after facts are separated from illusion can the systematic collection of facts
proceed.
2. Paradigmatic Level
a) Most Global Context (field)
Once a person chooses a level of molarity and a time scale, "Psychology of Learning" for
example, rarely are other fields or other time scales considered.
b) Context (research specialty or area)
In point of fact, a researcher rarely steps outside a specific context or research specialty.
Examples of these "local" contexts would be "matching" or "timing."
c) Immediate Context (researcher's own lab)
Sometimes research is carried out with very little connection to any other research at all.
B. Degree to Which Research is integrated within Paradigmatic Context
The degree to which research is integrated within its paradigmatic context determines two
important aspects of the research, its likely validity and its usefulness.
1. Extent
Research can be either more or less well integrated into its paradigmatic context.
2. Ramifications
a) Validity
Research well integrated within a paradigmatic context has a wide variety of support and is
likely to be valid, while non-paradigmatic research has only itself to provide proof or
support.
b) Usefulness
Research well integrated within a paradigm also has many aspects which are "pre-understood."
Its underlying machinery has generally already been thought through.
Relationships are, therefore, understood. The relationship is predictable and generalisable.
Integration enables the information to be used more effectively.
C. Purpose of Research
1. For Curiosity
It may be that it interests us and that is a good enough reason. If it interests us, then that
means some implicit theory we believe did not prepare us for the event that piqued our
interest. Presumably, whatever was of interest to us will be of interest to others once we get
a “handle on it.”
2. Construction of Functional Context
This type of knowledge gathering systematically obtains facts as well as the necessary other
information to develop a coherent frame of reference or context for meaningful explanation.
The following enumeration illustrates the kinds of procedures or kinds of information which
could be obtained.
a) Relevant / Irrelevant Variables
What are functionally significant variables and which are irrelevant or are simply
confounds?
b) Parameter Documentation
What changes in the dependent variable are caused by what changes in the independent
variable?
c) Functional Similarity
What other well-understood functional relationships have properties which are similar so
that even more fundamental explanations common to both phenomena can be uncovered?
3. For Theory Testing
Often we carry out research to see if a theory is correct or not. We could deduce some
experimental test such as: “according to my understanding of the processes involved, if we
double the reinforcement rate on Schedule A, then the rate of behaviour should be halved on
Schedule C.” Keep in mind that a single positive finding supports a theory but only
marginally, while a single negative finding whose interpretation is correct is very damaging
to a theory. But also keep in mind that everything hinges on the author's understanding of
the disconfirmation. For that reason, multiple converging evidence must be a requirement of
theory testing.
D. Breadth of Research Findings
Research can produce a single fact or a large set of interrelated findings. This is not the degree of
integration into the paradigm, but rather the degree of self integration or completeness of the
findings themselves.
E. Type of Knowledge Produced by the Research
When studying behaviour, two distinctly different kinds of questions emerge.
One type asks things such as, “How fast can a pigeon peck,” or “How many colours can pigeons
discriminate?”
A second type, one that's vastly more important to psychology, asks things such as, "Why does this
type of experience produce that type of rate change,” or “What type of experience produces that
type of control by the stimuli?” Note that none of these questions necessarily requires a
reductionistic explanation. Whether the explanation appeals to higher or lower levels of molarity or
shorter or longer time scales is a different issue.
F. Phase of Research Helix
The analysis phase is the first stage in the construction of an integrated framework of explanation.
Synthesis is the second stage.
1. Analysis
This aspect of research proceeds by breaking a phenomenon down into simpler elements. Analysis
is based on the assumption that the action of a whole is the result of the action of its parts and their
interaction. By isolating the parts and coming to understand their simple processes, then complex
wholes can best and most efficiently come to be understood. The belief is that the complexity and
unpredictability of wholes is due to the action of the many small difficult to control processes
making up the whole. Analysis is specifically designed to obtain information concerning the nature
of the underlying behavioural process by breaking the phenomenon into its parts. This is the
process of isolating active variables or ingredients, or the removal of irrelevant or confounding
variables.
2. Synthesis
Synthesis is the putting together or creation of something. The purpose of synthesis is to assemble
known parts into a whole. The result is the production of a complex behaviour or an integrated
theory. It is an important stage in the empirical collection of knowledge because it provides
feedback with respect to the validity of the presumed process.
3.3 Knowledge Representation Technologies
1. Logic programming
2. Production systems
3. Semantic Networks
4. Description Logics
5. Conceptual Graphs
1. Logic programming
The most relevant feature of logic programming technologies is that they implement restricted
versions of logic to improve performance. Generally, they use backwards-chaining implementation
of inference.
Logic programming has evolved from research on theorem provers. Theorem provers use full
first-order logic and implement Robinson's resolution rule, a complete and sound form of
backward-chaining deduction. They are used mainly for mathematical and scientific problems.
On the contrary, logic-programming languages are used when greater performance is needed. They
usually sacrifice completeness to reach this goal. However, both approaches are based on backward
chaining and are thus best suited to question answering.
Prolog
It is a widely used logic programming language. It imposes some restrictions on its logic
component. Horn-clause logic is used, so disjunction (∨) is not allowed in the conclusions of
implications (←). Moreover, there is no true negation in Prolog; it is approximated by negation as
failure: when something cannot be demonstrated from what is known, it is considered false.
Prolog incorporates some built-in predicates and functions that provide useful primitives and
non-logical programming facilities, e.g. computer input/output management. They are the facilities
and building blocks over which logic programs and personalized predicates and functions are
defined. They form the ontology that captures the knowledge structures of the Prolog knowledge
base over which logic programs work.
2. Production systems
Like logic programming languages, production systems use implication as the primary
representation element. They operate with a forward-chaining control structure that works
iteratively. When the premises of an implication, known as a rule, are satisfied by facts in the
knowledge base, the rule is fired. Therefore, production systems are particularly suited to
modelling reactive behaviours.
Firing a rule results in interpreting its consequents. They are interpreted as actions and not as mere
logical conclusions. These actions, among others, comprise knowledge base insertions and
deletions. Some production systems have mechanisms to resolve cases where many rules can be
fired simultaneously; for instance, they may resolve them by implementing rule precedence or a
non-deterministic selection behaviour.
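A toy forward-chaining loop in Python illustrates this fire-when-matched cycle; the rule format
(premises, conclusion) is our own and not that of any particular production system.

def forward_chain(facts, rules):
    # rules: list of (premises, conclusion) pairs; fire until nothing changes.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)      # firing the rule asserts its conclusion
                changed = True
    return facts

rules = [(("rain",), "wet-ground"), (("wet-ground",), "slippery")]
# forward_chain({"rain"}, rules) -> {"rain", "wet-ground", "slippery"}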
JESS:
Jess is a production system implemented in Java. It is inspired by an earlier production system
called CLIPS. Jess implements the Rete algorithm to improve rule-matching performance, the most
important aspect of this kind of knowledge representation system. In short, the algorithm
maintains pointers from data elements to partial rule matches. This directly relates rules to the data
whose changes might affect them. Therefore, when some data is changed, it is quite direct to know
which rules might then get all their firing conditions satisfied.
3. Semantic Networks
They are particularly suited to model static world knowledge. World objects and classes of objects
are modeled as graph nodes and binary relations among them are captured as edges between nodes.
There are different types of edges.
The taxonomy supports a built-in fast inference method, inheritance, to reason about generic and
specific object features. The general graph structure can be used to efficiently reason about
inheritance and to locate information.
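Inheritance over such a network amounts to walking is-a edges upward until the queried property
is found. A minimal sketch, with invented node and property names:

ISA = {"canary": "bird", "bird": "animal"}             # is-a (taxonomy) edges
PROPS = {"bird": {"flies": True}, "animal": {"alive": True}}

def lookup(node, prop):
    # Climb the is-a chain until some ancestor defines the property.
    while node is not None:
        if prop in PROPS.get(node, {}):
            return PROPS[node][prop]
        node = ISA.get(node)               # move up to the parent class
    return None

# lookup("canary", "flies") -> True, inherited from bird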
Their greatest problem has been that, for a long time, they lacked a consensus semantics. They
have now been completely formalized as a subset of FOL.
HTML
The HTML language of the World Wide Web can be viewed as constructing a global Semantic
Network, with pages as nodes, links between pages as edges, and a very limited set of link types.
The only distinction is between the external link <a href="…">…</a> and the link to embedded
images <img src="…">.
Frames
Frames are an evolution of semantic networks. They add procedural attachments to the node and
edge structure of semantic networks. Altogether, the resulting framework and modeling paradigm
has evolved into the object oriented programming paradigm.
This new paradigm has had great acceptance, for instance Sun's Java object oriented programming
language. Objects and classes are nodes and relations are modeled as object references stored in
object and class variables. The taxonomical relations are built into the Java language. Subsumption
is declared with "subclass extends class" constructs in class definitions. Object-class membership is
stated when a new object is created with the "object = new Class" construct. The procedural
attachments are represented as class methods. Their behaviour is defined with the procedural part of
Java in class definitions.
4. Description Logics
Description Logics allow specifying a terminological hierarchy using a restricted set of first-order
formulas. These restrictions ensure that Description Logics usually have nice computational
properties. They are often decidable and tractable, or at least tend to have nice average-case
computational properties, but the inference services are restricted to subsumption and classification.
Subsumption means, given formulae describing classes, the classifier associated with certain
description logic will place them inside a hierarchy. On the other hand, classification means that
given an instance description, the classifier will determine the most specific classes to which the
particular instance belongs.
Each description logic defines also a number of language constructs, such as intersection, union,
role quantification, etc., that can be used to define new concepts and roles.
FaCT
Fast Classification of Terminologies (FaCT) is a Description Logic classifier. The FaCT system
includes two reasoners, one for the description logic SHF (ALC augmented with transitive roles,
functional roles and a role hierarchy) and the other for the logic SHIQ (SHF augmented with
inverse roles and qualified number restrictions). Both of them are implemented using sound and
complete tableaux algorithms. This kind of algorithm is especially suited for subsumption
computation and has become the standard for Description Logic systems.
Construct                       Syntax                  Language
Concept                         A                       FL
Role name                       R                       FL
Conjunction                     C ⊓ D                   FL
Value restriction               ∀R.C                    FL
Existential quantification      ∃R                      FL
Top                             ⊤                       AL*
Bottom                          ⊥                       AL*
Negation                        ¬A, ¬C                  AL*
Disjunction                     C ⊔ D                   AL*
Existential restriction         ∃R.C                    AL*
Number restrictions             (≥ n R), (≤ n R)        AL*
Collection of individuals       {a1 … an}               AL*
Role hierarchy                  R ⊑ S                   H
Inverse role                    R⁻                      I
Qualified number restriction    (≥ n R.C), (≤ n R.C)    Q

Description Logic languages and their characteristics
5. Conceptual Graphs
Conceptual Graphs were developed from Existential Graphs and Semantic Networks. They are
under standardization process in conjunction with KIF (Knowledge Interchange Format). Both are
different ways of representing FOL. KIF is an attempt to standardize a linear syntax of FOL while
Conceptual Graphs provide a diagrammatic syntax.
Conceptual Graph for “John goes to Boston by bus”
[Figure: the concept Go is linked to Person: John by the relation Agnt (agent), to City: Boston by Dest (destination), and to Bus by Inst (instrument).]
Although Conceptual Graphs are equivalent to FOL, their graph orientation provides many features
that make them especially useful. They express meaning in a form that is logically precise, humanly
readable, and computationally tractable. With a direct mapping to language, conceptual graphs
serve as an intermediate language for translating computer-oriented formalisms to and from natural
languages. With their graphic representation, they serve as a readable, but formal design and
specification language.
3.4 First Order Logic
First-order logic is a formal logic that goes by many names, including first-order predicate calculus,
first-order predicate logic, the lower predicate calculus, and predicate logic. First-order logic is
distinguished from propositional logic by its use of quantifiers; each interpretation of first-order
logic includes a domain of discourse over which the quantifiers range.
There are many deductive systems for first-order logic that are sound and complete, i.e. able to
derive all and only the logically valid implications. First-order logic also satisfies several
metalogical theorems that make it amenable to analysis in proof theory, such as the
Löwenheim–Skolem theorem and the compactness theorem.
A predicate resembles a function that returns either True or False.
Let us consider the following sentences: “Socrates is a philosopher”, "Plato is a philosopher". In
propositional logic these are treated as two unrelated propositions, denoted for example by p and q.
In first-order logic, however, the sentences can be expressed in a more parallel manner using the
predicate Phil(a), which asserts that the object represented by a is a philosopher. Thus if a
represents Socrates then Phil(a) asserts the first proposition, p; if a represents Plato then Phil(a)
asserts the second proposition, q. A key aspect of first-order logic is visible here: the string "Phil" is
a syntactic entity which is given semantic meaning by declaring that Phil(a) holds exactly when a is
a philosopher. An assignment of semantic meaning is called an interpretation.
First-order logic allows reasoning about properties that are shared by many objects, through the use
of variables. For example, let Phil(a) assert that a is a philosopher and let Schol(a) assert that a is a
scholar. Then the formula

Phil(a) → Schol(a)

asserts that if a is a philosopher then a is a scholar. The symbol → is used to denote a conditional
(if/then) statement. The hypothesis lies to the left of the arrow and the conclusion to the right. The
truth of this formula depends on which object is denoted by a, and on the interpretations of "Phil"
and "Schol".
Assertions of the form "for every a, if a is a philosopher then a is a scholar" require both the use of
variables and the use of a quantifier. Again, let Phil(a) assert a is a philosopher and let Schol(a)
assert that a is a scholar. Then the first-order sentence

∀a (Phil(a) → Schol(a))

asserts that no matter what a represents, if a is a philosopher then a is a scholar. Here ∀, the
universal quantifier, expresses the idea that the claim in parentheses holds for all choices of a.
To show that the claim "If a is a philosopher then a is a scholar" is false, one would show there is
some philosopher who is not a scholar. This counterclaim can be expressed with the existential
quantifier ∃:

∃a (Phil(a) ∧ ¬Schol(a))

Here:
¬ is the negation operator: ¬Schol(a) is true if and only if Schol(a) is false, in other
words if and only if a is not a scholar.
∧ is the conjunction operator: Phil(a) ∧ ¬Schol(a) asserts that a is a philosopher and
also not a scholar.
The predicates Phil(a) and Schol(a) take only one parameter each. First-order logic can also express
predicates with more than one parameter. For example, "there is someone who can be fooled every
time" can be expressed as:

∃x (Person(x) ∧ ∀y (Time(y) → Canfool(x, y)))

Here Person(x) is interpreted to mean x is a person, Time(y) to mean that y is a moment of time, and
Canfool(x,y) to mean that (person) x can be fooled at (time) y.
The range of the quantifiers is the set of objects that can be used to satisfy them. (In the informal
examples in this section, the range of the quantifiers was left unspecified.) In addition to specifying
the meaning of predicate symbols such as Person and Time, an interpretation must specify a
nonempty set, known as the domain of discourse or universe, as a range for the quantifiers. Thus a
statement of the form ∃a Phil(a) is said to be true, under a particular interpretation, if there is
some object in the domain of discourse of that interpretation that satisfies the predicate that the
interpretation uses to assign meaning to the symbol Phil.
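Over a finite domain of discourse, the truth of such a sentence can be checked mechanically. The
sketch below evaluates the "someone who can be fooled every time" sentence over a small invented
interpretation:

domain = {"alice", "bob", "noon", "dusk"}
person = {"alice", "bob"}
time_points = {"noon", "dusk"}
canfool = {("bob", "noon"), ("bob", "dusk"), ("alice", "noon")}

# Exists x: Person(x) and for all y: Time(y) implies Canfool(x, y)
result = any(
    x in person and all((y not in time_points) or ((x, y) in canfool)
                        for y in domain)
    for x in domain
)
# result is True: bob can be fooled at every time in this interpretation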
Syntax
There are two key parts of first-order logic:
1. The Syntax
2. The Semantics
The syntax determines which collections of symbols are legal expressions in first-order logic, while
the semantics determine the meanings behind these expressions.
Logical symbols
There are several logical symbols in the alphabet, which vary by author but usually include:
The quantifier symbols ∀ and ∃.
The logical connectives: ∧ for conjunction, ∨ for disjunction, → for implication, ↔ for
biconditional, and ¬ for negation. Occasionally other logical connective symbols are included.
Some authors use ⊃ and ≡ instead of → and ↔, especially in contexts where → is used for
other purposes. Moreover, the tilde (~) may replace ¬, and & may replace ∧, especially if these
symbols are not available for technical reasons.
Parentheses, brackets, and other punctuation symbols. The choice of such symbols varies
depending on context.
An infinite set of variables, often denoted by lowercase letters at the end of the alphabet x,
y, z, … . Subscripts are often used to distinguish variables: x0, x1, x2, … .
An equality symbol (sometimes, identity symbol) =; see the section on equality below.
Though it should be noted that not all of these symbols are required - only one of the quantifiers,
negation and conjunction, variables, brackets and equality suffice. There are numerous minor
variations that may define additional logical symbols:
Sometimes the truth constants T or ⊤ for "true" and F or ⊥ for "false" are included. Without
any such logical operators of valence 0, these two constants can only be expressed using
quantifiers.
Sometimes additional logical connectives, such as the Sheffer stroke (NAND) and
‘exclusive or’ operators are included.
Non-logical symbols
The non-logical symbols represent predicates (relations), functions and constants on the domain of
discourse. It used to be standard practice to use a fixed, infinite set of non-logical symbols for all
purposes. A more recent practice is to use different non-logical symbols according to the
application one has in mind. Therefore it has become necessary to name the set of all non-logical
symbols used in a particular application. This choice is made via a signature.
The traditional approach is to have only one, infinite, set of non-logical symbols (one signature) for
all applications. Consequently, under the traditional approach there is only one language of
first-order logic. This approach is still common, especially in philosophically oriented books.
1. For every integer n ≥ 0 there is a collection of n-ary, or n-place, predicate symbols. Because
they represent relations between n elements, they are also called relation symbols. For each
arity n we have an infinite supply of them: P^n_0, P^n_1, P^n_2, P^n_3, …
2. For every integer n ≥ 0 there are infinitely many n-ary function symbols: f^n_0, f^n_1, f^n_2, f^n_3, …
In contemporary mathematical logic, the signature varies by application. Typical signatures in
mathematics are {1, ×} or just {×} for groups, or {0, 1, +, ×, <} for ordered fields. There are no
restrictions on the number of non-logical symbols. The signature can be empty, finite, or infinite,
even uncountable. Uncountable signatures occur for example in modern proofs of the Löwenheim-
Skolem theorem.
In this approach, every non-logical symbol is of one of the following types.
1. A predicate symbol (or relation symbol) with some valence (or arity, number of arguments)
greater than or equal to 0. These are often denoted by uppercase letters P, Q, R,... .
o Relations of valence 0 can be identified with propositional variables. For example,
P, which can stand for any statement.
o For example, P(x) is a predicate variable of valence 1. One possible interpretation is
“x is a man”.
o Q(x,y) is a predicate variable of valence 2. Possible interpretations include “x is
greater than y” and “x is the father of y”.
2. A function symbol, with some valence greater than or equal to 0. These are often denoted
by lowercase letters f, g, h,... .
o Examples: f(x) may be interpreted as “the father of x”. In arithmetic, it may stand
for ‘-x’. In set theory, it may stand for “the power set of x”. In arithmetic, g(x,y) may
stand for ‘x+y’. In set theory, it may stand for “the union of x and y”.
o Function symbols of valence 0 are called constant symbols, and are often denoted by
lowercase letters at the beginning of the alphabet a, b, c,... . The symbol a may stand
for Socrates. In arithmetic, it may stand for 0. In set theory, such a constant may
stand for the empty set.
The traditional approach can be recovered in the modern approach by simply specifying the
‘custom’ signature to consist of the traditional sequences of non-logical symbols.
Formation Rules
The formation rules define the terms and formulas of first order logic. When terms and formulas are
represented as strings of symbols, these rules can be used to write a formal grammar for terms and
formulas. These rules are generally context-free (each production has a single symbol on the left
side), except that the set of symbols may be allowed to be infinite and there may be many start
symbols, for example the variables in the case of terms.
Terms
The set of terms is inductively defined by the following rules:
1. Variables. Any variable is a term.
2. Functions. Any expression f(t1,...,tn) of n arguments (where each argument ti is a term and f
is a function symbol of valence n) is a term.
Only expressions which can be obtained by finitely many applications of rules 1 and 2 are terms.
For example, no expression involving a predicate symbol is a term.
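As an aside, these two rules can be checked mechanically. The following Python sketch tests whether an expression is a term; the nested-tuple representation and the tag names ('var', 'const', 'fun') are assumptions chosen purely for illustration, and the same encoding is reused in later sketches in this chapter.

def is_term(e):
    """Rules 1 and 2: variables are terms; f(t1, ..., tn) is a term."""
    if not isinstance(e, tuple):
        return False
    if e[0] in ('var', 'const'):
        return len(e) == 2 and isinstance(e[1], str)
    if e[0] == 'fun':
        return len(e) >= 3 and all(is_term(t) for t in e[2:])
    return False  # anything else, e.g. a predicate application, is not a term

# f(x, c) is a term built from a variable and a constant symbol:
print(is_term(('fun', 'f', ('var', 'x'), ('const', 'c'))))  # True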
Formulas
The set of formulas (also called well-formed formulas or wffs) is inductively defined by the
following rules:
1. Predicate symbols. If P is an n-ary predicate symbol and t1, ..., tn are terms then P(t1,...,tn) is
a formula.
2. Equality. If the equality symbol is considered part of logic, and t1 and t2 are terms, then t1 =
t2 is a formula.
3. Negation. If φ is a formula, then ¬φ is a formula.
4. Binary connectives. If φ and ψ are formulas, then (φ → ψ) is a formula. Similar rules apply
to other binary logical connectives.
5. Quantifiers. If φ is a formula and x is a variable, then ∀x φ and ∃x φ are formulas.
Only expressions which can be obtained by finitely many applications of rules 1–5 are formulas.
The formulas obtained from the first two rules are said to be atomic formulas.
For example,

∀x ∀y (P(f(x)) → ¬(P(x) → Q(f(y), x, z)))

is a formula, if f is a unary function symbol, P a unary predicate symbol, and Q a ternary predicate
symbol. On the other hand, ∀x x → is not a formula, although it is a string of symbols from the
alphabet.
The role of the parentheses in the definition is to ensure that any formula can only be obtained in
one way by following the inductive definition (in other words, there is a unique parse tree for each
formula). This property is known as unique readability of formulas. There are many conventions
for where parentheses are used in formulas. For example, some authors use colons or full stops
instead of parentheses, or change the places in which parentheses are inserted. Each author’s
particular definition must be accompanied by a proof of unique readability.
Notational conventions
For convenience, conventions have been developed about the precedence of the logical operators,
to avoid the need to write parentheses in some cases. These rules are similar to the order of
operations in arithmetic. A common convention is:
¬ is evaluated first;
∧ and ∨ are evaluated next;
quantifiers are evaluated next;
→ is evaluated last.
Moreover, extra punctuation not required by the definition may be inserted to make formulas easier
to read. Thus the formula

¬∀x P(x) → ∃x ¬P(x)

might be written as

(¬[∀x P(x)]) → ∃x [¬P(x)].
In some fields, it is common to use infix notation for binary relations and functions, instead of the
prefix notation defined above. For example, in arithmetic, one typically writes "2 + 2 = 4" instead
of "=(+(2,2),4)". It is common to regard formulas in infix notation as abbreviations for the
corresponding formulas in prefix notation.
The definitions above use infix notation for binary connectives such as →. A less common
convention is Polish notation, in which one writes →, ∧, and so on in front of their arguments
rather than between them. This convention allows all punctuation symbols to be discarded. Polish
notation is compact and elegant, but rarely used in practice because it is hard for humans to read.
In Polish notation, the formula

∀x ∀y (P(f(x)) → ¬(P(x) → Q(f(y), x, z)))

becomes "∀x∀y→Pfx¬→PxQfyxz".
Example
In mathematics the language of ordered abelian groups has one constant symbol 0, one unary
function symbol −, one binary function symbol +, and one binary relation symbol ≤. Then:
The expressions +(x, y) and +(x, +(y, −(z))) are terms. These are usually written as x + y and
x + y − z.
The expressions +(x, y) = 0 and ≤(+(x, +(y, −(z))), +(x, y)) are atomic formulas.
These are usually written as x + y = 0 and x + y − z ≤ x + y.
The expression ∀x ∀y (≤(+(x, y), z) → ∀x ∀y (+(x, y) = 0)) is a formula, which is
usually written as ∀x ∀y (x + y ≤ z → ∀x ∀y (x + y = 0)).
Free and Bound Variables
In a formula, a variable may occur free or bound. Intuitively, a variable is free in a formula if it is
not quantified: in ∀y P(x, y), variable x is free while y is bound. The free and bound variables of
a formula are defined inductively as follows.
1. Atomic formulas. If φ is an atomic formula then x is free in φ if and only if x occurs in φ,
and x is never bound in φ.
2. Negation. x is free in ¬φ if and only if x is free in φ. x is bound in ¬φ if and only if x is
bound in φ.
3. Binary connectives. x is free in (φ → ψ) if and only if x is free in either φ or ψ. x is bound in
(φ → ψ) if and only if x is bound in either φ or ψ. The same rule applies to any other binary
connective in place of →.
4. Quantifiers. x is free in ∀y φ if and only if x is free in φ and x is a different symbol than y.
Also, x is bound in ∀y φ if and only if x is y or x is bound in φ. The same rule holds with ∃
in place of ∀.
For example, in ∀x ∀y (P(x) → Q(x, f(x), z)), x and y are bound variables, z is a free variable, and w
is neither because it does not occur in the formula.
Freeness and boundness can be also specialized to specific occurrences of variables in a formula.
For example, in P(x) → ∀x Q(x), the first occurrence of x is free while the second is bound. In
other words, the x in P(x) is free while the x in ∀x Q(x) is bound.
A formula with no free variables is called a sentence. These are the formulas that will have
well-defined truth values under an interpretation. For example, whether a formula such as Phil(x) is true
must depend on what x represents. But the sentence ∃x Phil(x) will be either true or false in a
given interpretation.
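The inductive rules above translate directly into a recursive function. The Python sketch below computes the free variables of a formula in the tuple encoding assumed earlier; the connective and quantifier tag names are again illustrative assumptions.

def free_vars(f):
    """Free variables of a formula or term, following the rules above."""
    tag = f[0]
    if tag == 'var':
        return {f[1]}
    if tag == 'const':
        return set()
    if tag in ('pred', 'fun'):              # atomic formula or function term
        return set().union(*(free_vars(t) for t in f[2:]))
    if tag == 'not':                        # negation: same free variables
        return free_vars(f[1])
    if tag in ('and', 'or', 'imp', 'iff'):  # binary connective: union of both sides
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ('forall', 'exists'):         # a quantifier binds its variable
        return free_vars(f[2]) - {f[1]}
    raise ValueError(tag)

# forall x forall y (P(x) -> Q(x, f(x), z)) has exactly one free variable, z:
phi = ('forall', 'x', ('forall', 'y',
      ('imp', ('pred', 'P', ('var', 'x')),
              ('pred', 'Q', ('var', 'x'), ('fun', 'f', ('var', 'x')), ('var', 'z')))))
print(free_vars(phi))  # {'z'}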
Semantics
An interpretation of a first-order language assigns a denotation to all non-logical constants in that
language. It also determines a domain of discourse that specifies the range of the quantifiers. The
result is that each term is assigned an object that it represents, and each sentence is assigned a truth
value. In this way, an interpretation provides semantic meaning to the terms and formulas of the
language. The study of the interpretations of formal languages is called formal semantics.
The domain of discourse D is a nonempty set of "objects" of some kind. Intuitively, a first-order
formula is a statement about these objects; for example, ∃x P(x) states the existence of an object x
such that the predicate P is true of it. The domain of discourse is the set of
considered objects. For example, one can take D to be the set of integer numbers.
The interpretation of a function symbol is a function. For example, if the domain of discourse
consists of integers, a function symbol f of arity 2 can be interpreted as the function that gives the
sum of its arguments. In other words, the symbol f is associated with the function I(f) which, in this
interpretation, is addition.
The interpretation of a constant symbol is a function from the one-element set D0 to D, which can
be simply identified with an object in D. For example, an interpretation may assign the value I(c)
= 10 to the constant symbol c.
The interpretation of an n-ary predicate symbol is a set of n-tuples of elements of the domain of
discourse. This means that, given an interpretation, a predicate symbol, and n elements of the
domain of discourse, one can tell whether the predicate is true of those elements according to the
given interpretation. For example, an interpretation I(P) of a binary predicate symbol P may be the
set of pairs of integers such that the first one is less than the second. According to this
interpretation, the predicate P would be true if its first argument is less than the second.
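To make the notion of an interpretation concrete, here is a small Python sketch that evaluates terms and atomic formulas over the domain of the integers, interpreting the function symbol f as addition, the constant symbol c as 10, and the predicate symbol P as "less than". The encoding and this particular interpretation are assumptions of the sketch.

interpretation = {
    'f': lambda x, y: x + y,   # I(f) is addition on the integers
    'c': 10,                   # I(c) = 10 for the constant symbol c
    'P': lambda x, y: x < y,   # I(P) is the "less than" relation
}

def eval_term(t, env):
    """Evaluate a term to an object of the domain; env assigns variables."""
    if t[0] == 'var':
        return env[t[1]]
    if t[0] == 'const':
        return interpretation[t[1]]
    if t[0] == 'fun':
        return interpretation[t[1]](*(eval_term(a, env) for a in t[2:]))
    raise ValueError(t[0])

def eval_atomic(f, env):
    """Evaluate an atomic formula P(t1, ..., tn) to True or False."""
    return interpretation[f[1]](*(eval_term(a, env) for a in f[2:]))

# P(f(x, c), y) under x = 2, y = 20 asks: is 2 + 10 < 20?
print(eval_atomic(('pred', 'P', ('fun', 'f', ('var', 'x'), ('const', 'c')),
                   ('var', 'y')), {'x': 2, 'y': 20}))   # True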
3.5 Deductive Systems
A deductive system is used to demonstrate, on a purely syntactic basis, that one formula is a logical
consequence of another formula. There are many such systems for first-order logic, including
Hilbert-style deductive systems, natural deduction, the sequent calculus, the tableaux method, and
resolution. These share the common property that a deduction is a finite syntactic object; the format
of this object, and the way it is constructed, vary widely. These finite deductions themselves are
often called derivations in proof theory. They are also often called proofs, but are completely
formalized unlike natural-language mathematical proofs.
A deductive system is sound if any formula that can be derived in the system is logically valid.
Conversely, a deductive system is complete if every logically valid formula is derivable. They also
share the property that it is possible to effectively verify that a purportedly valid deduction is
actually a deduction; such deduction systems are called effective.
A key property of deductive systems is that they are purely syntactic, so that derivations can be
verified without considering any interpretation. Thus a sound argument is correct in every possible
interpretation of the language, regardless whether that interpretation is about mathematics,
economics, or some other area.
In general, logical consequence in first-order logic is only semidecidable: if a sentence A logically
implies a sentence B then this can be discovered (for example, by searching for a proof until one is
found, using some effective, sound, complete proof system). However, if A does not logically
imply B, this does not mean that A logically implies the negation of B. There is no effective
procedure that, given formulas A and B, always correctly decides whether A logically implies B.
Rules of Inference
A rule of inference states that, given a particular formula (or set of formulas) with a certain
property as a hypothesis, another specific formula (or set of formulas) can be derived as a
conclusion. The rule is sound (or truth-preserving) if it preserves validity in the sense that whenever
any interpretation satisfies the hypothesis, that interpretation also satisfies the conclusion.
For example, one common rule of inference is the rule of substitution. If t is a term and φ is a
formula possibly containing the variable x, then φ[t/x] (often denoted φ[x/t]) is the result of
replacing all free instances of x by t in φ. The substitution rule states that for any φ and any term t,
one can conclude φ[t/x] from φ provided that no free variable of t becomes bound during the
substitution process. (If some free variable of t becomes bound, then to substitute t for x it is first
necessary to change the bound variables of φ to differ from the free variables of t.)
To see why the restriction on bound variables is necessary, consider the logically valid formula φ
given by ∃x (x = y), in the signature (0, 1, +, ×, =) of arithmetic. If t is the term "x + 1", the
formula φ[t/y] is ∃x (x = x + 1), which will be false in many interpretations. The problem is
that the free variable x of t became bound during the substitution. The intended replacement can be
obtained by renaming the bound variable x of φ to something else, say z, so that the formula after
substitution is ∃z (z = x + 1), which is again logically valid.
The substitution rule demonstrates several common aspects of rules of inference. It is entirely
syntactical; one can tell whether it was correctly applied without appeal to any interpretation. It has
(syntactically-defined) limitations on when it can be applied, which must be respected to preserve
the correctness of derivations. Moreover, as is often the case, these limitations are necessary
because of interactions between free and bound variables that occur during syntactic manipulations
of the formulas involved in the inference rule.
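The capture restriction can be enforced mechanically. The sketch below implements substitution over the tuple encoding assumed earlier and raises an error when a free variable of t would be captured; the renaming repair described above is left out for brevity.

def substitute(f, x, t, t_free):
    """Replace free occurrences of variable x by term t in f.
    t_free is the set of free variables of t; an error is raised if one of
    them would be captured by a quantifier (the restriction in the text)."""
    tag = f[0]
    if tag == 'var':
        return t if f[1] == x else f
    if tag == 'const':
        return f
    if tag in ('pred', 'fun'):
        return (tag, f[1], *(substitute(a, x, t, t_free) for a in f[2:]))
    if tag == 'not':
        return ('not', substitute(f[1], x, t, t_free))
    if tag in ('and', 'or', 'imp', 'iff'):
        return (tag, substitute(f[1], x, t, t_free), substitute(f[2], x, t, t_free))
    if tag in ('forall', 'exists'):
        if f[1] == x:           # x is bound here: nothing to replace below
            return f
        if f[1] in t_free:      # a free variable of t would be captured
            raise ValueError("capture of %r: rename the bound variable first" % f[1])
        return (tag, f[1], substitute(f[2], x, t, t_free))
    raise ValueError(tag)

# Substituting t = x + 1 for y in  exists x (x = y)  raises: x would be captured.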
Hilbert-Style Systems and Natural Deduction
A deduction in a Hilbert-style deductive system is a list of formulas, each of which is a logical
axiom, a hypothesis that has been assumed for the derivation at hand, or follows from previous
formulas via a rule of inference. The logical axioms consist of several axiom schemes of logically
valid formulas; these encompass a significant amount of propositional logic. The rules of inference
enable the manipulation of quantifiers. Typical Hilbert-style systems have a small number of rules
of inference, along with several infinite schemes of logical axioms. It is common to have only
modus ponens and universal generalization as rules of inference.
Natural deduction systems resemble Hilbert-style systems in that a deduction is a finite list of
formulas. However, natural deduction systems have no logical axioms; they compensate by adding
additional rules of inference that can be used to manipulate the logical connectives in formulas in
the proof.
Sequent Calculus
The sequent calculus was developed to study the properties of natural deduction systems. Instead of
working with one formula at a time, it uses sequents, which are expressions of the form

A1, …, An ⊢ B1, …, Bk

where A1, ..., An, B1, ..., Bk are formulas and the turnstile symbol ⊢ is used as punctuation to
separate the two halves. Intuitively, a sequent expresses the idea that A1 ∧ … ∧ An implies
B1 ∨ … ∨ Bk.
Tableaux Method
A tableau proof for the propositional formula ((a ∨ ¬b) ∧ b) → a.
Unlike the methods just described, the derivations in the tableaux method are not lists of formulas.
Instead, a derivation is a tree of formulas. To show that a formula A is provable, the tableaux
method attempts to demonstrate that the negation of A is unsatisfiable. The tree of the derivation
has ¬A at its root; the tree branches in a way that reflects the structure of the formula. For example,
to show that C ∨ D is unsatisfiable requires showing that C and D are each unsatisfiable; this
corresponds to a branching point in the tree with parent C ∨ D and children C and D.
Provable Identities
The following sentences can be called "identities" because the main connective in each is the
biconditional.
P ∧ ∃x Q(x) ↔ ∃x (P ∧ Q(x)) (where x must not occur free in P)
P ∨ ∀x Q(x) ↔ ∀x (P ∨ Q(x)) (where x must not occur free in P)
Restricted Languages
First-order logic can be studied in languages with fewer logical symbols than were described above.
Because ∃x φ can be expressed as ¬∀x ¬φ, and ∀x φ can be expressed as
¬∃x ¬φ, either of the two quantifiers ∃ and ∀ can be dropped.
Since φ ∨ ψ can be expressed as ¬(¬φ ∧ ¬ψ) and φ ∧ ψ can be expressed as
¬(¬φ ∨ ¬ψ), either ∨ or ∧ can be dropped. In other words, it is sufficient to have ¬ and
∨, or ¬ and ∧, as the only logical connectives.
Similarly, it is sufficient to have only ¬ and → as logical connectives, or to have only the
Sheffer stroke (NAND) or the NOR operator.
It is possible to entirely avoid function symbols and constant symbols, rewriting them via
predicate symbols in an appropriate way. For example, instead of using a constant symbol 0
one may use a predicate 0(x) (interpreted as x = 0), and replace every predicate such as
P(0, y) with ∀x (0(x) → P(x, y)). A function such as f(x1,x2,...,xn) will similarly
be replaced by a predicate F(x1,x2,...,xn,y) interpreted as y = f(x1,x2,...,xn). This change
requires adding additional axioms to the theory at hand, so that interpretations of the
predicate symbols used have the correct semantics.
Restrictions such as these are useful as a technique to reduce the number of inference rules or
axiom schemes in deductive systems, which leads to shorter proofs of metalogical results. The cost
of the restrictions is that it becomes more difficult to express natural-language statements in the
formal system at hand, because the logical connectives used in the natural language statements
must be replaced by their (longer) definitions in terms of the restricted collection of logical
connectives. Similarly, derivations in the limited systems may be longer than derivations in systems
that include additional connectives. There is thus a trade-off between the ease of working within the
formal system and the ease of proving results about the formal system.
It is also possible to restrict the arities of function symbols and predicate symbols, in sufficiently
expressive theories. One can in principle dispense entirely with functions of arity greater than 2 and
predicates of arity greater than 1 in theories that include a pairing function. This is a function of
arity 2 that takes pairs of elements of the domain and returns an ordered pair containing them. It is
also sufficient to have two predicate symbols of arity 2 that define projection functions from an
ordered pair to its components. In either case it is necessary that the natural axioms for a pairing
function and its projections are satisfied.
Additional Quantifiers
Additional quantifiers can be added to first-order logic.
Sometimes it is useful to say that "P(x) holds for exactly one x", which can be expressed as
∃!x P(x). This notation, called uniqueness quantification, may be taken to abbreviate a
formula such as ∃x (P(x) ∧ ∀y (P(y) → (x = y))).
First-order logic with extra quantifiers has new quantifiers Qx,..., with meanings such as
"there are many x such that ...". Also see branching quantifiers and the plural quantifiers of
George Boolos and others.
Bounded quantifiers are often used in the study of set theory or arithmetic.
Non-classical and modal logics
Intuitionistic first-order logic uses intuitionistic rather than classical propositional calculus;
for example, ¬¬φ need not be equivalent to φ. Similarly, first-order fuzzy logics are first-order
extensions of propositional fuzzy logics rather than classical logic.
Infinitary logic allows infinitely long sentences. For example, one may allow a conjunction
or disjunction of infinitely many formulas, or quantification over infinitely many variables.
Infinitely long sentences arise in areas of mathematics including topology and model theory.
First-order modal logic has extra modal operators with meanings which can be
characterised informally as, for example, "it is necessary that φ" and "it is possible that φ".
3.6 Backward Chaining Method of Conclusion Deduction:
Suppose we were developing an application concerning genetically transmitted traits. Our
application might need several rules that guided its reasoning. One such rule might be, "if a person
has a trait and a cousin of that person has the same trait, then consider the possibility that the trait is
inherited." Such a rule might be coded as follows:
(defrule cousins-may-inherit-trait
(has ?grandchild-1 ?trait)
(parent ?grandchild-1 ?parent-1)
(parent ?parent-1 ?grandparent)
(parent ?parent-2 ?grandparent)
(parent ?grandchild-2 ?parent-2)
(has ?grandchild-2 ?trait)
=>
(assert (inherited (status possible) (trait
?trait)))
)
This is a fine rule when viewed in isolation. However, there are probably lots of rules in this
application that examine conditions across siblings. All of these rules will share conditions similar
to:
(parent ?parent-1 ?grandparent)
(parent ?parent-2 ?grandparent)
This amounts to a low-level encoding of the notion of a sibling. The following conditions amount
to a low-level encoding of the notion of a cousin:
(parent ?grandchild-1 ?parent-1)
(parent ?parent-1 ?grandparent)
(parent ?parent-2 ?grandparent)
(parent ?grandchild-2 ?parent-2)
In an application of hundreds of rules that consider blood relationships in many different ways and
combinations, having notions of "sibling" and "cousin" available as simple relationships rather than
as more complex pattern matching operations not only makes the rules more perspicuous, it makes
them more reliable and easier to maintain. As an example, the above rule could be recoded as:
(defrule cousins-may-inherit-trait
(has ?x ?trait)
(cousin ?x ?y)
(has ?y ?trait)
=>
(assert (inherited (status possible) (trait
?trait)))
)

(defrule cousin
(parent ?x ?parent-1)
(sibling ?parent-1 ?parent-2)
(parent ?y ?parent-2)
=>
(assert (cousin ?x ?y))
)
(defrule sibling
(parent ?x ?parent)
(parent ?y&~?x ?parent)
=>
(assert (sibling ?x ?y))
)
Deduction using Forward Chaining
The cousin and sibling rules above make the high-level semantics of cousin and sibling explicit
while in the original rule they were implicit. With these relations made explicit, coding of all rules
that consider these relations can use a single pattern rather than its corresponding, implicit,
constituent patterns.
Reducing the number of patterns per rule clearly improves the reliability of those rules. Also,
maintaining rules that use the more abstract patterns is simplified since only the rules that maintain
the relevant relation need to be modified. Furthermore, if a relation can be deduced by any of
several methods (i.e., disjunction occurs) then the number of rules is reduced with resulting
improvements in performance and reliability. Finally, using relations rather than unshared joins
over several patterns can dramatically improve performance and reduce space requirements.
The problem with the above sibling and cousin rules is that they will assert every cousin and sibling
relationship that exists given a set of parent relationships. The number of these deduced
relationships can become very large, especially for the cousin relationship. This is a fundamental
problem. For most domains, there are at least an infinite number of irrelevant truths that can be
deduced. The challenge in building a rational problem solving system is to actually deduce truths
that are (or have a good chance of being) relevant to the problem at hand.
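As an illustration of this data-driven style, the following Python sketch forward-chains sibling and cousin facts from parent facts until a fixed point is reached, mirroring the CLIPS rules above. The fact encoding and the rule functions are assumptions of the sketch, not CLIPS semantics.

# A minimal forward chainer: facts are tuples like ('parent', child, parent).
facts = {('parent', 'ann', 'carl'), ('parent', 'bob', 'carl'),
         ('parent', 'dave', 'ann'), ('parent', 'eve', 'bob')}

def siblings(fs):
    """Derive ('sibling', x, y) facts: x and y differ and share a parent."""
    return {('sibling', x, y)
            for (t1, x, p1) in fs if t1 == 'parent'
            for (t2, y, p2) in fs if t2 == 'parent'
            if p1 == p2 and x != y}

def cousins(fs):
    """Derive ('cousin', x, y) facts: the parents of x and y are siblings."""
    return {('cousin', x, y)
            for (t1, x, p1) in fs if t1 == 'parent'
            for (t2, s1, s2) in fs if t2 == 'sibling'
            for (t3, y, p2) in fs if t3 == 'parent'
            if p1 == s1 and p2 == s2}

# Fire the rules repeatedly until no new facts are produced (a fixed point).
while True:
    new = (siblings(facts) | cousins(facts)) - facts
    if not new:
        break
    facts |= new

print(sorted(f for f in facts if f[0] == 'cousin'))
# [('cousin', 'dave', 'eve'), ('cousin', 'eve', 'dave')]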
Deduction using Backward Chaining
Focusing deduction such that it furthers problem solving, rather than merely deducing irrelevant
truths, is often done by generating subgoals during problem solving. Goals are generated as the
conditions of rules are checked. These goals then trigger the checking of rules that might deduce
facts that would further the matching of the rule which generated the goal.
To be more concrete, the cousin and sibling rules from above could be recoded as:
(defrule cousin
(goal (cousin ?x ?y))
(parent ?x ?p1)
(parent ?y ?p2)
(sibling ?p1 ?p2)
=>
(assert (cousin ?x ?y))
)
(defrule sibling
(goal (sibling ?x ?y))
(parent ?x ?parent)
(parent ?y ?parent)
=>
(assert (sibling ?x ?y))
)
In these rules, one of the goal conditions is triggered when a goal to establish a cousin or sibling
relationship is generated.
The actions of these rules assert facts which satisfy the goals, thereby deducing only facts which
might further the matching
of the rules which led to the goals’ generation. We call the above rules data-driven backward
chaining rules. Of course, for these rules to be driven some goal data is required. Either other rules
or the inference engine architecture itself must assert these goals. In either case, goals must be
generated as if by the following rules:
(defrule
cousins-may-inherit-trait-goal-generation-1
(has ?x ?trait)
=>
(assert (goal (cousin ?x ?y)))
)

(defrule cousin-goal-generation-1
(goal (cousin ?x ?y))
(parent ?x ?p1)
(parent ?y&~?x ?p2)
=>
(assert (goal (sibling ?p1 ?p2)))
)
Manual Goal Generation
The above goal generation rules, if they could be implemented, would correctly generate the goals
required to implement the explicit cousin and sibling relations using the previously mentioned
data-driven backward chaining rules. However, beyond the need for an adequate representation for
goals, the manual coding of goal generation rules would remain problematic.
In general, for a rule of N conditions, N + 1 rules will be needed to implement that rule such that
it can support sound and complete reasoning. The original rule, which matches in the standard,
data-driven, forward chaining manner, is, of course, required. An additional rule per condition is
needed to assert the goals that will allow backward chained inference to deduce facts that will
further the matching of the original rule.
Clearly, multiplying the number of rules required by one plus the average number of goal
generating conditions per rule is unacceptable. Even if the effort is made, it is extremely error
prone. Even automating the maintenance of goal generation rules would increase space and time
requirements significantly, just to encode the actions and names of the rules and to activate the
rules and interpret their actions.
Representing Goals
Even though manual coding of goal generation is impractical, CLIPS, OPS5, and many other
production system languages are unable to implement the above rules for several even more
fundamental reasons. The most obvious reason is that they provide no capability for distinguishing
facts from goals. Moreover, these systems provide no means of representing unspecified values (for
unbound variables) that occur in the conditions for which they might otherwise assert goals. For
example, the following goal generation rule, in which the variable ?y is unbound, cannot even be
simulated without explicit support for goals which include universally quantified values:
(defrule
cousins-may-inherit-trait-goal-generation-1
(has ?x ?trait)
=>
(assert (goal (cousin ?x ?y)))
)
Even supporting universally quantified values within goals is not enough to support backward
chaining, however. If the variable ?y in the first condition of the following rule matches a literal
value, CLIPS or OPS5 extended to support goal generation could function properly. If, however,
?y matches a universally quantified value, then neither CLIPS nor OPS5 could join that unbound
variable with any parent fact corresponding to the third condition, as would be logically required.
(defrule cousin-goal-generation-1
(goal (cousin ?x ?y))
(parent ?x ?p1)
(parent ?y&~?x ?p2)
=>
(assert (goal (sibling ?p1 ?p2)))
)
Clearly these systems are unable not only to generate goals in the first place but also to join those
goals with facts.
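The goal-directed alternative can be illustrated with a Python sketch that proves a specific cousin goal by recursively generating the sibling subgoal, rather than deriving all relationships up front. This is a simplified illustration of backward chaining, not a model of how CLIPS or OPS5 would implement goal support.

# Base facts: (child, parent) pairs. The encoding is an assumption of the sketch.
parents = {('dave', 'ann'), ('eve', 'bob'), ('ann', 'carl'), ('bob', 'carl')}

def prove_sibling(x, y):
    """Subgoal: x and y are distinct and share a parent."""
    return x != y and any(p1 == p2
                          for (c1, p1) in parents if c1 == x
                          for (c2, p2) in parents if c2 == y)

def prove_cousin(x, y):
    """Goal: some parents of x and y satisfy the sibling subgoal."""
    return any(prove_sibling(p1, p2)
               for (c1, p1) in parents if c1 == x
               for (c2, p2) in parents if c2 == y)

print(prove_cousin('dave', 'eve'))   # True: ann and bob are siblings
print(prove_cousin('dave', 'ann'))   # False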
3.7 Modus Ponens
In classical logic, modus ponendo ponens (Latin for mode that affirms by affirming; often
abbreviated to MP or modus ponens) is a valid, simple argument form sometimes referred to as
affirming the antecedent or the law of detachment. It is closely related to another valid form of
argument, modus tollens or “denying the consequent”.
Modus ponens is a very common rule of inference, and takes the following form:
If P, then Q.
P.
Therefore, Q.
Formal Notation
The modus ponens rule may be written in sequent notation:

P → Q, P ⊢ Q

or in rule form:

P → Q,  P
―――――――――
Q
The argument form has two premises. The first premise is the IF - THEN or conditional claim,
namely that P implies Q. The second premise is that P, the antecedent of the conditional claim, is
true. From these two premises it can be logically concluded that Q, the consequent of the
conditional claim, must be true as well. In Artificial Intelligence, modus ponens is often called
forward chaining.
An example of an argument that fits the form modus ponens:
If today is Tuesday, then I will go to work.
Today is Tuesday.
Therefore, I will go to work.
This argument is valid, but validity has no bearing on whether any of the statements in the argument
are actually true; for modus ponens to be a sound argument, the premises must in fact be true. An
argument can be valid but nonetheless unsound if one or more premises are false; if an argument is
valid and all the premises are true, then the argument is sound. For example, I might be going to
work on Wednesday. In this case, the reasoning for my going to work (because it is Tuesday) is
unsound. The argument is only sound on Tuesdays (when I go to work), but valid on every day of
the week. A propositional argument using modus ponens is said to be deductive.
In single-conclusion sequent calculi, modus ponens is the Cut rule. The cut-elimination theorem for
a calculus says that every proof involving Cut can be transformed (generally, by a constructive
method) into a proof without Cut, and hence that Cut is admissible.
The Curry-Howard correspondence between proofs and programs relates modus ponens to function
application: if f is a function of type P → Q and x is of type P, then f x is of type Q.
Justification via Truth Table
The validity of modus ponens in classical two-valued logic can be clearly demonstrated by use of a
truth table.
p | q | p → q
T | T |   T
T | F |   F
F | T |   T
F | F |   T
In instances of modus ponens we assume as premises that p → q is true and p is true. Only one line
of the truth table - the first - satisfies these two conditions. On this line, q is also true. Therefore,
whenever p → q is true and p is true, q must also be true.
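The truth-table justification can also be checked mechanically. The short Python sketch below enumerates all valuations and confirms that q holds whenever both premises hold:

from itertools import product

# Verify modus ponens by brute force over all truth valuations.
for p, q in product([True, False], repeat=2):
    implies = (not p) or q          # truth table of p -> q
    if implies and p:               # both premises hold...
        assert q                    # ...so the conclusion must hold
print("modus ponens is valid")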
3.8 Unification
A unification of two terms is a join (in the lattice sense) with respect to a specialisation order. That
is, we suppose a preorder ≤ on a set of terms, for which t* ≤ t means that t* is obtained from t by
substituting some term(s) for one or more free variables in t. The unification u of s and t, if it exists,
is a term that is a substitution instance of both s and t. If any common substitution instance of s and
t is also an instance of u, u is called a minimal (most general) unification.
For example, with polynomials, X^2 and Y^3 can be unified to Z^6 by taking X = Z^3 and Y = Z^2.
Definition of Unification for First-Order Logic
Let p and q be sentences in first-order logic.
UNIFY(p,q) = U where subst(U,p) = subst(U,q)
Where subst(U,p) means the result of applying substitution U on the sentence p. Then U is called a
unifier for p and q. The unification of p and q is the result of applying U to both of them.
Let L be a set of sentences, for example, L = {p,q}. A unifier U is called a most general unifier for
L if, for all unifiers U' of L, there exists a substitution s such that applying s to the result of
applying U to L gives the same result as applying U' to L:
subst(U',L) = subst(s,subst(U,L)).
Unification in Logic Programming and Type Theory
The concept of unification is one of the main ideas behind logic programming, best known through
the language Prolog. It represents the mechanism of binding the contents of variables and can be
viewed as a kind of one-time assignment. In Prolog, this operation is denoted by the equality
symbol =, but is also done when instantiating variables (see below). It is also used in other
languages by the use of the equality symbol =, but also in conjunction with many operations
including +, -, *, /. Type inference algorithms are typically based on unification.
In Prolog:
1. A variable which is uninstantiated—i.e. no previous unifications were performed on it—can
be unified with an atom, a term, or another uninstantiated variable, thus effectively
becoming its alias. In many modern Prolog dialects and in first-order logic, a variable
cannot be unified with a term that contains it; this is the so called “occurs check”.
2. Two atoms can only be unified if they are identical.
3. Similarly, a term can be unified with another term if the top function symbols and arities of
the terms are identical and if the parameters can be unified simultaneously. Note that this is
a recursive behavior.
In type theory, the analogous statements are:
1. Any type variable unifies with any type expression, and is instantiated to that expression. A
specific theory might restrict this rule with an occurs check.
2. Two type constants unify only if they are the same type.
3. Two type constructions unify only if they are applications of the same type constructor and
all of their component types recursively unify.
Due to its declarative nature, the order in a sequence of unifications is (usually) unimportant.
French computer scientist Gérard Huet gave an algorithm for unification in simply typed lambda
calculus in 1973. There have been many developments in unification theory since then.
Higher-Order Unification
One of the most influential theories of ellipsis is that ellipses are represented by free variables
whose values are then determined using Higher-Order Unification (HOU). For instance, the
semantic representation of "Jon likes Mary and Peter does too" is like(j, m) ∧ R(p), and the value of R
(the semantic representation of the ellipsis) is determined by the equation like(j, m) = R(j). The
process of solving such equations is called Higher-Order Unification.
Examples of Unification
In the convention of Prolog, atoms begin with lowercase letters.
A, A : Succeeds (tautology)
A, B, abc : Both A and B are unified with the atom abc
abc, B, A : As above (unification is symmetric)
abc, abc : Unification succeeds
abc, xyz : Fails to unify because the atoms are different
f(A), f(B) : A is unified with B
f(A), g(B) : Fails because the heads of the terms are different
f(A), f(B, C) : Fails to unify because the terms have different arity
f(g(A)), f(B) : Unifies B with the term g(A)
f(g(A), A), f(B, xyz) : Unifies A with the atom xyz and B with the term g(xyz)
A, f(A) : Infinite unification, A is unified with f(f(f(f(...)))). In proper first-order logic and
many modern Prolog dialects this is forbidden (and enforced by the “occurs check”)
A, abc, xyz, X : Fails to unify; it would effectively require abc = xyz
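A standard first-order unification algorithm with occurs check can be sketched in Python. Following the Prolog convention above, variables are strings starting with an uppercase letter, atoms are lowercase strings, and compound terms are tuples whose first element is the functor; this encoding is an assumption of the sketch.

def is_var(t):
    """Prolog convention: variables are strings starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings to the current value of t."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t under subst?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst=None):
    """Return a most general unifier as a dict, or None on failure."""
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if is_var(s):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if is_var(t):
        return None if occurs(t, s, subst) else {**subst, t: s}
    if isinstance(s, tuple) and isinstance(t, tuple) and \
            s[0] == t[0] and len(s) == len(t):     # same functor and arity
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                                    # e.g. different atoms

# f(g(A), A) with f(B, xyz): A = xyz and B = g(A), which walk resolves to g(xyz):
print(unify(('f', ('g', 'A'), 'A'), ('f', 'B', 'xyz')))
# A with f(A) fails the occurs check:
print(unify('A', ('f', 'A')))   # None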
3.9 Skolem Normal Form
Skolemization is a method for removing existential quantifiers from formal logic statements, often
performed as the first step in an automated theorem prover. A formula of first-order logic is in
Skolem normal form (named after Thoralf Skolem) if it is in conjunctive prenex normal form with
only universal first-order quantifiers. Every first-order formula can be converted into Skolem
normal form while not changing its satisfiability via a process called Skolemization. The resulting
formula is not necessarily equivalent to the original one, but is equisatisfiable with it: it is
satisfiable if and only if the original one is.
The simplest form of Skolemization is for existentially quantified variables which are not inside the
scope of a universal quantifier. These can simply be replaced by creating new constants. For
example, ∃x P(x) can be changed to P(c), where c is a new constant.
More generally, Skolemization is performed by replacing every existentially quantified variable y
with a term f(x1, …, xn) whose function symbol f is new (does not occur anywhere else in the
formula). The variables of this term are as follows. If the formula is in prenex normal form,
x1, …, xn are the variables that are universally quantified and whose quantifiers precede that of
y. In general, they are the variables that are universally quantified and such that ∃y occurs in the
scope of their quantifiers. The function f introduced in this process is called a Skolem function (or
Skolem constant if it is of zero arity) and the term f(x1, …, xn) is called a Skolem term.
As an example, the formula ∀x ∃y ∀z P(x, y, z) is not in Skolem normal form because it
contains the existential quantifier ∃y. Skolemization replaces y with f(x), where f is a new
function symbol, and removes the quantification over y. The resulting formula is
∀x ∀z P(x, f(x), z). The Skolem term f(x) contains x but not z because the quantifier to be
removed, ∃y, is in the scope of ∀x but not in that of ∀z; since this formula is in prenex normal form,
this is equivalent to saying that, in the list of quantifiers, x precedes y while z does not. The formula
obtained by this transformation is satisfiable if and only if the original formula is.
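For a formula already in prenex normal form, Skolemization is a single left-to-right pass over the quantifier prefix. The Python sketch below (the list-of-pairs encoding of the prefix is an assumption) replaces each existential variable with a Skolem term over the universal variables seen so far:

import itertools

def skolemize_prenex(prefix):
    """Skolemize the quantifier prefix of a prenex formula.
    prefix is a list of ('forall' | 'exists', variable) pairs. Returns the
    remaining universal variables and a substitution mapping each existential
    variable to its Skolem term (functor name followed by arguments)."""
    fresh = ("sk%d" % i for i in itertools.count())   # fresh Skolem symbols
    universals, subst = [], {}
    for quant, var in prefix:
        if quant == 'forall':
            universals.append(var)
        else:   # existential: Skolem term over the universals seen so far
            subst[var] = (next(fresh), *universals)
    return universals, subst

# forall x exists y forall z P(x, y, z)  becomes  forall x forall z P(x, sk0(x), z):
print(skolemize_prenex([('forall', 'x'), ('exists', 'y'), ('forall', 'z')]))
# (['x', 'z'], {'y': ('sk0', 'x')})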
How Skolemization works
Skolemization works by applying a second-order equivalence in conjunction with the definition of
first-order satisfiability. The equivalence provides a way for "moving" an existential quantifier
before a universal one:

∀x ∃y R(x, y) ⟺ ∃f ∀x R(x, f(x))

where f(x) is a function that maps x to y.
Intuitively, the sentence "for every x there exists a y such that R(x, y)" is converted into the
equivalent form "there exists a function f mapping every x into a y such that, for every x, it holds
that R(x, f(x))".
This equivalence is useful because the definition of first-order satisfiability implicitly existentially
quantifies over the evaluation of function symbols. In particular, a first-order formula Φ is
satisfiable if there exists a model M and an evaluation μ of the free variables of the formula that
evaluate the formula to true. The model contains the evaluation of all function symbols; therefore,
Skolem functions are implicitly existentially quantified. In the example above, ∀x R(x, f(x)) is
satisfiable if and only if there exists a model M, which contains an evaluation for f, such that
∀x R(x, f(x)) is true for some evaluation of its free variables (none in this case). This can be
expressed in second order as ∃f ∀x R(x, f(x)). By the above equivalence, this is the same as
the satisfiability of ∀x ∃y R(x, y).
At the meta-level, first-order satisfiability of a formula Φ can be written with a little abuse of
notation as ∃M ∃μ (M, μ ⊨ Φ), where M is a model and μ is an evaluation of the free
variables. Since first-order models contain the evaluation of all function symbols, any Skolem
function Φ contains is implicitly existentially quantified by ∃M. As a result, after replacing an
existential quantifier over variables with existential quantifiers over functions at the front of the
formula, the formula can still be treated as a first-order one by removing these existential
quantifiers. This final step of treating ∃f ∀x R(x, f(x)) as ∀x R(x, f(x)) can be done
because functions are implicitly existentially quantified by ∃M in the definition of first-order
satisfiability.
Correctness of Skolemization can be shown on the example formula
F1 = ∀x1 … ∀xn ∃y R(x1, …, xn, y) as follows. This formula is satisfied by a model M if
and only if, for each possible value for x1, …, xn in the domain of the model there exists a value
for y in the domain of the model that makes R(x1, …, xn, y) true. By the axiom of choice, there
exists a function f such that y = f(x1, …, xn). As a result, the formula
F2 = ∀x1 … ∀xn R(x1, …, xn, f(x1, …, xn)) is satisfiable, because it has the model
obtained by adding the evaluation of f to M. This shows that F1 is satisfiable only if F2 is
satisfiable as well. In the other direction, if F2 is satisfiable, then there exists a model M' that
satisfies it; this model includes an evaluation for the function f such that, for every value of
x1, …, xn, the formula R(x1, …, xn, f(x1, …, xn)) holds. As a result, F1 is satisfied by
the same model because one can choose, for every value of x1, …, xn, the value
y = f(x1, …, xn), where f is evaluated according to M'.
Uses of Skolemization
One of the uses of Skolemization is automated theorem proving. For example, in the method of
analytic tableaux, whenever a formula whose leading quantifier is existential occurs, the formula
obtained by removing that quantifier via Skolemization can be generated. For example, if
∃y δ(y, x1, …, xn) occurs in a tableau, where x1, …, xn are the free variables of
δ, then δ(f(x1, …, xn), x1, …, xn), for a new function symbol f, can be added to the same branch of the
tableau. This addition does not alter the satisfiability of the tableau: every model of the old formula
can be extended, by adding a suitable evaluation of f, to a model of the new formula.
This form of Skolemization is actually an improvement over "classical" Skolemization in that only
variables that are free in the formula are placed in the Skolem term. This is an improvement
because the semantics of tableau may implicitly place the formula in the scope of some universally
quantified variables that are not in the formula itself; these variables are not in the Skolem term,
while they would be there according to the original definition of Skolemization. Another
improvement that can be used is using the same Skolem function symbol for formulae that are
identical up to variable renaming.
3.10 Clausal Normal Form
The clausal normal form (or clause normal form, conjunctive normal form, CNF) of a logical
formula is used in logic programming and many theorem proving systems. A formula in clause
normal form is a set of clauses, interpreted as a conjunction. A clause is an implicitly universally
quantified set of literals, interpreted as a disjunction. If variant clauses are identified, the clause
normal form (in set representation) of a formula is unique.
Conversion to clausal normal form
The procedure to convert a formula into clausal form can destroy the structure of the formula, and
naive translations often cause an exponential blowup in the size of the resulting formula.
The procedure begins with any formula of classical first-order logic:
1. Put the formula into negation normal form.
2. Standardize variables apart, so that each quantifier binds a distinct variable:
∀x P(x) ∨ ∃x Q(x) becomes ∀x P(x) ∨ ∃c Q(c), where c is new.
3. Skolemize -- replace existential variables with Skolem constants or Skolem functions of
universal variables, from the outside inward. Make the following replacements:
∀x1 … ∀xn ∃y P(y) becomes ∀x1 … ∀xn P(fc(x1, …, xn)), where fc is new.
4. Remove the universal quantifiers.
5. Put the formula into conjunctive normal form.
6. Replace each conjunct ¬P1 ∨ … ∨ ¬Pm ∨ Q1 ∨ … ∨ Qn with the equivalent implication
(P1 ∧ … ∧ Pm) → (Q1 ∨ … ∨ Qn).
o If m=0 and n=1, this is a Prolog fact.
o If m>0 and n=1, this is a Prolog rule.
o If m>0 and n=0, this is a Prolog query.
7. Finally, write each conjunct in clause notation: Q1, …, Qn ← P1, …, Pm.
When n = 1, the logic is called Horn clause logic and is equivalent in computational power to a
universal Turing machine.
Often it is sufficient to generate an equisatisfiable (not an equivalent) CNF for a formula. In this
case, the worst-case exponential blow-up can be avoided by introducing definitions and using them
to rename parts of the formula.
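The propositional core of this procedure, steps 1 and 5, can be sketched in Python: push negations inward to negation normal form, then distribute disjunction over conjunction. Atoms are short lowercase strings and the tuple encoding is an assumption of the sketch.

def nnf(f):
    """Push negations inward (negation normal form)."""
    tag = f[0]
    if tag == 'not':
        g = f[1]
        if g[0] == 'not':                  # double negation
            return nnf(g[1])
        if g[0] == 'and':                  # De Morgan
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'imp':                  # not (p -> q) = p and not q
            return ('and', nnf(g[1]), nnf(('not', g[2])))
        return f                           # negated atom
    if tag == 'imp':                       # p -> q = not p or q
        return nnf(('or', ('not', f[1]), f[2]))
    if tag in ('and', 'or'):
        return (tag, nnf(f[1]), nnf(f[2]))
    return f                               # atom

def cnf(f):
    """Distribute 'or' over 'and'; input must already be in NNF."""
    tag = f[0]
    if tag == 'and':
        return ('and', cnf(f[1]), cnf(f[2]))
    if tag == 'or':
        a, b = cnf(f[1]), cnf(f[2])
        if a[0] == 'and':                  # (x and y) or b = (x or b) and (y or b)
            return ('and', cnf(('or', a[1], b)), cnf(('or', a[2], b)))
        if b[0] == 'and':
            return ('and', cnf(('or', a, b[1])), cnf(('or', a, b[2])))
        return ('or', a, b)
    return f

# (p -> q) -> r  converts to  (p or r) and (not q or r):
print(cnf(nnf(('imp', ('imp', 'p', 'q'), 'r'))))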
3.11 Resolution
Resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in
propositional logic and first-order logic. In other words, iteratively applying the resolution rule in a
suitable way allows for telling whether a propositional formula is satisfiable and for proving that a
first-order formula is unsatisfiable; this method may prove the satisfiability of a first-order
satisfiable formula, but not always, as it is the case for all methods for first-order logic. Resolution
was introduced by John Alan Robinson in 1965.
Resolution rule
The resolution rule in propositional logic is a single valid inference rule that produces a new clause
implied by two clauses containing complementary literals. A literal is a propositional variable or
the negation of a propositional variable. Two literals are said to be complements if one is the
negation of the other (in the following, ai is taken to be the complement to bj). The resulting clause
contains all the literals that do not have complements. Formally:

a1 ∨ … ∨ ai ∨ … ∨ an,   b1 ∨ … ∨ bj ∨ … ∨ bm
――――――――――――――――――――――――――――――――――
a1 ∨ … ∨ ai−1 ∨ ai+1 ∨ … ∨ an ∨ b1 ∨ … ∨ bj−1 ∨ bj+1 ∨ … ∨ bm

where
all as and bs are literals,
ai is the complement to bj, and
the dividing line stands for entails
The clause produced by the resolution rule is called the resolvent of the two input clauses.
When the two clauses contain more than one pair of complementary literals, the resolution rule can
be applied (independently) for each such pair. However, only the pair of literals that are resolved
upon can be removed: all other pairs of literals remain in the resolvent clause.
The resolution rule is similar in spirit to the cut rule of sequent calculus.
A Resolution Technique
When coupled with a complete search algorithm, the resolution rule yields a sound and complete
algorithm for deciding the satisfiability of a propositional formula, and, by extension, the validity of
a sentence under a set of axioms.
This resolution technique uses proof by contradiction and is based on the fact that any sentence in
propositional logic can be transformed into an equivalent sentence in conjunctive normal form. The
steps are as follows:
All sentences in the knowledge base and the negation of the sentence to be proved (the
conjecture) are conjunctively connected.
The resulting sentence is transformed into a conjunctive normal form with the conjuncts
viewed as elements in a set, S, of clauses.
For example, (A1 ∨ A2) ∧ (B1 ∨ B2 ∨ B3) ∧ (C1) would give rise to a set
{A1 ∨ A2, B1 ∨ B2 ∨ B3, C1}.
The resolution rule is applied to all possible pairs of clauses that contain complementary
literals. After each application of the resolution rule, the resulting sentence is simplified by
removing repeated literals. If the sentence contains complementary literals, it is discarded
(as a tautology). If not, and if it is not yet present in the clause set S, it is added to S, and is
considered for further resolution inferences.
If after applying a resolution rule the empty clause is derived, the complete formula is
unsatisfiable (or contradictory), and hence it can be concluded that the initial conjecture
follows from the axioms.
If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot be
applied to derive any more new clauses, the conjecture is not a theorem of the original
knowledge base.
One instance of this algorithm is the original Davis–Putnam algorithm that was later refined into
the DPLL algorithm that removed the need for explicit representation of the resolvents.
This description of the resolution technique uses a set S as the underlying data-structure to represent
resolution derivations. Lists, Trees and Directed Acyclic Graphs are other possible and common
alternatives. Tree representations are more faithful to the fact that the resolution rule is binary.
Together with a sequent notation for clauses, a tree representation also makes it clear to see how the
resolution rule is related to a special case of the cut-rule, restricted to atomic cut-formulas.
However, tree representations are not as compact as set or list representations, because they
explicitly show redundant subderivations of clauses that are used more than once in the derivation
of the empty clause. Graph representations can be as compact in the number of clauses as list
representations and they also store structural information regarding which clauses were resolved to
derive each resolvent.
A simple example

a ∨ b,   ¬a ∨ c
―――――――――――
b ∨ c

In English: if a or b is true, and a is false or c is true, then either b or c is true.
If a is true, then for the second premise to hold, c must be true. If a is false, then for the first
premise to hold, b must be true.
So regardless of a, if both premises hold, then b or c is true.
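The refutation procedure just described can be sketched in Python for propositional clauses. A clause is a frozenset of literals, where a literal is a string optionally prefixed with '~' for negation; this encoding is an assumption of the sketch.

from itertools import combinations

def negate(lit):
    """Complement of a literal: p <-> ~p."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one complementary pair."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def unsatisfiable(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:                    # empty clause: contradiction found
                    return True
                if not any(negate(l) in r for l in r):  # discard tautologies
                    new.add(r)
        if new <= clauses:                   # nothing new: saturation reached
            return False
        clauses |= new

# Knowledge base {a or b, not a or c} plus negated conjecture not(b or c) = {~b, ~c}:
kb = [frozenset({'a', 'b'}), frozenset({'~a', 'c'}),
      frozenset({'~b'}), frozenset({'~c'})]
print(unsatisfiable(kb))   # True: b or c follows from the premises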
Resolution in First Order Logic
In first order logic, resolution condenses the traditional syllogisms of logical inference down to a
single rule.
To understand how resolution works, consider the following example syllogism of term logic:
All Greeks are Europeans.
Homer is a Greek.
Therefore, Homer is a European.
Or, more generally:

∀X (P(X) → Q(X))
P(a)
Therefore, Q(a).
To recast the reasoning using the resolution technique, first the clauses must be converted to
conjunctive normal form. In this form, all quantification becomes implicit: universal quantifiers on
variables (X, Y, …) are simply omitted as understood, while existentially-quantified variables are
replaced by Skolem functions.

¬P(X) ∨ Q(X)
P(a)
Therefore, Q(a)
So the question is, how does the resolution technique derive the last clause from the first two? The
rule is simple:
Find two clauses containing the same predicate, where it is negated in one clause but not in
the other.
Perform a unification on the two predicates. (If the unification fails, you made a bad choice
of predicates. Go back to the previous step and try again.)
If any unbound variables which were bound in the unified predicates also occur in other
predicates in the two clauses, replace them with their bound values (terms) there as well.
Discard the unified predicates, and combine the remaining ones from the two clauses into a
new clause, also joined by the "∨" operator.
To apply this rule to the above example, we find the predicate P occurs in negated form
¬P(X)
in the first clause, and in non-negated form
P(a)
in the second clause. X is an unbound variable, while a is a bound value (atom). Unifying the two
produces the substitution

X → a
Discarding the unified predicates, and applying this substitution to the remaining predicates (just
Q(X), in this case), produces the conclusion:
Q(a)
Check Your Progress:
1. Explain in brief the differences between propositional logic and FOPL as knowledge
representation schemes. Show how you will represent the sentence "The sky is blue" in these
two schemes.
2. Given propositions P, Q and R, use the truth table method to prove that:
(P ∧ (P → Q)) → Q
(P → (Q → R)) → ((P → Q) → (P → R))
3. Explain the five levels of knowledge representation
4. Discuss Traveling salesman problem
5. Briefly describe the following knowledge representation schemes along with their merits
and demerits:
(i) Predicate logic
(ii) Frames
6. Write short notes on:
(i) Conceptual Graph
(ii) Free and bound variables
Chapter 4 Rule-Based System
4.1 Rule-Based Systems
Rule-based systems can be used to perform lexical analysis to compile or interpret computer
programs, or in natural language processing.
Rule-based systems are used as a way to store and manipulate knowledge to interpret information
in a useful way. They are often used in artificial intelligence applications and research.
A classic example of a rule-based system is the domain-specific expert system that uses rules to
make deductions or choices. For example, an expert system might help a doctor choose the correct
diagnosis based on a cluster of symptoms, or select tactical moves to play a game.
Rule-based programming attempts to derive execution instructions from a starting set of data and
rules, which is a more indirect method than using a programming language which lists execution
steps straightforwardly.
4.2 Components of Rule Based Systems:
A typical rule-based system has four basic components:
1. A list of rules or rule base, which is a specific type of knowledge base.
2. An inference engine or semantic reasoner, which infers information or takes action based on
the interaction of input and the rule base.
3. Temporary working memory.
4. A user interface or other connection to the outside world through which input and output
signals are received and sent.
(Figure: basic architecture of a rule-based system: the interpreter, i.e. the inference engine, cycles between the rule base and working memory.)
Working Memory:
Contains facts about the world
Can be observed directly or derived from a rule
Contains temporary knowledge – knowledge about this problem-solving session
May be modified by the rules.
Traditionally stored as a <object, attribute, value> triplet
Examples:
<CAR, COLOUR, RED>: “The colour of my car is red”
<TEMPERATURE, OVER, 20>: “The temperature is over 20 °C”
Using a set of assertions, which collectively form the ‘working memory’, and a set of rules that
specify how to act on the assertion set, a rule-based system can be created. Rule-based systems are
fairly simplistic, consisting of little more than a set of IF - THEN statements, but provide the basis
for so-called “expert systems” which are widely used in many fields. The concept of an expert
system is this: the knowledge of an expert is encoded into the rule set. When exposed to the same
data, the expert system AI will perform in a similar manner to the expert.
Rule-based systems are a relatively simple model that can be adapted to any number of problems.
As with any AI, a rule-based system has its strengths as well as limitations that must be considered
before deciding if it is the right technique to use for a given problem. Overall, rule-based systems
are really only feasible for problems for which any and all knowledge in the problem area can be
written in the form of if-then rules and for which this problem area is not large. If there are too
many rules, the system can become difficult to maintain and can suffer a performance hit.
To create a rule-based system for a given problem, we must have (or create) the following:
1. A set of facts to represent the initial working memory. This should be anything relevant to
the beginning state of the system.
2. A set of rules. This should encompass any and all actions that should be taken within the
scope of a problem, but nothing irrelevant. The number of rules in the system can affect its
performance, so you don’t want any that aren’t needed.
3. A condition that determines that a solution has been found or that none exists. This is
necessary to terminate some rule-based systems that find themselves in infinite loops
otherwise.
4.3 Technique of Rule-Based Systems
The rule-based system itself uses a simple technique: It starts with a set of rules i.e. a rule-base,
which contains all of the appropriate knowledge encoded into If-Then rules, and a working
memory, which may or may not initially contain any data, assertions or initially known
information. The system examines all the rule conditions (IF) and determines a subset, the conflict
set, of the rules whose conditions are satisfied based on the working memory. Of this conflict set,
one of those rules is triggered (fired). Which one is chosen is based on a conflict resolution
strategy. When the rule is fired, any actions specified in its THEN clause are carried out. These
actions can modify the working memory, the rule-base itself, or do just about anything else the
system programmer decides to include. This loop of firing rules and performing actions continues
until one of two conditions are met: there are no more rules whose conditions are satisfied or a rule
is fired whose action specifies the program should terminate.
Which rule is chosen to fire is a function of the conflict resolution strategy. Which strategy is
chosen can be determined by the problem or it may be a matter of preference. In any case, it is vital
as it controls which of the applicable rules are fired and thus how the entire system behaves. There
are several different strategies, the most common are as follows:
1. First Applicable: If the rules are in a specified order, firing the first applicable one allows
control over the order in which rules fire. This is the simplest strategy and has a potential
for a large problem: that of an infinite loop on the same rule. If the working memory
remains the same, as does the rule-base, then the conditions of the first rule have not
changed and it will fire again and again. To solve this, it is a common practice to suspend a
fired rule and prevent it from re-firing until the data in working memory, that satisfied the
rule’s conditions, has changed.
2. Random: Though it doesn’t provide the predictability or control of the first-applicable
strategy, it does have its advantages. For one thing, its unpredictability is an advantage in
some circumstances (such as games for example). A random strategy simply chooses a
single random rule to fire from the conflict set. Another possibility for a random strategy is
a fuzzy rule-based system in which each of the rules has a probability such that some rules
are more likely to fire than others.
3. Most Specific: This strategy is based on the number of conditions of the rules. From the
conflict set, the rule with the most conditions is chosen. This is based on the assumption that
if it has the most conditions then it has the most relevance to the existing data.
4. Least Recently Used: Each of the rules is accompanied by a time or step stamp, which
marks the last time it was used. This maximizes the number of individual rules that are fired
at least once. If all rules are needed for the solution of a given problem, this is a perfect
strategy.
5. "Best" rule: For this to work, each rule is given a ‘weight,’ which specifies how much it
should be considered over the alternatives. The rule with the most preferable outcomes is
chosen based on this weight.
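To make the strategies concrete, here is a sketch of three of them as interchangeable functions over the conflict set, assuming each candidate rule is a (name, conditions, action) triple whose conditions are stored as a list so that specificity can be counted; any of these could be passed as the resolve parameter of the loop sketched above:

    import random

    # Each strategy maps a conflict set (a list of candidate rules) to the
    # single rule to fire.
    def first_applicable(conflict_set):
        return conflict_set[0]              # relies on the rule ordering

    def random_choice(conflict_set):
        return random.choice(conflict_set)  # unpredictable; useful in games

    def most_specific(conflict_set):
        # The rule with the most conditions is assumed most relevant.
        return max(conflict_set, key=lambda rule: len(rule[1]))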
The rule base contains rules; each rule is a step in a problem-solving process. Rules are persistent
knowledge about the domain, typically modified only from outside the system, e.g. by a domain
expert.
The syntax is:
IF <conditions> THEN <actions>
Examples:
IF <TEMPERATURE, OVER, 20> THEN add (<OCEAN, SWIMABLE, YES>)
IF <FEVER, OVER, 39> AND <NECK, STIFF, YES> AND <HEAD, PAIN, YES> THEN add
(<PATIENT, DIAGNOSE, MENINGITIS>)
The conditions are matched to the working memory, and if they are fulfilled, the rule may be fired.
Actions can be:
Adding fact(s) to the working memory.
Removing fact(s) from the working memory
Modifying fact(s) in the working memory.
Systems can allow variables in the rules.
The syntax is: IF <conditions> AND <conditions> THEN <actions>
Example:
IF <$x, ISA, CAR> AND <$x, LIGHTS, DIM> THEN add(<CHECK, BATTERY, $x>)
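A small sketch of how such $-prefixed variables can be matched against working-memory triples (the matcher below is an illustrative assumption, not part of any particular shell):

    # Match a rule pattern such as ('$x', 'ISA', 'CAR') against a fact,
    # binding $-prefixed variables and rejecting inconsistent bindings.
    def match(pattern, fact, bindings):
        for p, f in zip(pattern, fact):
            if isinstance(p, str) and p.startswith('$'):
                if bindings.get(p, f) != f:   # already bound to something else
                    return None
                bindings[p] = f
            elif p != f:
                return None
        return bindings

    b = match(('$x', 'ISA', 'CAR'), ('car1', 'ISA', 'CAR'), {})
    b = match(('$x', 'LIGHTS', 'DIM'), ('car1', 'LIGHTS', 'DIM'), b)
    print(b)  # {'$x': 'car1'}, so the action adds ('CHECK', 'BATTERY', 'car1')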
4.4 Methods of Rule-Based Systems
Forward-Chaining
Rule-based systems, as defined above, are adaptable to a variety of problems. In some problems,
information is provided with the rules and the AI follows them to see where they lead. An example
of this is a medical diagnosis in which the problem is to diagnose the underlying disease based on a
set of symptoms (the working memory). A problem of this nature is solved using a forward-chaining,
data-driven, system that compares data in the working memory against the conditions (IF
parts) of the rules and determines which rules to fire.
[Figure: Forward Chaining Procedure — rules from the rule base whose conditions match working memory form the conflict set; the conflict resolution strategy selects one rule to fire; firing updates working memory and the cycle repeats, exiting when no rule is found or when a fired rule specifies exit.]
Backward-Chaining
In other problems, a goal is specified and the AI must find a way to achieve that specified goal. For
example, if there is an epidemic of a certain disease, this AI could presume a given individual had
the disease and attempt to determine if its diagnosis is correct based on available information. A
backward-chaining, goal-driven, system accomplishes this. To do this, the system looks for the
action in the THEN clause of the rules that matches the specified goal. In other words, it looks for
the rules that can produce this goal. If a rule is found and fired, it takes each of that rule’s
conditions as goals and continues until either the available data satisfies all of the goals or there are
no more rules that match.
[Figure: Backward Chaining Procedure — the system first examines working memory to see if the goal is already known true; if not, it determines the possible rules to fire by matching rule conclusions against the goal, selects one via the conflict resolution strategy, fires it, and recursively back-chains with each condition of the fired rule as a new goal. It returns true when all recursion returns true, tries the next matching rule when one or more subgoals fail, and returns false when no rule is found.]
Which Method to use?
Of the two methods available, forward- or backward-chaining, the one to use is determined by the
problem itself. A comparison of conditions to actions in the rule base can help determine which
chaining method is preferred. If the 'average' rule has more conditions than conclusions, that is,
the typical hypothesis or goal (the conclusions) can lead to many more questions (the conditions), then
forward-chaining is favoured. If the opposite holds true and the average rule has more conclusions
than conditions such that each fact may fan out into a large number of new facts or actions,
backward-chaining is ideal.
If neither is dominant, the number of facts in the working memory may help the decision. If all
(relevant) facts are already known, and the purpose of the system is to find where that information
leads, forward-chaining should be selected. If, on the other hand, few or no facts are known and the
goal is to find if one of many possible conclusions is true, use backward-chaining.
Improving Efficiency of Forward Chaining
Forward-chaining systems, as powerful as they can be if well designed, can become cumbersome if
the problem is too large. As the rule-base and working memory grow, the brute-force method of
checking every rule condition against every assertion in the working memory can become quite
computationally expensive. Specifically, the computational complexity is of the order of RA^C, where
R is the number of rules, C is the approximate number of conditions per rule, and A is the number
of assertions in working memory. With this exponential complexity, for a rule-base of any realistic
size, the system will perform quite slowly.
There are ways to reduce this complexity, thus making a system of this nature far more feasible for
use with real problems. The most effective such solution to this is the Rete algorithm. The Rete
algorithm reduces the complexity by reducing the number of comparisons between rule conditions
and assertions in the working memory. To accomplish this, the algorithm stores a list of rules
matched or partially matched by the current working memory. Thus, it avoids unnecessary
computations in re-checking the already matched rules (they are already activated) or un-matched
rules (their conditions cannot be satisfied under the existing assertions in the working memory).
Only when the working memory changes does it re-check the rules, and then only against the
assertions added or removed from working memory. All told, this method drops the complexity to
O(RAC), linear rather than exponential.
The Rete algorithm, however, requires additional memory to store the state of the system from
cycle to cycle. The additional memory can be considerable, but may be justified for the increased
speed efficiency. For large problems in which speed is a factor, the Rete method is justified. For
small problems, or those in which speed is not an issue but memory is, the Rete method may not be
the best option. Another unfortunate shortcoming of the Rete method is that it only works with
forward-chaining.
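The following is only a toy illustration of the Rete idea, not the real algorithm (which compiles the rules into a shared discrimination network): it indexes rules by the facts in their conditions, so that a working-memory change touches only the rules that mention the changed assertion:

    from collections import defaultdict

    # Toy illustration of Rete's incremental matching: index rules by the
    # facts appearing in their conditions, and update match state only for
    # rules affected by a working-memory change.
    class ToyRete:
        def __init__(self, rules):          # rules: {name: set of condition facts}
            self.rules = rules
            self.index = defaultdict(set)   # fact -> rules mentioning it
            for name, conds in rules.items():
                for fact in conds:
                    self.index[fact].add(name)
            self.satisfied = defaultdict(set)  # rule -> satisfied conditions

        def add_fact(self, fact):
            # Only rules that mention this fact are re-examined.
            for name in self.index.get(fact, ()):
                self.satisfied[name].add(fact)

        def conflict_set(self):
            return [n for n, c in self.rules.items() if self.satisfied[n] == c]

    net = ToyRete({"meningitis": {("FEVER", "OVER", 39), ("NECK", "STIFF", "YES")}})
    net.add_fact(("FEVER", "OVER", 39))
    net.add_fact(("NECK", "STIFF", "YES"))
    print(net.conflict_set())  # ['meningitis']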
4.5 Forward vs Backwards Chaining:
Sl.No. | Forward Chaining | Backward Chaining
1. | Starts with the data and works towards the goal (e.g. data: Cough, Fever, Headache, Stiff Neck). | Starts with the goals and works towards the data (e.g. goals: Influenza, Meningitis). Gives goal-oriented reasoning.
2. | Attempts to find rules whose antecedent matches the Working Memory and fires these. | Attempts to find rules whose consequent matches the goal in the Working Memory, pushing the antecedent of the rule as a new subgoal onto the Working Memory. Only works with fairly simple consequent structures.
Backwards Chaining:
Starts with a goal stack and a (possibly empty) Working Memory. The goal stack has attributes we
would like to know the value of.
The algorithm:
1. Find the set of rules answering the top level goal.
2. If a premise of a rule is not satisfied, look for rules that derive the specified value. If any can
be found, then consider this to be a sub-goal, place it on top of the goal stack.
3. If step 2 cannot find a rule to satisfy an attribute, ask the user for this value and put it in
working memory.
4. Remove satisfied sub-goals. If the goal stack is empty, return with success; if not, iterate
again.
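A compact recursive sketch of this procedure (the rule encoding — a conclusion paired with its list of premises — is an assumption, and step 3's "ask the user" is replaced here by simply failing):

    # Recursive backward chaining: a goal is proved if it is already in
    # working memory, or if some rule concludes it and all of that rule's
    # premises can in turn be proved as subgoals.
    def prove(goal, rules, wm):
        if goal in wm:
            return True
        for conclusion, premises in rules:
            if conclusion == goal and all(prove(p, rules, wm) for p in premises):
                wm.add(goal)                # cache the derived fact
                return True
        return False                        # no rule found (or ask the user here)

    rules = [("dog", ["terrier"]), ("noisy", ["barks"])]
    print(prove("noisy", rules, {"barks"}))   # True: barks -> noisy
    print(prove("dog", rules, {"poodle"}))    # False: nothing derives 'dog'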
Example: Fruit
R1: IF Shape = long and Colour= green or yellow THEN Fruit = banana
R2: IF Shape = round or oblong and Diameter > 4 inches THEN Fruitclass=tree
R3: IF Shape = round or oblong and Diameter < 4 inches THEN Fruitclass=vine
R4: IF Seedcount= 1 THEN Seedclass=stonefruit
R5: IF Seedcount>1 THEN Seedclass=multiple
R6: IF FruitClass= vine and Colour= green THEN Fruit= watermelon
R7: IF FruitClass= vine and Surface= rough and Colour= tan THEN Fruit= honeydew
R8: IF FruitClass= vine and Surface= smooth and Colour= yellow THEN Fruit= cantaloupe
R9: IF FruitClass= tree and Colour= orange and Seedclass= stonefruit THEN Fruit= apricot
R10: IF FruitClass= tree and Colour= orange and Seedclass= multiple THEN Fruit =orange
R11: IF FruitClass= tree and Colour= red or yellow or green and Seedclass= multiple THEN Fruit=
apple
Working memory: {Diameter = 5 inch, Shape = round, SeedCount > 1, Colour = yellow, Surface =
smooth}
Forward Chaining
Starts with the data in the Working Memory, finds rules that match until the goal is reached.
Step | Applicable Rules | Chosen Rule | Derived Facts
1 | R2, R5 | R2 | Fruitclass = tree
2 | R2, R5 | R5 | Seedclass = multiple
3 | R2, R5, R11 | R11 | Fruit = apple
4 | R2, R5, R11 | - | -
This requires a conflict resolution strategy, since several rules may be applicable at each step.
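The trace above can be reproduced with a short forward-chaining run; in this sketch the attribute-value working memory is a dict, "SeedCount > 1" is encoded as SeedCount = 2, and only the three rules that actually fire on this working memory are written out:

    # Forward chaining on the fruit example: fire any rule whose condition
    # holds on the working memory and whose conclusion is not yet known.
    wm = {"Diameter": 5, "Shape": "round", "SeedCount": 2,
          "Colour": "yellow", "Surface": "smooth"}

    rules = [
        ("R2", lambda m: m["Shape"] in ("round", "oblong") and m["Diameter"] > 4,
               ("Fruitclass", "tree")),
        ("R5", lambda m: m["SeedCount"] > 1,
               ("Seedclass", "multiple")),
        ("R11", lambda m: m.get("Fruitclass") == "tree"
                and m["Colour"] in ("red", "yellow", "green")
                and m.get("Seedclass") == "multiple",
                ("Fruit", "apple")),
    ]

    changed = True
    while changed:
        changed = False
        for name, cond, (attr, value) in rules:
            if attr not in wm and cond(wm):
                print(name, "fires:", attr, "=", value)
                wm[attr] = value
                changed = True
    # Fires R2, R5 and R11 in turn, ending with Fruit = apple.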
4.6 Rule-Based Deduction Systems
The way in which a piece of knowledge is expressed by a human expert carries important
information.
For example, if a person has fever and feels tummy-pain then he may have an infection. In logic it
can be expressed as follows:
∀x. (has_fever(x) & tummy_pain(x) → has_an_infection(x))
If we convert this formula to clausal form we lose the content, as we may then have equivalent
formulas like:
(i) has_fever(x) & ~has_an_infection(x) → ~tummy_pain(x)
(ii) ~has_an_infection(x) & tummy_pain(x) → ~has_fever(x)
We notice that (i) and (ii), despite being logically equivalent to the original sentence, have lost the
main information contained in its formulation.
Forward Production Systems
The main idea behind the forward/backward production systems is to take advantage of the
implicational form in which production rules are stated by the expert and use that information to
help achieving the goal.
In the present systems the formulas will have two forms:
1. rules and
2. facts.
Rules are the productions stated in implication form. They express specific knowledge about the
problem, and facts are assertions not expressed as implications. The task of the system will be to
prove a goal formula with these facts and rules. In a forward production system the rules are
expressed as F-rules which operate on the global database of facts until the termination condition is
achieved. This sort of proving system is a direct system rather than a refutation system.
Facts
Facts are expressed in AND/OR form. An expression in AND/OR form consists of sub-expressions
of literals connected by '&' and 'v' symbols representing 'AND' and 'OR' respectively.
An expression in AND/OR form is not in clausal form.
Steps to transform facts into AND/OR form:
1. Eliminate (temporarily) implication symbols.
2. Reverse quantification of variables in first disjunct by moving negation symbol.
3. Skolemize existential variables.
4. Move all universal quantifiers to the front and drop them.
5. Rename variables so the same variable does not occur in different (main) conjuncts.
e.g.
Original formula: ∃u. ∀v. {q(v, u) & ~[[r(v) ∨ p(v)] & s(u, v)]}
Converted formula: q(w, a) & {[~r(v) & ~p(v)] ∨ ~s(a, v)}
All variables appearing on the final expressions are assumed to be universally quantified.
F-Rules
Rules in a forward production system will be applied to the AND/OR graph to produce new
transformed graph structures. We assume that rules in a forward production system are of the form:
L → W
where L is a literal and W is a formula in AND/OR form.
Let us recall that a rule of the form
(L1 ∨ L2) → W
is equivalent to the pair of rules:
L1 → W and L2 → W
Steps to transform the rules into a free-quantifier form:
1. Eliminate (temporarily) implication symbols.
2. Reverse quantification of variables in first disjunct by moving negation symbol.
3. Skolemize existential variables.
4. Move all universal quantifiers to the front and drop.
5. Restore implication.
All variables appearing on the final expressions are assumed to be universally quantified.
e.g.
Original formula: ∀x. ((∃y. ∀z. p(x, y, z)) → ∀u. q(x, u))
Converted formula: p(x, y, f(x, y)) → q(x, u)
For example, we consider following facts:
Fido barks and bites, or Fido is not a dog.
All terriers are dogs.
Anyone who barks is noisy.
Based on these facts, prove that: “there exists someone who is not a terrier or who is noisy.”
Logic representation:
(barks(fido) & bites(fido)) ∨ ~dog(fido)
R1: terrier(x) → dog(x)
R2: barks(y) → noisy(y)
goal: ∃w. (~terrier(w) ∨ noisy(w))
[Figure: AND/OR graph for the 'terrier' problem, showing the fact clause at the root, the goal nodes ~terrier(fido) and noisy(fido), and applications of R1 and R2 with the substitution {fido/z}.]
Backward production systems
An important property of logic is the duality between assertions and goals in theorem-proving
systems. Duality between assertions and goals allows the goal expression to be treated as if it were
an assertion.
Steps to convert the goal expression into AND/OR form:
1. Eliminate implication symbols.
2. Move negation symbols in.
3. Skolemize universal variables.
4. Drop existential quantifiers. Variables remaining in the AND/OR form are considered to be
existentially quantified.
Goal clauses are conjunctions of literals and the disjunction of these clauses is the clause form of
the goal well-formed formula.
B-Rules
We restrict B-rules to expressions of the form:
W → L
where W is an expression in AND/OR form and L is a literal.
The scope of quantification of any variables in the implication is the entire implication. As we
know, W → (L1 & L2) is equivalent to the two rules W → L1 and W → L2.
Advantages of RBS:
1. Naturalness of Expression:
Expert knowledge can often be seen naturally as rules of thumb.
2. Modularity:
Rules are independent of each other; new rules can be added or revised later. Each rule can
be seen as a "unit of knowledge." The interpreter is independent of the rules.
3. Restricted Syntax:
Allows construction of rules and consistency checking by other programs, and allows
(fairly easy) rephrasing into natural language.
4. Problem-Solving Process:
Frequent rule selection allows the reasoner to continuously revise its strategy, jumping from
hypothesis to hypothesis. Multiple solutions are effectively pursued at one time.
For example, MYCIN argued that rules can serve as explanations:
HOW-explanation: displays the rules used to derive the conclusion.
WHY-explanation: justifies a question by referring to the rule we are trying to fire.
5. Uniformity:
All the knowledge is expressed in the same format, which facilitates the development of the
knowledge base.
Disadvantages of RBS
1. Infinite chaining
Rule-based systems need to be specially crafted to avoid infinite loops:
• Forward can enter an infinite loop with rules like:
IF all-ones(Xs)
THEN all-ones([1|Xs])
• Backward can enter an infinite loop with rules like:
IF all-ones([1|Xs])
THEN all-ones(Xs)
2. Possibility of contradictions
The modification of Knowledge Base can be complicated:
• Sometimes, when introducing new knowledge to solve some specific problem (for example
adding a new rule), we might introduce contradictions with the previous rules.
• Equally, if we want to change some existing rule (because it does not work well in some
cases, for example), we must consider how the change affects all the rules that depend
on it.
RULE 1: IF it is raining
THEN not (weather is sunny)
RULE 2: IF location is Florida
THEN not (weather is cloudy)
RULE 3: IF time of day is late afternoon
THEN weather is sunny or weather is cloudy
FACTS: {Time of day is late afternoon; Location is Florida}
Inference: Facts with R2 and R3 imply that it is sunny.
But if we add a new rule:
RULE 4: IF time of day is late afternoon and location is Florida
THEN it is raining
then the facts also imply that it is raining, and by RULE 1 that the weather is not sunny,
contradicting the earlier inference.
3. Inefficiency
Because rules require pattern matching, computational cost of rule-based systems can be
very high.
4. Opacity
It is very difficult to examine a rule-based system to determine what actions are going to
happen, and when.
5. Complex Domains
Some domains are so complex that tens of thousands of rules would be needed to represent
the large number of possible situations, e.g. Flight Control
6. Incremental construction of knowledge base – is it really possible?
Rules must often be written with the conflict resolution strategy in mind.
The consequence of removing or adding a rule on reasoning competence is not clear.
Control knowledge in the antecedents of rules adds interdependence between rules.
Consistency cannot be guaranteed: the Florida weather example under point 2 above shows
that adding RULE 4 lets the system derive both that the weather is sunny and that it is
raining.
Check Your Progress:
1. What is a rule-based system? Why is it called a production system?
2. Explain in detail the forward chaining process of inference in production system.
3. Explain the term "knowledge" with respect to knowledge-based systems. Distinguish
between procedural and declarative knowledge.
4. Discuss briefly the advantages and disadvantages of Rule Based Systems.
5. Compare forward and backward chaining procedures.
Chapter 5 Structural Knowledge Representation
5.1 Semantic Networks
A semantic network or a semantic net is a graphic notation for representing knowledge in patterns
of interconnected nodes and arcs. Computer implementations of semantic networks were first
developed for artificial intelligence and machine translation, but earlier versions have long been
used in philosophy, psychology, and linguistics.
A semantic network is a network which represents semantic relations between concepts. It is
often used as a form of knowledge representation. It is a directed or undirected graph consisting
of vertices, which represent concepts, and edges, which represent the relations between them.
For example, the figure below shows a simple semantic network.
[Figure: A Semantic Network — nodes Animal, Mammal, Bear, Cat, Whale, Fish, Fur, Vertebra and Water connected by is-a, has and lives-in links.]
What is common to all semantic networks is a declarative graphic representation that can be used
either to represent knowledge or to support automated systems for reasoning about knowledge.
Some versions are highly informal, but other versions are formally defined systems of logic.
Following are six of the most common kinds of semantic networks:
1. Definitional networks
2. Assertional networks
3. Implicational networks
4. Executable networks
5. Learning networks
6. Hybrid networks
1. Definitional Networks
Definitional networks emphasize the subtype or “is-a” relation between a concept type and a newly
defined subtype. The resulting network, also called a generalization or subsumption hierarchy,
supports the rule of inheritance for copying properties defined for a super type to all of its subtypes.
Since definitions are true by definition, the information in these networks is often assumed to be
necessarily true.
The oldest known semantic network was drawn in the 3rd century AD by the Greek philosopher
Porphyry in his commentary on Aristotle's categories. Porphyry used it to illustrate Aristotle's
method of defining categories by specifying the genus or general type and the differentiae that
distinguish different subtypes of the same super type. Following figure 1 shows a version of the
Tree of Porphyry, as it was drawn by the logician Peter of Spain (1329). It illustrates the categories
under Substance, which is called the supreme genus or the most general category.
Figure 1. Tree of Porphyry, as drawn by Peter of Spain (1329)
Despite its age, the Tree of Porphyry represents the common core of all modern hierarchies that are
used for defining concept types. In Figure 1, the genus Substance with the differentia material is
Body and with the differentia immaterial is Spirit. The modern rule of inheritance is a special case
of the Aristotelian syllogisms, which specify the conditions for inheriting properties from
supertypes to subtypes: Living Thing inherits material Substance from Body and adds the
differentia animate; Human inherits sensitive animate material Substance and adds the differentia
rational. Aristotle, Porphyry, and the medieval logicians also distinguished the categories or
universals from the individual instances or particulars, which are listed at the bottom of Figure 1.
Aristotle's methods of definition and reasoning are still used in artificial intelligence, object-oriented
programming languages, and every dictionary from the earliest days to the present.
The first implementations of semantic networks were used to define concept types and patterns of
relations for machine translation systems. Silvio Ceccato (1961) developed correlational nets,
which were based on 56 different relations, including subtype, instance, part-whole, case relations,
kinship relations, and various kinds of attributes. He used the correlations as patterns for guiding a
parser and resolving syntactic ambiguities. Margaret Masterman's system at Cambridge University
(1961) was the first to be called a semantic network. She developed a list of 100 primitive concept
types, such as Folk, Stuff, Thing, Do, and Be. In terms of those primitives, her group defined a
conceptual dictionary of 15,000 entries. She organized the concept types into a lattice, which
permits inheritance from multiple supertypes. The basic principles and even many of the primitive
concepts have survived in more recent systems of preference semantics.
Among current systems, the description logics include the features of the Tree of Porphyry as a
minimum, but they may also add various extensions. They are derived from an approach proposed
by Woods (1975) and implemented by Brachman (1979) in a system called Knowledge Language
One (KL-ONE).
The tree of Porphyry, KL-ONE, and many versions of description logics are subsets of classical
first-order logic (FOL). They belong to the class of monotonic logics, in which new information
monotonically increases the number of provable theorems, and none of the old information can ever
be deleted or modified. Some versions of description logics support non-monotonic reasoning,
which allows default rules to add optional information and canceling rules to block inherited
information. Such systems can be useful for many applications, but they can also create problems
of conflicting defaults, as illustrated in Figure 2.
Figure 2. Conflicting defaults in a definitional network
The Nixon diamond on the left shows a conflict caused by inheritance from two different
supertypes: by default, Quakers are pacifists, and Republicans are not pacifists. Does Nixon inherit
pacifism along the Quaker path, or is it blocked by the negation on the Republican path? On the
right is another diamond in which the subtype RoyalElephant cancels the property of being gray,
which is the default color for ordinary elephants. If Clyde is first mentioned as an elephant, his
default color would be gray, but later information that he is a RoyalElephant should cause the
previous information to be retracted. To resolve such conflicts, many developers have rejected local
defaults in favor of more systematic methods of belief revision that can guarantee global
consistency.
2. Assertional Networks
Assertional networks are designed to assert propositions. Unlike definitional networks, the
information in an assertional network is assumed to be contingently true, unless it is explicitly
marked with a modal operator. Some assertional networks have been proposed as models of the
conceptual structures underlying natural language semantics.
Gottlob Frege (1879) developed a tree notation for the first complete version of first-order logic.
Charles Sanders Peirce (1880, 1885) independently developed an algebraic notation, which with a
change of symbols by Peano (1889) has become the modern notation for predicate calculus.
Although Peirce invented the algebraic notation, he was never fully satisfied with it. As early as
1882, he was searching for a graphic notation, similar to the notations used in organic chemistry
that would more clearly show "the atoms and molecules of logic." Figure 3 shows one of his
relational graphs, which represents the sentence, A Stagirite teacher of a Macedonian conqueror of
the world is a disciple and an opponent of a philosopher admired by Church Fathers.
Figure 3. A Relational Graph
Figure 3 contains three branching lines of identity, each of which corresponds to an existentially
quantified variable in the algebraic notation. The words and phrases attached to those lines
correspond to the relations or predicates in the algebraic notation. With that correspondence, Figure
3 can be translated to the following formula in predicate calculus:
(∃x)(∃y)(∃z)(isStagirite(x) ∧ teaches(x,y) ∧ isMacedonian(y) ∧ conquersTheWorld(y)
∧ isDiscipleOf(y,z) ∧ isOpponentOf(y,z) ∧ isAdmiredByChurchFathers(z)).
As this formula illustrates, a relational graph only represents two logical operators: the conjunction
∧ and the existential quantifier ∃. Other operators, such as negation ~, disjunction ∨, implication
→, and the universal quantifier ∀, are more difficult to express because they require some method
for demarcating the scope: that part of the formula that is governed by the operator.
In 1897, Peirce made a simple, but brilliant discovery that solved all the problems at once. He
introduced an oval that could enclose and negate an arbitrarily large graph or subgraph. Then
combinations of ovals with conjunction and the existential quantifier could express all the logical
operators used in the algebraic notation (Peirce 1909). That innovation transformed the relational
graphs into the system of existential graphs (EG), which Peirce called "the logic of the future"
(Roberts 1973). The implication →, for example, could be represented with a nest of two ovals,
since (p → q) is equivalent to ~(p ∧ ~q). At the left of Figure 4 is an existential graph for the
sentence, "If a farmer owns a donkey, then he beats it".
Figure 4. EG and DRS for "If a farmer owns a donkey, then he beats it."
The outer oval of Figure 4 is the antecedent or if part, which contains farmer, linked by a line
representing (∃x) to owns, which is linked by a line representing (∃y) to donkey. The subgraph in
the outer oval may be read If a farmer x owns a donkey y. The lines x and y are extended into the
inner oval, which represents the consequent, then x beats y. Figure 4 may be translated to the
following algebraic formula:
~(∃x)(∃y)(farmer(x) ∧ donkey(y) ∧ owns(x,y) ∧ ~beats(x,y)).
This formula is equivalent to
(∀x)(∀y)((farmer(x) ∧ donkey(y) ∧ owns(x,y)) → beats(x,y)).
For comparison, the diagram on the right of Figure 4 is a discourse representation structure (DRS),
which Hans Kamp (1981; Kamp and Reyle 1993) invented to represent natural language semantics.
Instead of nested ovals, Kamp used boxes linked by arrows; and instead of lines of identity, Kamp
used variables. But the logical structures are formally equivalent, and the same techniques for
representing logic in natural language can be adapted to either notation.
Figure 5 shows a conceptual dependency graph for the sentence "A dog is greedily eating a bone".
Schank used different kinds of arrows for different relations, such as a double arrow (⇔) for the
agent-verb relation or an arrow marked with o for the object. He replaced the verb eat with one of
his primitive acts, ingest; he replaced adverbs like greedily with adjective forms like greedy; and he
added the linked arrows marked with d for direction to show that the bone goes from some
unspecified place into the dog (the subscript 1 indicates that the bone went into the same one who
ingested it).
Figure 5. Schank's notation for conceptual dependencies
Conceptual dependencies were primarily suited to representing information at the sentence level,
but Schank and his colleagues later developed notations for representing larger structures, in which
the sentence-level dependencies occurred as nested substructures. The larger structures were called
scripts, memory organization packets (MOPs), and thematic organization packets (TOPs). To learn
or discover the larger structures automatically, case-based reasoning has been used to search for
commonly occurring patterns among the lower-level conceptual dependencies.
Conceptual graphs are a variety of propositional semantic networks in which the relations are
nested inside the propositional nodes. They evolved as a combination of the linguistic features of
Tesnière's dependency graphs and the logical features of Peirce's existential graphs with strong
influences from the work in artificial intelligence and computational linguistics. Figure 6 shows a
comparison of Peirce's EG from Figure 4 with a conceptual graph (CG) that represents the sentence
If a farmer owns a donkey, then he beats it.
Figure 6. Comparison of the EG with a CG for the same sentence
The most obvious differences between the EG and the CG are cosmetic:
the ovals are squared off to form boxes, and
the implicit negations in the EG are explicitly marked If and Then for better readability.
The more subtle differences are in the range of quantification and the point where the quantification
occurs. In an EG, a line of identity represents an existential quantifier (∃x) or (∃y), which ranges
over anything in the domain of discourse; but in a CG, each box, called a concept, represents a
quantifier (∃x: Farmer) or (∃y: Donkey), which is restricted to the type or sort Farmer or Donkey.
In the CG, the arcs with arrows indicate the arguments of the relations (numbers are used to
distinguish the arcs for relations with more than two arguments). Nodes such as [T] represent the
pronouns he or it, which are linked to their antecedents by dotted lines called co-reference links.
3. Implicational Networks
Implicational networks use implication as the primary relation for connecting nodes. They may be
used to represent patterns of beliefs, causality, or inferences.
An implicational network is a special case of a propositional semantic network in which the
primary relation is implication. Other relations may be nested inside the propositional nodes, but
they are ignored by the inference procedures. Depending on the interpretation, such networks may
be called belief networks, causal networks, Bayesian networks, or truth-maintenance systems.
Figure 7 shows possible causes for slippery grass: each box represents a proposition, and the arrows
show the implications from one proposition to another. If it is the rainy season, the arrow marked T
implies that it recently rained; if not, the arrow marked F implies that the sprinkler is in use. For
boxes with only one outgoing arrow, the truth of the first proposition implies the truth of the
second, but falsity of the first makes no prediction about the second.
Figure 7. An implicational network for reasoning about wet grass
Suppose someone walking across a lawn slips on the grass. Figure 7 represents the kind of
background knowledge that the victim might use to reason about the cause. A likely cause of
slippery grass is that the grass is wet. It could be wet because either the sprinkler had been in use or
it had recently rained. If it is the rainy season, the sprinkler would not be in use. Therefore, it must
have rained.
4. Executable Networks
Executable networks include some mechanism, such as marker passing or attached procedures,
which can perform inferences, pass messages, or search for patterns and associations.
Executable semantic networks contain mechanisms that can cause some change to the network
itself. The executable mechanisms distinguish them from networks that are static data structures,
which can only change through the action of programs external to the net itself. Three kinds of
mechanisms are commonly used with executable semantic networks:
1. Message passing networks can pass data from one node to another. For some networks, the
data may consist of a single bit, called a marker, token, or trigger; for others, it may be a
numeric weight or an arbitrarily large message.
2. Attached procedures are programs contained in or associated with a node that perform some
kind of action or computation on data at that node or some nearby node.
3. Graph transformations combine graphs, modify them, or break them into smaller graphs. In
typical theorem provers, such transformations are carried out by a program external to the
graphs. When they are triggered by the graphs themselves, they behave like chemical
reactions that combine molecules or break them apart.
These three mechanisms can be combined in various ways. Messages passed from node to node
may be processed by procedures attached to those nodes, and graph transformations may also be
triggered by messages that appear at some of the nodes.
The simplest networks are dataflow graphs, which contain passive nodes that hold data and active
nodes that take data from input nodes and send results to output nodes.
Figure 8 shows a dataflow graph with boxes for the passive nodes and diamonds for the active
nodes. The labels on the boxes indicate the data type (Number or String), and the labels on the
diamonds indicate the name of the function (+, ×, or convert string to number).
Figure 8. A dataflow graph
For numeric computations, dataflow graphs have little advantage over the algebraic notation used
in common programming languages. Figure 8, for example, would correspond to an assignment
statement of the following form:
X = (A + B) * S2N(C)
Graphic notations are more often used in an Integrated Development Environment (IDE) for linking
multiple programs to form a complete system. When dataflow graphs are supplemented with a
graphic method for specifying conditions, such as if-then-else, and a way of defining recursive
functions, they can form a complete programming language, similar to functional programming
languages such as Scheme and ML.
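A minimal sketch of evaluating the dataflow graph of Figure 8 (the node encoding is an illustrative assumption): each active node applies its function once the values on its input nodes are available.

    # Evaluate the dataflow graph X = (A + B) * S2N(C) by propagating
    # values from passive input nodes through the active function nodes.
    values = {"A": 2, "B": 3, "C": "4"}           # passive nodes holding data

    nodes = [                                     # active nodes: (output, fn, inputs)
        ("S", lambda a, b: a + b, ["A", "B"]),    # the '+' node
        ("N", lambda c: float(c), ["C"]),         # the string-to-number node
        ("X", lambda s, n: s * n, ["S", "N"]),    # the multiply node
    ]

    for out, fn, ins in nodes:                    # inputs are listed before consumers
        values[out] = fn(*(values[i] for i in ins))

    print(values["X"])  # (2 + 3) * 4 = 20.0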
5. Learning Networks
Learning networks build or extend their representations by acquiring knowledge from examples.
The new knowledge may change the old network by adding and deleting nodes and arcs or by
modifying numerical values, called weights, associated with the nodes and arcs.
A learning system, natural or artificial, responds to new information by modifying its internal
representations in a way that enables the system to respond more effectively to its environment.
Systems that use network representations can modify the networks in three ways:
1. Rote memory. The simplest form of learning is to convert the new information to a network
and add it without any further changes to the current network.
2. Changing weights. Some networks have numbers, called weights, associated with the nodes
and arcs. In an implicational network, for example, those weights might represent
probabilities, and each occurrence of the same type of network would increase the estimated
probability of its recurrence.
3. Restructuring. The most complex form of learning makes fundamental changes to the
structure of the network itself. Since the number and kinds of structural changes are
unlimited, the study and classification of restructuring methods is the most difficult, but
potentially the most rewarding if good methods can be found.
Neural nets are a widely used technique for learning by changing the weights assigned to the nodes
or arcs of a network. Their name, however, is a misnomer, since they bear little resemblance to
actual neural mechanisms. Figure 9 shows a typical neural net, whose input is a sequence of
numbers that indicate the relative proportion of some selected features and whose output is another
sequence of numbers that indicate the most likely concept characterized by that combination of
features. In an application such as optical character recognition, the features might represent lines,
curves, and angles, and the concepts might represent the letters that have those features.
Figure 9. A Neural Net
In a typical neural network, the structure of nodes and arcs is fixed, and the only changes that may
occur are the assignments of weights to the arcs. When a new input is presented, the weights on the
arcs are combined with the weights on the input features to determine the weights in the hidden
layers of the net and ultimately the weights on the outputs. In the learning stage, the system is told
whether the predicted weights are correct, and various methods of back propagation are used to
adjust the weights on the arcs that lead to the result.
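As a toy illustration of weight changing (a single linear unit trained with the delta rule, rather than the multi-layer back propagation described above; the data and learning rate are arbitrary assumptions):

    # One output unit: the prediction is a weighted sum of input features.
    # Training nudges each weight in proportion to its input and the error.
    weights = [0.0, 0.0]
    examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]  # (features, target)

    for _ in range(50):                      # repeat over the training data
        for features, target in examples:
            prediction = sum(w * f for w, f in zip(weights, features))
            error = target - prediction
            weights = [w + 0.1 * error * f for w, f in zip(weights, features)]

    print([round(w, 2) for w in weights])    # approaches [1.0, 0.0]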
Rote memory is best suited to applications that require exact retrieval of the original data, and
methods of changing weights are best suited to pattern recognition.
6. Hybrid Networks
Hybrid networks combine two or more of the previous techniques, either in a single network or in
separate, but closely interacting networks.
Many computer systems are hybrids, such as a combination of a database system for storing data, a
graphics package for controlling the user interface, and a programming language for detailed
computation.
The most widely used hybrid of multiple network notations is the Unified Modeling Language
(UML), which was designed by three authors, Grady Booch, Ivar Jacobson, and Jim Rumbaugh,
who merged their competing notations (Rational Software 1997).
5.2 Partitioned Networks
Partitioned Semantic Networks allow for:
Propositions to be made without commitment to truth.
Expressions to be quantified.
In order to create a partitioned network, we break network into spaces which consist of groups of
nodes and arcs and regard each space as a node.
Let us consider the following example:
Andrew believes that the earth is flat.
We can encode the proposition the earth is flat in a space and within it have nodes and arcs that
represent the fact. We can then have nodes and arcs to link this space to the rest of the network to
represent Andrew's belief.
[Figure: Partitioned network representing Andrew's belief that the earth is flat.]
Now consider the quantified expression:
Every parent loves their child
To represent this we:
Create a special class, GS, of general statements.
Make node g an instance of GS.
Every element will have at least 2 attributes:
o a form that states which relation is being asserted
o one or more forall (∀) or exists (∃) connections - these represent universally
quantifiable variables in such statements, e.g. x, y in
∀x: parent(x) → ∃y: child(y) & loves(x, y)
Here we have to construct two spaces, one for each quantifier. So we could construct a partitioned
network as in the following figure.
[Figure: Partitioned network for "Every parent loves their child".]
5.3 Inheritance
Inheritance is one of the main kinds of reasoning done in semantic nets. The subset relation is often
used to link a class and its superclass, and some links (e.g. legs) are inherited along subset paths.
The semantics of a semantic net can be relatively informal or very formal, and is often defined at
the implementation level.
[Figure: Semantic net — Bill is a member of Cats, Cats is a subset of Mammals, and Mammals has_part Legs (4).]
Inheritance provides a means of dealing with default reasoning. E.g. we could represent:
Emus are birds.
Typically birds fly and have wings.
Emus run.
In the following semantic net:
[Figure: Semantic net — emu is an instance of bird; bird has wings and the action fly; emu has the action run.]
In making certain inferences we will also need to distinguish between the link that defines a new
entity and holds its value and the other kind of link that relates two existing entities.
We consider the example where the height of two people is depicted and we also wish to compare
them. We need extra nodes for the concept as well as its value.
[Figure: Semantic nets in which Dave and Steve each have a height node (values 158 and 160), the values connected by a greater-than link.]
Special procedures are needed to process these nodes, but without this distinction the analysis
would be very limited.
5.4 Multiple Inheritance
A node can have any number of superclasses that contain it, enabling a node to inherit properties
from multiple parent nodes and their ancestors in the network. This can cause conflicting
inheritance, as in the Nixon diamond below.
[Figure: Nixon is an instance of both Quaker and Republican, which are subclasses of Person; the Quaker path leads to pacifist and the Republican path to non-pacifist.]
5.5 Problems with Semantic Nets
Binary relations are easy to represent. Others are harder.
For example: “Opus brings tequila to the party.”
Other problematic statements:
negation “Opus does not ride a bicycle”;
disjunction “Opus eats oranges or apples”;
Quantified statements are very hard for semantic nets.
Examples:
“every dog has bitten a postman”
“every dog has bitten every postman”
Partitioned semantic nets can represent these.
Advantages of Semantic Nets
Easy to visualise. Relationships can be arbitrarily defined by the knowledge engineer. Formal
definitions of semantic networks have been developed. Related knowledge is easily clustered.
Efficient in space requirements, since objects are represented only once.
Disadvantages of Semantic Nets
Inheritance (particularly from multiple sources and when exceptions in inheritance are wanted) can
cause problems. Facts placed inappropriately cause problems. There are no standards for node and
arc values.
5.6 Knowledge Representation Formalisms
Some of the abstract knowledge representation mechanisms are the following:
Simple relational knowledge
The simplest way of storing facts is to use a relational method where each fact about a set of objects is
set out systematically in columns. This representation gives little opportunity for inference, but it can be
used as the knowledge basis for inference engines.
Musician | Style | Instrument | Age
Miles Davis | Jazz | Trumpet | Deceased
John Zorn | Avant Garde | Saxophone | 35
Frank Zappa | Rock | Guitar | Deceased
John McLaughlin | Jazz | Guitar | 47
We can ask things like:
Who is dead?
Who plays Jazz/Trumpet etc.?
This sort of representation is popular in database systems.
Inheritable knowledge
Relational knowledge is made up of objects consisting of
• attributes
• corresponding associated values.
We extend the base more by allowing inference mechanisms:
• Property inheritance
o elements inherit values from being members of a class.
o data must be organised into a hierarchy of classes.
Boxed nodes: objects and values of attributes of objects.
Values can be objects with attributes and so on.
Arrows: point from an object to its value.
This structure is known as a slot and filler structure, semantic network or a collection of
frames.
[Figure: Slot-and-filler hierarchy — Musician is an Adult Male; Miles Davis (Jazz) and John Zorn (Avant Garde Jazz, age 35) are instances of Musician, with bands Miles Davis Group, Miles Davis Quintet, Naked City and Masada.]
The algorithm to retrieve a value for an attribute of an instance object:
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise, look for a value of the instance attribute; if there is none, fail.
4. Otherwise, go to that node and find a value for the attribute and then report it.
5. Otherwise, search upward through "isa" links until a value is found for the attribute.
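A sketch of this retrieval algorithm over a simple slot-and-filler structure (the dictionary encoding, with the special slots "instance" and "isa", is an assumption):

    # Attribute lookup with property inheritance: check the object itself,
    # then its class via 'instance', then upward along 'isa' links.
    kb = {
        "Miles Davis": {"instance": "Musician", "style": "Jazz"},
        "Musician":    {"isa": "Adult Male"},
        "Adult Male":  {"legs": 2},
    }

    def get_value(obj, attr):
        node = kb.get(obj)
        while node is not None:
            if attr in node:                       # steps 2/4: value found here
                return node[attr]
            # steps 3/5: climb to the class ('instance') or superclass ('isa')
            node = kb.get(node.get("instance") or node.get("isa"))
        return None                                # fail

    print(get_value("Miles Davis", "style"))  # Jazz (directly on the object)
    print(get_value("Miles Davis", "legs"))   # 2 (inherited via Musician -> Adult Male)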
Inferential Knowledge
Inferential knowledge represents knowledge as formal logic:
All dogs have tails: ∀x: dog(x) → hasatail(x)
Advantages:
A set of strict rules.
o Can be used to derive more facts.
o Truths of new statements can be verified.
o Guaranteed correctness.
Many inference procedures are available to implement standard rules of logic.
Popular in AI systems, e.g. automated theorem proving.
Procedural knowledge
Basic idea of procedural knowledge is:
Knowledge encoded in some procedures:
Small programs that know how to do specific things, how to proceed.
e.g. a parser in a natural language understander has the knowledge that a noun phrase may
contain articles, adjectives and nouns. It is represented by calls to routines that know how to
process articles, adjectives and nouns.
Advantages:
1. Heuristic or domain specific knowledge can be represented.
2. Extended logical inferences, such as default reasoning, are facilitated.
3. Side effects of actions may be modeled. Some rules may become false in time. Keeping track of
this in large systems may be tricky.
Disadvantages:
1. Completeness -- not all cases may be represented.
2. Consistency -- not all deductions may be correct.
e.g If we know that Fred is a bird we might deduce that Fred can fly. Later we might discover
that Fred is an emu.
3. Modularity is sacrificed. Changes in knowledge base might have far-reaching effects.
4. Cumbersome control information.
The following properties should be possessed by a knowledge representation system.
Representational Adequacy
The ability to represent the required knowledge;
Inferential Adequacy
The ability to manipulate the knowledge represented to produce new knowledge corresponding
to that inferred from the original;
Inferential Efficiency
The ability to direct the inferential mechanisms into the most productive directions by storing
appropriate guides;
Acquisitional Efficiency
The ability to acquire new knowledge using automatic methods wherever possible rather than
reliance on human intervention.
To date, no single system optimizes all of the above.
5.7 Well-formed formula
A well-formed formula or simply formula (often abbreviated wff, pronounced "wiff" or "wuff") is
an idea or concept which is expressed using the symbols and formation rules (also called the formal
grammar) of a particular formal language.
Syntax Rules
Not all strings can represent propositions of the predicate logic. Those which produce a proposition
when their symbols are interpreted must follow the rules given below, and they are called
wffs (well-formed formulas) of the first-order predicate logic.
Rules for constructing Wffs
A predicate name followed by a list of variables such as P(x, y), where P is a predicate name, and x
and y are variables, is called an atomic formula.
Wffs are constructed using the following rules:
1. True and False are wffs.
2. Each propositional constant (i.e. specific proposition), and each propositional variable (i.e. a
variable representing propositions) are wffs.
3. Each atomic formula (i.e. a specific predicate with variables) is a wff.
4. If A and B are wffs, then so are ¬A, (A ∧ B), (A ∨ B), (A → B), and (A ↔ B).
5. If x is a variable (representing objects of the universe of discourse), and A is a wff, then so
are ∀x A and ∃x A.
For example, "The capital of Virginia is Richmond." is a specific proposition. Hence it is a wff by
Rule 2.
Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic
formula meaning "x is blue". Thus it is a wff by Rule 3 above. By applying Rule 5 to B(x), ∀x B(x)
is a wff and so is ∃x B(x). Then by applying Rule 4 to them, ∀x B(x) → ∃x B(x) is seen to be a wff.
Similarly, if R is a predicate name representing "being round", then R(x) is an atomic formula.
Hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) ∧ R(x) is obtained.
In this manner, larger and more complex wffs can be constructed following the rules given above.
Note, however, that strings that cannot be constructed by using those rules are not wffs. For
example, ∀x B(x) R(x) and B(∀x) are NOT wffs, NOR are B(R(x)) and B(∃x R(x)).
One way to check whether or not an expression is a wff is to try to state it in English. If you
can translate it into a correct English sentence, then it is a wff.
More examples: To express the fact that Tom is taller than John, we can use the atomic formula
taller(Tom, John), which is a wff. This wff can also be part of some compound statement such as
taller(Tom, John) ∨ taller(John, Tom), which is also a wff.
If x is a variable representing people in the world, then taller(x, Tom), ∀x taller(x, Tom),
∃x taller(x, Tom), and ∀x ∃y taller(x, y) are all wffs, among others.
However, taller(∀x, John) and taller(Tom Mary, Jim), for example, are NOT wffs.
5.8 Predicate Logic
First-order predicate calculus or first-order logic (FOL) is a theory in symbolic logic that formalizes
quantified statements such as "there exists an object such that..." or "for all objects, it is the case
that...".
First-order logic is distinguished from higher-order logic in that it does not allow statements such as
“for every property, it is the case that...” or “there exists a set of objects such that...”.
The definition of a formula in first-order logic comes in several parts. First, the set of terms is
defined recursively. Terms, informally, are expressions that represent objects from the domain of
discourse.
1. Any variable is a term.
2. Any constant symbol from the signature is a term.
3. An expression of the form f(t1,...,tn), where f is an n-ary function symbol and t1,...,tn are
terms, is again a term.
The next step is to define the atomic formulas.
1. If t1 and t2 are terms then t1=t2 is an atomic formula
2. If R is an n-ary relation symbol, and t1,...,tn are terms, then R(t1,...,tn) is an atomic formula
Finally, the set of WFFs is defined to be the smallest set containing the set of atomic WFFs such
that the following holds:
1. ¬φ is a WFF when φ is a WFF
2. (φ ∧ ψ) and (φ ∨ ψ) are WFFs when φ and ψ are WFFs;
3. ∃x φ is a WFF when x is a variable and φ is a WFF;
4. ∀x φ is a WFF when x is a variable and φ is a WFF (alternatively, ∀x φ could be
defined as an abbreviation for ¬∃x ¬φ).
If a formula has no occurrences of ∃x or ∀x, for any variable x, then it is called quantifier-free. An
existential formula is a string of existential quantification followed by a quantifier-free formula.
5.9 Frame
A frame is an artificial intelligence data structure used to divide knowledge into substructures by
representing “stereotyped situations.” Frames are connected together to form a complete idea.
Frame structure
The frame contains information on how to use the frame, what to expect next, and what to do when
these expectations are not met. Some information in the frame is generally unchanged while other
information, stored in “terminals,” usually change. Different frames may share the same terminals.
A frame’s terminals are already filled with default values, which is based on how the human mind
works. For example, when a person is told “a boy kicks a ball,” most people will be able to
visualize a particular ball (such as a familiar soccer ball) rather than imagining some abstract ball
with no attributes.
5.10 Class
In object-oriented programming, a class is a programming language construct that is used as a
blueprint to create objects of that class. This blueprint describes the state and behavior that the
objects of the class all share. An object of a given class is called an instance of the class. The class
that contains that instance can be considered as the type of that object, e.g. a type of an object of the
"Fruit" class would be "Fruit".
A class usually represents a noun, such as a person, place or thing - it is a model of a concept within
a computer program. Fundamentally, it encapsulates the state and behavior of the concept it
represents. It encapsulates state through data placeholders called attributes (or member variables or
instance variables); it encapsulates behavior through reusable sections of code called methods.
More technically, a class is a cohesive package that consists of a particular kind of metadata. A
class has both an interface and a structure. The interface describes how to interact with the class
and its instances with methods, while the structure describes how the data is partitioned into
attributes within an instance. A class may also have a representation (metaobject) at run time,
which provides run time support for manipulating the class-related metadata. In object-oriented
design, a class is the most specific type of an object in relation to a specific layer.
Reasons for using classes
Classes, when used properly, can accelerate development by reducing redundant program code,
testing and bug fixing. If a class has been thoroughly tested and is known to be solid, it is
typically the case that using or extending the well-tested class will reduce the number of bugs - as
compared to the use of freshly-developed or ad hoc code - in the final output. In addition, efficient
class reuse means that many bugs need to be fixed in only one place when problems are discovered.
Another reason for using classes is to simplify the relationships of interrelated data. Rather than
writing code to repeatedly call a GUI window drawing subroutine on the terminal screen (as would
be typical for structured programming), it is more intuitive to represent the window as an object and
tell it to draw itself as necessary. With classes, GUI items that are similar to windows (such as
dialog boxes) can simply inherit most of their functionality and data structures from the window
class. The programmer then need only add code to the dialog class that is unique to its operation.
Indeed, GUIs are a very common and useful application of classes, and GUI programming is
generally much easier with a good class framework.
Instantiation
A class is used to create new instances (objects) by instantiating the class.
Instances of a class share the same set of attributes, yet may differ in what those attributes contain.
For example, a class "Person" would describe the attributes common to all instances of the Person
class. Each person is generally alike, but varies in such attributes as "height" and "weight". The
class would list types of such attributes and also define the actions which a person can perform:
"run", "jump", "sleep", "walk", etc. One of the benefits of programming with classes is that all
instances of a particular class will follow the defined behavior of the class they instantiate.
Structure of a class
[Figure: UML notation for classes]
Along with having an interface, a class contains a description of structure of data stored in the
instances of the class. The data is partitioned into attributes (or properties, fields, data members).
Considering the television set example, the myriad attributes, such as size and whether it supports
color, together comprise its structure. A class represents the full description of a television,
including its attributes (structure) and buttons (interface).
The state of an instance's data is stored in some resource, such as memory or a file. The storage is
assumed to be located in a specific location, such that it is possible to access the instance through
references to the identity of the instances. However, the actual storage location associated with an
instance may change with time. In such situations, the identity of the object does not change. The
state is encapsulated and every access to the state occurs through methods of the class.
In most languages, the structures as defined by the class determine how the memory used by its
instances will be laid out. This technique is known as the cookie-cutter model. The alternative to
the cookie-cutter model is the model of Python, wherein objects are structured as associative
key-value containers. In such models, objects that are instances of the same class could contain
different instance variables, as state can be dynamically added to the object. This may resemble
prototype-based languages in some ways, but it is not equivalent.
A class also describes a set of invariants that are preserved by every method in the class. An
invariant is a constraint on the state of an object that should be satisfied by every object of the class.
The main purpose of the invariants is to establish what objects belong to the class. An invariant is
what distinguishes data types and classes from each other; that is, a class does not allow use of all
possible values for the state of the object, and instead allows only those values that are well-defined
by the semantics of the intended use of the data type. The set of supported (public) methods often
implicitly establishes an invariant. Some programming languages support specification of
invariants as part of the definition of the class, and enforce them through the type system.
Encapsulation of state is necessary for being able to enforce the invariants of the class.
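As a sketch, consider a hypothetical Account class whose invariant is that the balance is never negative; because the state is encapsulated, every method can be written to preserve it:
class Account(object):
    def __init__(self, balance):
        if balance < 0:
            raise ValueError("invariant violated: negative balance")
        self._balance = balance      # accessible only through methods

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("refused: would violate the invariant")
        self._balance -= amount      # every method preserves the invariant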
Some languages allow an implementation of a class to specify constructor (or initializer) and
destructor (or finalizer) methods that specify how instances of the class are created and destroyed,
respectively. A constructor that takes arguments can be used to create an instance from passed-in
data. The main purpose of a constructor is to establish the invariant of the class, failing if the
invariant isn't valid. The main purpose of a destructor is to destroy the identity of the instance,
invalidating any references in the process. Constructors and destructors are often used to reserve
and release, respectively, resources associated with the object. In some languages, a destructor can
return a value which can then be used to obtain a public representation (transfer encoding) of an
instance of a class and simultaneously destroy the copy of the instance stored in current thread's
memory.
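A sketch of this in Python, where __init__ plays the constructor role and __del__ the finalizer role (a real program would usually prefer a context manager, since __del__ timing is not guaranteed):
class LogFile(object):
    def __init__(self, path):
        # The constructor reserves the resource and establishes the invariant
        # that a LogFile always wraps an open file; open() fails otherwise.
        self.handle = open(path, "a")

    def __del__(self):
        # The finalizer releases the resource when the instance is destroyed.
        self.handle.close()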
A class may also contain static attributes or class attributes, which contain data that are specific to
the class yet are common to all instances of the class. If the class itself is treated as an instance of a
hypothetical metaclass, static attributes and static methods would be instance attributes and instance
methods of that metaclass.
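For example, a sketch of a shared class attribute in Python:
class Counter(object):
    instances = 0                  # class attribute: one value shared by all

    def __init__(self):
        Counter.instances += 1     # updates the class-wide value

a = Counter()
b = Counter()
print(Counter.instances)           # 2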
Run-time representation of classes
As a data type, a class is usually considered as a compile-time construct. A language may also
support prototype or factory metaobjects that represent run-time information about classes, or even
represent metadata that provides access to reflection facilities and ability to manipulate data
structure formats at run-time. Many languages distinguish this kind of run-time type information
about classes from a class on the basis that the information is not needed at run-time. Some
dynamic languages do not make strict distinctions between run-time and compile-time constructs,
and therefore may not distinguish between metaobjects and classes.
For example: if Human is a metaobject representing the class Person, then instances of class Person
can be created by using the facilities of the Human metaobject.
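In Python this can be sketched directly, since the built-in type plays the role of such a metaobject:
def greet(self):
    print("hello")

# Create the class Person at run time through the facilities of type.
Person = type('Person', (object,), {'greet': greet})
p = Person()       # instances are then created from the run-time class
p.greet()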
5.11 Metaclass
In object-oriented programming, a metaclass is a class whose instances are classes. Just as an
ordinary class defines the behavior of certain objects, a metaclass defines the behavior of certain
classes and their instances. Not all object-oriented programming languages support metaclasses.
Among those which do, the extent to which metaclasses can override any given aspect of class
behavior varies. Each language has its own metaobject protocol, a set of rules which govern how
objects, classes, and metaclasses interact.
Python example
In Python, the built-in class type is a metaclass. Let us consider this simple Python class:
class Car(object):
    __slots__ = ['make', 'model', 'year', 'color']

    def __init__(self, make, model, year, color):
        self.make = make
        self.model = model
        self.year = year
        self.color = color

    @property
    def description(self):
        """Return a description of this car."""
        return "%s %s %s %s" % (self.color, self.year, self.make, self.model)
At run time, Car itself is a type object. The source code of the Car class, shown above, does not
include such details as the size in bytes of Car objects, their binary layout in memory, how they are
allocated, that the __init__ method is automatically called each time a Car is created, and so on.
These details come into play not only when a new Car object is created, but also each time any
attribute of a Car is accessed. In languages without metaclasses, these details are defined by the
language specification and can’t be overridden. In Python, the metaclass, type, controls these
details of Car’s behavior. They can be overridden by using a different metaclass instead of type.
The above example contains some redundant code to do with the four attributes make, model, year,
and color. It is possible to eliminate some of this redundancy using a metaclass. In Python, a
metaclass is most easily defined as a subclass of type.
class AttributeInitType(type):
    def __call__(self, *args, **kwargs):
        """Create a new instance."""
        # First, create the object in the normal default way.
        obj = type.__call__(self, *args)
        # Additionally, set attributes on the new object.
        for name in kwargs:
            setattr(obj, name, kwargs[name])
        # Return the new object.
        return obj
This metaclass only overrides object creation. All other aspects of class and object behavior are still
handled by type.
Now the class Car can be rewritten to use this metaclass. This is done in Python 2 by assigning to
__metaclass__ within the class definition (in Python 3 the metaclass is instead passed in the class
header, as class Car(object, metaclass=AttributeInitType)):
class Car(object):
    __metaclass__ = AttributeInitType
    __slots__ = ['make', 'model', 'year', 'color']

    @property
    def description(self):
        """Return a description of this car."""
        return "%s %s %s %s" % (self.color, self.year, self.make, self.model)
Car objects can then be instantiated like this:
cars = [
    Car(make='Toyota', model='Prius', year=2005, color='green'),
    Car(make='Ford', model='Prefect', year=1979, color='blue')]
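The Car objects can then be used as usual, for example:
for car in cars:
    print(car.description)    # e.g. green 2005 Toyota Prius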
Metaclass programming can be confusing, and it is rare in real-world Python code.
In Smalltalk-80
In Smalltalk, everything is an object. There are two kinds of objects: those which can create
instances of themselves (classes), and others which cannot. Every object is the instance of a class.
Every class is the instance of a metaclass.
In early Smalltalks, there was one metaclass called Class. The object creation method of all classes
was the same, i.e., new. A class sent the message new could only return an object with uninitialized
instance variables. Smalltalk’s designers wanted to send one message to an object to initiate both
creation and initialization. They achieved this in Smalltalk-80.
In Smalltalk-80, a class is an instance of its own metaclass; and each class can have unique
methods for creating objects. Metaclasses, like other classes, contain methods used by their
instances. But metaclasses are all instances of one class, called Metaclass. Unlike classes,
metaclasses do not need flexible creation methods, because classes all have the same structure. For
instance, the class Car has instance variables just like any other class. People using (and not redesigning)
Smalltalk do not need to write class creation methods.
Names are not given to metaclasses. The metaclass of class Sphere is simply referred to as “the
metaclass of class Sphere”. The metaclass of a class may be accessed by sending the message class
to the class.
The methods of a metaclass create instances, and initialize class variables.
In Smalltalk-80, every class (except Object) has a superclass. The abstract superclass of all
metaclasses is Class, which describes the general nature of classes.
The superclass hierarchy for metaclasses parallels that for classes, except for class Object. All
metaclasses are subclasses of Class, therefore:
Object class superclass == Class.
Like conjoined twins, classes and metaclasses are born together. Metaclass has an instance variable
thisClass, which points to its conjoined class.
The names of classes in the metaclass hierarchy are easily confused with the concepts of the same
name. For instance:
Object is the base class which provides common methods for all objects; “an object” is an
integer, or a widget, or a Car, etc.
Class is the base metaclass which provides common methods for all classes; “a class” is
something like Integer, or Widget, or Car, etc.
Metaclass has the same relation to “a Metaclass”.
Four classes provide the facilities to describe new classes. Their inheritance hierarchy (from
Object), and the main facilities they provide are:
Object – default behavior common to all objects, like class access
Behavior – minimum state for compiling methods and creating/running objects
ClassDescription (abstract class) – class/variable naming, comments
Class – similar but more comprehensive facilities than its superclasses
Metaclass – initializing class variables and instance creation messages
Class methods actually belong to the metaclass, just as instance methods actually belong to the
class. When a message is sent to the object 2, the search for the method starts in Integer. If it is not
found, the search proceeds up the superclass chain, stopping at Object whether it is found or not.
Aside – another way of saying “metaclass of Integer” is Integer class.
When a message is sent to Integer the search for the method starts in Integer class and proceeds up
the superclass chain to Object class. Note that, so far, the metaclass inheritance chain exactly
follows that of the class inheritance chain. But the metaclass chain extends further because Object
class is the subclass of Class. All metaclasses are subclasses of Class.
All metaclasses are instances of class Metaclass. So the metaclass of Metaclass is an instance of
Metaclass.
5.12 Is-a Relationship
[Figure: the child class (Ford) connected by an “is-a” arrow to the parent class (Automobile)]
The figure shows a parent class and a child class. The line between them shows the “is-a”
relationship. The arrow points to the parent class from the child class. The picture can be read as
“a Ford is-a automobile.”
The phrase “is-a” is in common use in computer science. The arrow between a child and parent is
sometimes called an “is-a link”.
The ovals represent classes. The picture does not show objects.
5.13 Inheritance
In object-oriented programming, inheritance is a way to form new classes (instances of which are
called objects) using classes that have already been defined. Inheritance is intended to help reuse
existing code with little or no modification. The new classes, known as derived classes, inherit
attributes and behavior of the pre-existing classes, which are referred to as base classes (or ancestor
classes). The inheritance relationship of sub- and superclasses gives rise to a hierarchy.
Inheritance should not be confused with (subtype) polymorphism, commonly called just
polymorphism in object-oriented programming. Inheritance is a relationship between
implementations, whereas subtype polymorphism is a relationship between types (interfaces in
OOP). In some, but not all OOP languages, the notions coincide because the only way to declare a
subtype is to define a new class that inherits the implementation of another. Inheritance does not
entail behavioral subtyping either. It is entirely possible to derive a class whose object will behave
incorrectly when used in a context where the parent class is expected; see the Liskov substitution
principle.
Complex inheritance, or inheritance used within a design that is not sufficiently mature, may lead to
the Yo-yo problem.
Inheritance is between classes, not between objects.
A parent class is a blueprint that is followed when an object is constructed. A child class of the
parent is another blueprint (that looks much like the original), but with added features. The child
class is used to construct objects that look like the parent’s objects, but with added features.
When to Use Inheritance
Inheritance is a useful programming concept, but it is easy to use inappropriately. Often interfaces
do the job better.
Inheritance is a good choice when:
* Your inheritance hierarchy represents an “is-a” relationship and not a “has-a” relationship.
* You can reuse code from the base classes.
* You need to apply the same class and methods to different data types.
* The class hierarchy is reasonably shallow, and other developers are not likely to add many
more levels.
* You want to make global changes to derived classes by changing a base class.
Applications of Inheritance
There are several reasons to use inheritance.
1. Specialization
One common reason to use inheritance is to create specializations of existing classes or objects. In
specialization, the new class or object has data or behavior aspects that are not part of the inherited
class. For example, a "Bank Account" class might have data for an "account number", "owner", and
"balance". An "Interest Bearing Account" class might inherit "Bank Account" and then add data for
"interest rate" and "interest accrued" along with behavior for calculating interest earned.
Another form of specialization occurs when a base class specifies that it has a particular behavior
but does not actually implement the behavior. Each non-abstract, concrete class which inherits from
that abstract class must provide an implementation of that behavior. This providing of actual
behavior by a subclass is sometimes known as implementation or reification.
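In Python, such an abstract behavior can be sketched with the standard abc module (written here in the Python 2 style of the earlier examples; in Python 3 one would subclass abc.ABC instead):
import abc

class Shape(object):
    __metaclass__ = abc.ABCMeta    # Python 2 style, as in the examples above

    @abc.abstractmethod
    def area(self):
        """Declared here, implemented only by concrete subclasses."""

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):                # the subclass reifies the declared behavior
        return 3.14159 * self.radius * self.radius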
2. Overriding
Many object-oriented programming languages permit a class or object to replace the
implementation of an aspect—typically a behavior—that it has inherited. This process is usually
called overriding. Overriding introduces a complication: which version of the behavior does an
instance of the inherited class use—the one that is part of its own class, or the one from the parent
(base) class? The answer varies between programming languages, and some languages provide the
ability to indicate that a particular behavior is not to be overridden, and must behave as defined
by the base class.
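A minimal sketch of overriding in Python (the Animal and Dog classes are illustrative):
class Animal(object):
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):         # replaces (overrides) the inherited implementation
        return "woof"

print(Animal().speak())      # ...
print(Dog().speak())         # woof: the subclass version is used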
3. Code re-use
One of the earliest motivations for using inheritance was to allow a new class to re-use code which
already existed in another class. This practice is usually called implementation inheritance.
In most quarters, class inheritance for the sole purpose of code re-use has fallen out of favor. The
primary concern is that implementation inheritance does not provide any assurance of polymorphic
substitutability—an instance of the re-using class cannot necessarily be substituted for an instance
of the inherited class. An alternative technique, delegation, requires more programming effort but
avoids the substitutability issue. In C++, private inheritance can be used as a form of implementation
inheritance without substitutability. Whereas public inheritance represents an "is-a" relationship
and delegation represents a "has-a" relationship, private (and protected) inheritance can be thought
of as an "is implemented in terms of" relationship.
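A sketch of delegation in Python: a hypothetical Stack is implemented in terms of a list by forwarding calls to a contained object, rather than by inheriting from list:
class Stack(object):
    # Implemented in terms of a list via delegation, not inheritance, so a
    # Stack cannot be (mis)used wherever a list is expected.
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)   # forward the work to the contained list

    def pop(self):
        return self._items.pop()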
Limitations and Alternatives
When using inheritance extensively in designing a program, one should be aware of certain
constraints that it imposes.
For example, consider a class Person that contains a person's name, address, phone number, age,
gender, and race. We can define a subclass of Person called Student that contains the person's
grade point average and classes taken, and another subclass of Person called Employee that
contains the person's job title, employer, and salary.
In defining this inheritance hierarchy we have already defined certain restrictions, not all of which
are desirable:
Constraints of Inheritance-based Design
1. Singleness:
using single inheritance, a subclass can inherit from only one superclass. Continuing the
example given above, Person can be either a Student or an Employee, but not both. Using
multiple inheritance partially solves this problem, as a StudentEmployee class can be
defined that inherits from both Student and Employee. However, it can still inherit from
each superclass only once; this scheme does not support cases in which a student has two
jobs or attends two institutions.
2. Static:
the inheritance hierarchy of an object is fixed at instantiation when the object's type is
selected and does not change with time. For example, the inheritance graph does not allow a
Student object to become an Employee object while retaining the state of its Person
superclass. (Although similar behavior can be achieved with the decorator pattern.) Some
have criticized inheritance, contending that it locks developers into their original design
standards.
3. Visibility:
whenever client code has access to an object, it generally has access to all the object's
superclass data. Even if the superclass has not been declared public, the client can still cast
the object to its superclass type. For example, there is no way to give a function a pointer to
a Student's grade point average and transcript without also giving that function access to all
of the personal data stored in the student's Person superclass.
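One common alternative that sidesteps the Singleness and Static constraints above is composition: roles are attached to a Person object instead of being baked into its class. A sketch, with illustrative role classes:
class Person(object):
    def __init__(self, name):
        self.name = name
        self.roles = []            # any number of roles, changeable at run time

class StudentRole(object):
    def __init__(self, gpa):
        self.gpa = gpa

class EmployeeRole(object):
    def __init__(self, title, salary):
        self.title = title
        self.salary = salary

p = Person("Pat")
p.roles.append(StudentRole(3.5))               # a person can be a student...
p.roles.append(EmployeeRole("clerk", 30000))   # ...and an employee at once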
Check Your Progress
1. What are semantic networks? State briefly different kinds of semantic networks.
2. Sketch the conceptual graph representation of the relation of brother to a sister.
3. Briefly describe the following knowledge representation schemes along with their merits and
demerits:
(i) Predicate logic
(ii) Frames
Chapter 6 Expert System
6.1 Expert Systems and Artificial Intelligence
Expert Systems are computer programs that are derived from Artificial Intelligence (AI).
An expert system is software that attempts to provide an answer to a problem. Expert systems are
most commonly applied in a specific problem domain, and are a traditional application and/or subfield of
artificial intelligence. A number of methods can be used to simulate the performance of an expert
system; the most common among them are:
1) the creation of a so-called "knowledgebase" which uses some knowledge representation
formalism to capture the Subject Matter Expert's (SME) knowledge and
2) a process of gathering that knowledge from the SME and codifying it according to the
formalism, which is called knowledge engineering.
Expert systems may or may not have learning components but a third common element is that once
the system is developed it is proven by being placed in the same real world problem solving
situation as the human SME, typically as an aid to human workers or a supplement to some
information system.
As a premiere application of computing and artificial intelligence, the topic of expert systems has
many points of contact with general systems theory, operations research, business process reengineering
and various topics in applied mathematics and management science.
AI's scientific goal is to understand intelligence by building computer programs that exhibit
intelligent behavior. It is concerned with the concepts and methods of symbolic inference, or
reasoning, by a computer, and how the knowledge used to make those inferences will be
represented inside the machine.
The term intelligence covers many cognitive skills, including the ability to solve problems, learn,
and understand language; AI addresses all of those. But most progress to date in AI has been made
in the area of problem solving; concepts and methods for building programs that reason about
problems rather than calculate a solution.
Knowledge Based Systems (KBS) or Expert Systems
AI programs that achieve expert-level competence in solving problems in task areas by bringing to
bear a body of knowledge about specific tasks are called knowledge-based or expert systems. Often,
the term expert systems is reserved for programs whose knowledge base contains the knowledge
used by human experts, in contrast to knowledge gathered from textbooks or non-experts. More
often than not, the two terms, expert systems (ES) and knowledge-based systems (KBS) are used
synonymously. Taken together, they represent the most widespread type of AI application. The area
of human intellectual endeavor to be captured in an expert system is called the task domain. Task
refers to some goal-oriented, problem-solving activity. Domain refers to the area within which the
task is being performed. Typical tasks are diagnosis, planning, scheduling, configuration and
design. An example of a task domain is aircraft crew scheduling.
Building an expert system is known as knowledge engineering and its practitioners are called
knowledge engineers. The knowledge engineer must make sure that the computer has all the
knowledge needed to solve a problem. The knowledge engineer must choose one or more forms in
which to represent the required knowledge as symbol patterns in the memory of the computer i.e.
he must choose a knowledge representation. He must also ensure that the computer can use the
knowledge efficiently by selecting from a handful of reasoning methods.
The Building Blocks of Expert Systems
Every expert system consists of two principal parts: the knowledge base; and the reasoning, or
inference, engine.
The knowledge base of expert systems contains both factual and heuristic knowledge. Factual
knowledge is that knowledge of the task domain that is widely shared, typically found in textbooks
or journals, and commonly agreed upon by those knowledgeable in the particular field.
Heuristic knowledge is the less rigorous, more experiential, more judgmental knowledge of
performance. In contrast to factual knowledge, heuristic knowledge is rarely discussed, and is
largely individualistic. It is the knowledge of good practice, good judgment, and plausible
reasoning in the field. It is the knowledge that underlies the "art of good guessing."
Production Rule
Knowledge representation formalizes and organizes the knowledge. One widely used
representation is the production rule, or simply rule. A rule consists of an IF part and a THEN part
(also called a condition and an action). The IF part lists a set of conditions in some logical
combination. The piece of knowledge represented by the production rule is relevant to the line of
reasoning being developed if the IF part of the rule is satisfied; consequently, the THEN part can be
concluded, or its problem-solving action taken. Expert systems whose knowledge is represented in
rule form are called rule-based systems.
Frame
Another widely used representation, called the unit (also known as frame, schema, or list structure)
is based upon a more passive view of knowledge. The unit is an assemblage of associated symbolic
knowledge about an entity to be represented. Typically, a unit consists of a list of properties of the
entity and associated values for those properties.
Since every task domain consists of many entities that stand in various relations, the properties can
also be used to specify relations, and the values of these properties are the names of other units that
are linked according to the relations. One unit can also represent knowledge that is a "special case"
of another unit, or some units can be "parts of" another unit.
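As a sketch, a unit of this kind can be written down as a simple table of properties; the unit and slot names below are purely illustrative:
television = {
    "is-a": "appliance",        # this unit is a "special case" of another unit
    "size-inches": 40,
    "supports-color": True,
    "screen": "crt-screen-1",   # a property whose value names a "part of" unit
}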
The problem-solving model, or paradigm, organizes and controls the steps taken to solve the
problem. One common but powerful paradigm involves chaining of IF-THEN rules to form a line
of reasoning. If the chaining starts from a set of conditions and moves toward some conclusion, the
method is called forward chaining. If the conclusion is known (for example, a goal to be achieved)
but the path to that conclusion is not known, then reasoning backwards is called for, and the method
is backward chaining. These problem-solving methods are built into program modules called
inference engines or inference procedures that manipulate and use knowledge in the knowledge
base to form a line of reasoning.
The knowledge base an expert uses is what he learned at school, from colleagues, and from years of
experience. Presumably the more experience he has, the larger his store of knowledge. Knowledge
allows him to interpret the information in his databases to advantage in diagnosis, design, and
analysis.
Though an expert system consists primarily of a knowledge base and an inference engine, a couple
of other features are worth mentioning: reasoning with uncertainty, and explanation of the line of
reasoning.
Knowledge is almost always incomplete and uncertain. To deal with uncertain knowledge, a rule
may have associated with it a confidence factor or a weight. The set of methods for using uncertain
knowledge in combination with uncertain data in the reasoning process is called reasoning with
uncertainty. An important subclass of methods for reasoning with uncertainty is called "fuzzy
logic," and the systems that use them are known as "fuzzy systems."
Because an expert system uses uncertain or heuristic knowledge (as we humans do) its credibility is
often in question. When an answer to a problem is questionable, we tend to want to know the
rationale. If the rationale seems plausible, we tend to believe the answer. So it is with expert
systems. Most expert systems have the ability to answer questions of the form: "Why is the answer
X?" Explanations can be generated by tracing the line of reasoning used by the inference engine.
The most important ingredient in any expert system is knowledge. The power of expert systems
resides in the specific, high-quality knowledge they contain about task domains. AI researchers will
continue to explore and add to the current repertoire of knowledge representation and reasoning
methods. But in knowledge resides the power. Because of the importance of knowledge in expert
systems and because the current knowledge acquisition method is slow and tedious, much of the
future of expert systems depends on breaking the knowledge acquisition bottleneck and in
codifying and representing a large knowledge infrastructure.
Knowledge Engineering
Knowledge engineering is the art of designing and building expert systems, and knowledge
engineers are its practitioners. A knowledge engineer is a computer scientist who knows how to
design and implement programs that incorporate artificial intelligence techniques. The nature of
knowledge engineering is changing, however, and a new breed of knowledge engineers is
emerging.
Today there are two ways to build an expert system. They can be built from scratch, or built using a
piece of development software known as a "tool" or a "shell."
Now the question arises; what knowledge engineers do? Though different styles and methods of
knowledge engineering exist, the basic approach is the same: a knowledge engineer interviews and
observes a human expert or a group of experts and learns what the experts know, and how they
reason with their knowledge. The engineer then translates the knowledge into a computer-usable
language, and designs an inference engine, a reasoning structure, that uses the knowledge
appropriately. He also determines how to integrate the use of uncertain knowledge in the reasoning
process, and what kinds of explanation would be useful to the end user.
Next, the inference engine and facilities for representing knowledge and for explaining are
programmed, and the domain knowledge is entered into the program piece by piece. It may be that
the inference engine is not just right; the form of knowledge representation is awkward for the kind
of knowledge needed for the task; and the expert might decide the pieces of knowledge are wrong.
All these are discovered and modified as the expert system gradually gains competence.
The discovery and cumulation of techniques of machine reasoning and knowledge representation is
generally the work of artificial intelligence research. The discovery and cumulation of knowledge
of a task domain is the province of domain experts. Domain knowledge consists of both formal,
textbook knowledge, and experiential knowledge; the expertise of the experts.
6.2 Justification of Expert System
There has been a significant increase in the applications of expert systems over the past several
years. Much of the interest has been in the actual development of expert systems with relatively
little study of the economic justification problem.
The justification of an expert system is similar to the justification of any advanced information
technology where quantification of the specific cost improvements may be difficult. The cost
justification for such a technology is not likely to be determined by simple evaluation of its impact
on the basic cost categories of direct labor and material.
In maintenance operations, an expert system's cost justification is more likely to be determined by
its effectiveness in the more subtle areas of cost such as reduction in inventories, raw material work
in progress, reduced transition costs, and reduction in rework and rehandling. The use of a
technology tool should result in real cash flow savings in categories normally maintained in
accounting systems and also in some categories not maintained in accounting systems.
A reduction in the need for capital equipment and manpower directly translates into cost savings
which improve the overall cash flow profile of the organization. The consistency, promptness, and
accuracy of decisions offered by expert systems facilitate higher utilization of resources with more
consistent results. Machine times that previously had been idle could be put to productive use.
Many production facilities have inherent flexibilities that can only be identified through automated
information tools such as expert systems. With its capability to process large amounts of data, an
expert system could point out areas where flexibilities such as either equipment substitution or
material replacement are possible without jeopardizing product quality.
A Case Study of Economic Justification of Expert Systems in
Maintenance Operations
The case study taken up here involves the economic justification of an expert system for
assisting maintenance personnel in troubleshooting and diagnosing continuous miner equipment
in an underground coal mine. The conventional economic analysis of the expert system is extended
to a multiattribute utility analysis. Simulation was used to study the potential impact of the
proposed expert system on the maintenance operations.
The Equipment: Continuous Miner
The adverse conditions in an underground mine greatly complicate maintenance operations. Long
travel distances from shop or central dispatching area to mining faces and the available
transportation means reduce effective time for actual repair work. Parts movement times are also
long. Poor lighting and dirty conditions at mining faces make repair work difficult. Because of
these adverse conditions, it is often difficult to maintain a high skill level for maintenance
personnel.
The supervisor of training at a large underground coal mine indicated that it takes about five years
for the average repair person to reach the high level of competence necessary to qualify him as an
expert troubleshooter on the variety of equipment he is required to service. Many never reach this
level of skill and the exceptional ones may reach it in two years. It was further stated that, because
people tended to transfer or leave the underground maintenance once they had skills and seniority
to do so, only about ten to fifteen percent of the maintenance personnel possessed this level of
performance.
A diagnostic expert system could effectively improve maintenance performance since the personnel
depend heavily on technical manuals supplied by equipment manufacturers. The following is an
example of how we might quantify the potential benefits of the expert system. The first step is to
analyze what factors would be influenced by the installation of the system. Examples are:
1. The overall reliability of the machine could be improved if there was an expert system which
could perform more accurate diagnosis that would reduce the number of repeat calls since the
repairs would be more likely to be done correctly the first time.
2. Mechanics would be better able to assess what tools and parts are required and, thus, reduce
return trips or waiting time to get the parts or tools needed.
3. The parts checked out of the warehouse would be more likely to be the ones needed, thereby,
reducing the small but real cost of used parts being lost and not returned.
4. Operating time per shift could be increased. Thus, the production for the shift would be
increased. Since the production rate of the mine is fixed by contract, this would reduce the number
of unit shifts required to be worked.
5. The continuous miner is the heart of a mining section operation. When it is down, it is estimated
that fifty percent of the productive work in the section ceases. If the delay is expected to be
extensive, the crew is reassigned. For shorter delays of less than half a shift, however, the
downtime constitutes a loss.
6. Downtime includes waiting for a mechanic to be assigned to the call when the work load is
heavy.
7. If the maintenance functions could be made less tedious with the aid of an expert system,
experienced workers will be more likely to stay on the job longer.
The factors associated with the above cost items include machine reliability, personnel
productivity, tool utilization, effective operating time, personnel idle time, machine idle time, and
personnel retention. These factors may be included in a comprehensive multiattribute evaluation of
the expert system. None of the data required to analyze these cost impacts can be simply looked up
in cost statements or monthly reports.
The maintenance records from the work order system do, however, provide information on the
number, duration, and nature of reported breakdowns. Other maintenance records include the
number of operating hours during the month. From this information, the mean, standard deviation,
maximum, and minimum failure times can be computed. A truncated normal distribution is
assumed for failure time. Mean time between failures can be obtained by dividing the total
operating time by the number of failures reported. Also the average production per operating hour
can be calculated. The distribution of time between failures is assumed to be exponential. Other
necessary data would be obtained from database or operating and maintenance schedules as
follows:
* Continuous miners are normally operated two consecutive shifts and idled for one shift for
maintenance. Thus, the maximum time between failures which would concern us is two shifts.
* The effective work time in a shift after excluding lunch, travel, and minor operational delay is
about 380 minutes.
* The probability of successful repair without call backs is estimated by counting the number of
times in the total delay records that a call was made for the same failure in the same shift.
* The return for parts and tools could be estimated by foremen reviewing their logs and discussions
with mechanics and supervisors.
* Value of parts lost was taken from an audit done by the warehouse for the purpose of determining
this value.
* On a typical operating shift, three continuous miner crews are scheduled to operate. Two crews of
mechanics dispatched from a central shop are assigned to service the operation.
The Expert System Model
The proposed expert system was needed because it would provide significant benefits for the
maintenance operations. Expert systems, in general, facilitate an environment where the scarce
expertise of skilled workers can be widely distributed through the use of computers. For the
maintenance operation, the proposed expert system offered the following benefits:
* It would increase the frequency and consistency of successful repair jobs.
* It would help distribute the expertise of the maintenance personnel.
* It would facilitate real-time expert-level repair decisions by regular maintenance personnel.
* It would increase the utilization of the dormant information available in the corporate data base.
* It would free up the time of the highly skilled maintenance personnel, thus, permitting them to
focus on more difficult maintenance issues. The design of the system was based on a knowledge
representation scheme using a combination of rules and frames. Rules represent links between
variables in terms of IF-THEN relationships.
The cause and effect relationships in maintenance operations are best represented in terms of rules
in one of the formats presented below:
IF data THEN condition
IF condition THEN action
IF action THEN consequence
IF consequence THEN goal
The historical maintenance data available in the corporate data base constitutes the driving force for
the expert system. The frame representation component of the system is required to provide a basis
for organizing related pieces of information. A frame consists of a collection of slots that contain
attributes to describe an object, a situation, an action, or an event. For example, a maintenance call
is regarded as an event that is associated with a set of objects, a list of attributes, and a collection of
rules. These related items are best organized as frames.
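As a sketch, such a maintenance-call frame and its associated rules might be organized as follows (the equipment names, symptoms, and rules are invented for illustration):
maintenance_call = {
    "event": "maintenance-call",
    "equipment": "continuous-miner-3",
    "symptom": "no hydraulic pressure",
    "rules": [
        ("IF no hydraulic pressure", "THEN condition: pump or hose fault"),
        ("IF pump fault confirmed", "THEN action: replace pump"),
    ],
}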
Knowledge acquisition for the expert system was planned to be done through several related means.
These included the following:
* Direct consultation with maintenance experts
* Use of existing maintenance handbooks and logs
* Direct observation of maintenance operations
The cost of the time of consulting with the maintenance experts was included as a part of the initial
development cost of the expert system. Several knowledge acquisition sessions (meetings) were
included in the plan. Interviews and questionnaires were included in the acquisition process.
Because multiple experts were expected to be involved in the knowledge acquisition, the Delphi
method and the nominal group technique commonly used in group decision making were
recommended for use in the knowledge acquisition process. The information provided by the
various experts were expected to be organized into a blackboard architecture. Blackboard
architecture is a special type of expert system knowledge organization whereby multiple sources of
knowledge contribute to the problem solution procedure.
Development Cost
The initial development cost of the expert system was estimated to be $200,000. This included the
cost of purchasing the development shell and the personnel time needed for knowledge acquisition
and coding. Several technical and management personnel were expected to participate in the
knowledge acquisition process. The actual development was expected to be carried out by a group
of technical personnel. The end users (maintenance personnel) were also included in the knowledge
acquisition and testing processes. The total cost of the time of all those expected to be involved in
the expert system development was included as a part of the initial cost of the proposed expert
system. Also included in the initial cost was the cost of hiring an outside consultant during the
design stage of the expert system.
Cost of Hardware
The hardware recommended for the implementation of the proposed expert system was the Texas
Instruments Explorer. This is a special AI workstation developed to run the LISP programming
language. It is often referred to as a LISP Machine. The chosen development shell (Personal
Consultant Plus) was developed specifically to run on the Explorer system although it could also
run on IBM compatible PCs. The cost of the Explorer system was $40,000. Additional costs were
budgeted for the purchase of additional PCs and portable computers for site operation of the expert
system. The total hardware cost for the justification process was $80,000.
Verification and Validation
Guidelines for verifying and validating the proposed expert system were developed as a part of the
justification requirement. Verification involved a determination of whether or not the expert system
was functioning as designed. The guideline for this involved debugging requirements, error
analysis, user friendliness, proper operation, and report format. Validation concerned an evaluation
of how closely the solution offered by the expert system would match a human expert's solution.
Experts who
provided the knowledge as well as end users were expected to participate in the verification and
validation processes.
Operation of the Expert System
The expert system was evaluated for implementation at both a home office and on-site. Portable
PCs were included in the implementation strategy. These were expected to be carried by
maintenance personnel to problem sites at which the expert system was not permanently installed.
In some cases, operators could call in to a home office to report a problem. A member of the
technical staff would then immediately consult the expert system to determine what recommendations to relay
back to the operators. The system was to be initially deployed under a pilot arrangement at selected
sites for specific maintenance operations. It was then to be extended to other sites and other
operations. The cost of extending the implementation of the system was not included in the
justification process. Some of the considerations covered by the implementation strategy were:
* Management support
* Personnel responsible for the implementation
* Time frame for the implementation
* Compatibility with other decision support systems in the organization
* Integration with other maintenance support tools
* User training requirements
* Development of a simple user's guide
Maintaining the Expert System
The original copy of the knowledge base for the expert system was expected to be maintained at the
corporate headquarters. Backup copies were to be distributed to operation sites. The company had a
Technical Center adjacent to the corporate headquarters. All changes to the knowledge base would
be made at the technical center. Revised backup copies would then be distributed to the operation
sites. The technical center had a large group of computer software personnel. With management
support, an arrangement was made with the technical center whereby the proposed expert system
would be maintained at no additional cost within the center's regular software maintenance
functions. Thus, a cost of maintaining the proposed expert system was not included in the
justification process. Two of the center's personnel were included in the development team. Both of
them felt that it would not be difficult to carry out the software upgrade functions since they would
already be familiar with the system. However, to minimize the burden of maintaining the
knowledge base, it was agreed that upgrades would be performed only four times a year (end of
March, June, September, and December). The maintenance personnel were expected to send their
comments, notes, and problems encountered with the expert system to the technical center
promptly. These pieces of information would be organized and collated pending the next upgrade
of the expert system.
The Simulation Model
Since an expert system was being proposed, it was felt that the only way to evaluate the potential
effects of the system was to use a simulation approach. This eliminated much of the guesswork and
speculations about the potential impact of the proposed expert system. The simulation model was
used to generate inputs for the justification process. It was not intended to be embedded or
interfaced directly with the expert system. The outputs were obtained for the case with the proposed
expert system installation and the case without the expert system.
Decisions based on a simulation analysis will exhibit more validity than decisions based on
purely subjective reasoning. Simulation simplifies sensitivity analysis so that an analyst can
quickly determine what changes in parameter values will necessitate a change in decisions. The
potential effects of certain actions can be studied prior to making actual resource and time
commitments.
The premise of the proposed expert system was that it could eliminate much of the time and
material losses faced in the repair operations. The simulation model facilitated an insight into these
potential savings. Maintenance data for a two-month period was used as the basis for the simulation
study. The simulation model was written using the discrete event orientation in SLAM II and it
contained FORTRAN code for simplified report generation. Assumptions were made as to the
expected impacts if the proposed expert system was installed. The simulation was run for one
hundred shifts. Each shift was assumed to be eight hours long. Provisions were made for scheduled
breaks during a shift. SLAM automatically advanced time and kept future events on the event
calendar. Some of the events defined for the simulation run were:
* Occurrence of a maintenance problem
* Initiation of a repair call
* Arrival of a repair personnel at the problem site
* Determination of the relevant cause-effect relationship
* Initiation of the required repair work
* Completion of the repair work
Maintenance personnel were modeled as a scarce resource with variable service times. Tools for
performing various repair works were modeled as resources subject to various levels of availability.
Poisson arrival and exponential service times were assumed for the simulation analysis. The time
between when a problem occurred and when a repair call was actually placed was modeled as a
random variable following the exponential distribution. The number of trips made to locate
appropriate parts or tools for a repair job was modeled as a random variable following the
hypergeometric distribution. The simulation was run for two cases: Without expert systems and
with expert system. A best case scenario was also run. The best case is an ideal scenario which
assumed that the expert system would eliminate all repeat repairs and extra trips for parts or tools.
The results of this run were used to determine the limit of the potential impact of the proposed expert
system. It was also used as the upper bound for sensitivity analysis. The large simulation sample
size yielded sufficient data from the different assumed distributions to ensure that the distribution
of the average values of the parameters of concern would be approximately normally distributed
based on the central limit theorem. This was important for further statistical analysis of the
simulation results.
The simulation produced estimates for the following cost categories:
* Operating labor loss cost. This is the expected labor cost of having mining personnel idle due to
the downtime of a continuous miner equipment which is waiting for repairs.
* Travel labor cost. This is the expected personnel travel cost to and from repair calls.
* Repair labor cost. This is the expected labor cost associated with time on active repair tasks.
* Get parts labor. This is the expected labor cost for time spent searching for correct parts or repair
tools needed for a repair job.
* Get parts loss. This is the expected cost of parts lost or misplaced during a repair job. The
potential for this loss increases whenever a repair job is incorrectly approached.
* Get tools loss. This is the expected cost of tools lost or misplaced during a repair job.
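A greatly simplified sketch of this kind of with/without comparison is shown below. It is not the SLAM II model; the distributions follow the assumptions stated above, but all parameter values are invented for illustration:
import random

def total_downtime(p_repeat, shifts=100, shift_minutes=480.0,
                   mtbf=300.0, mttr=45.0):
    """Simulate total downtime (minutes) over a number of shifts."""
    horizon = shifts * shift_minutes
    t, downtime = 0.0, 0.0
    while t < horizon:
        t += random.expovariate(1.0 / mtbf)        # time to next failure
        repair = random.expovariate(1.0 / mttr)    # active repair time
        if random.random() < p_repeat:             # repeat call: repair redone
            repair += random.expovariate(1.0 / mttr)
        downtime += repair
        t += repair
    return downtime

print(total_downtime(p_repeat=0.30))   # without the expert system
print(total_downtime(p_repeat=0.05))   # with it (assumed better first fixes)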
The overall benefit of the expert system is significant when all factors are combined. For example,
the mean travel cost is not significantly affected by the installation of an expert system while the
total downtime is significantly reduced. In fact, the time needed to consult the expert system
actually increased the repair labor cost. The "Get Tools" cost is significantly improved.
The approach used in the case study combines conventional economic measures with standard
multiattribute utility analysis. In addition, the use of computer simulation for generating input data
for the justification process is illustrated. The case study shows both the practicality and cost
effectiveness of the proposed expert system. While expert systems have been extensively used in
many practical problems, few formal procedures have been developed to evaluate their contribution
to organizational goals. A comprehensive and integrative justification approach must be developed
and firmly followed.
After the justification process, management must act quickly. Justification is one thing, actual
implementation is a completely different thing. There must be a quick and smooth transition from
problem identification and solution justification to actual solution implementation.
6.3 Chaining
There are two main methods of reasoning when using inference rules: backward chaining and
forward chaining.
Forward chaining starts with the data available and uses the inference rules to conclude more data
until a desired goal is reached. An inference engine using forward chaining searches the inference
rules until it finds one in which the if clause is known to be true. It then concludes the then clause
and adds this information to its data. It would continue to do this until a goal is reached. Because
the data available determines which inference rules are used, this method is also called data driven.
Backward chaining starts with a list of goals and works backwards to see if there is data which will
allow it to conclude any of these goals. An inference engine using backward chaining would search
the inference rules until it finds one which has a then clause that matches a desired goal. If the if
clause of that inference rule is not known to be true, then it is added to the list of goals. For
example, suppose a rule base contains
1. If Fritz is green then Fritz is a frog.
2. If Fritz is a frog then Fritz hops.
Suppose a goal is to conclude that Fritz hops. The rule base would be searched and rule (2) would
be selected because its conclusion (the then clause) matches the goal. It is not known that Fritz is a
frog, so this "if" statement is added to the goal list. The rule base is again searched and this time
rule (1) is selected because its then clause matches the new goal just added to the list. This time, the
if clause (Fritz is green) is known to be true and the goal that Fritz hops is concluded. Because the
list of goals determines which rules are selected and used, this method is called goal driven.
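A sketch of this goal-driven search in Python, using the two Fritz rules above:
rules = [("Fritz is green", "Fritz is a frog"),
         ("Fritz is a frog", "Fritz hops")]
facts = {"Fritz is green"}

def backward_chain(goal):
    if goal in facts:
        return True
    # Look for a rule whose THEN clause matches the goal, then pursue its IF
    # clause as a new subgoal, exactly as described above.
    for if_clause, then_clause in rules:
        if then_clause == goal and backward_chain(if_clause):
            facts.add(goal)    # the goal has now been concluded
            return True
    return False

print(backward_chain("Fritz hops"))   # True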
Certainty Factors (Confidences)
One advantage of expert systems over traditional methods of programming is that they allow the
use of "confidences", also known as certainty factors. A human, when reasoning, does not always
conclude things with 100% confidence: he might venture, "If Fritz is green, then he is probably a
frog" (after all, he might be a chameleon). This type of reasoning can be imitated by using numeric
values called confidences. For example, if it is known that Fritz is green, it might be concluded with
0.85 confidence that he is a frog; or, if it is known that he is a frog, it might be concluded with 0.95
confidence that he hops. These numbers are similar in nature to probabilities, but they are not the
same: they are meant to imitate the confidences humans use in reasoning rather than to follow the
mathematical definitions used in calculating probabilities.
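One simple way to combine such confidences, shown as a sketch below, is to multiply them along a chain of rules; practical systems often use more elaborate combination schemes:
cf_green_to_frog = 0.85   # IF Fritz is green THEN Fritz is a frog (0.85)
cf_frog_to_hops = 0.95    # IF Fritz is a frog THEN Fritz hops (0.95)

# Confidence in the chained conclusion "Fritz hops", given "Fritz is green":
print(cf_green_to_frog * cf_frog_to_hops)   # 0.8075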
6.4 Expert System Architecture
The following general points can be made about expert systems and their architecture.
1. The sequence of steps taken to reach a conclusion is dynamically synthesized with each new
case. It is not explicitly programmed when the system is built.
2. Expert systems can process multiple values for any problem parameter. This permits more
than one line of reasoning to be pursued and the results of incomplete reasoning to be
presented.
3. Problem solving is accomplished by applying specific knowledge rather than specific
technique. This is a key idea in expert systems technology. It reflects the belief that human
experts do not process their knowledge differently from others, but they do possess different
knowledge. With this philosophy, when one finds that their expert system does not produce
the desired results, work begins to expand the knowledge base, not to re-program the
procedures.
There are various expert systems in which a rulebase and an inference engine cooperate to simulate
the reasoning process that a human expert pursues in analyzing a problem and arriving at a
conclusion. In these systems, in order to simulate the human reasoning process, a vast amount of
knowledge needed to be stored in the knowledge base. Generally, the knowledge base of such an
expert system consisted of a relatively large number of "if then" type of statements that were
interrelated in a manner that, in theory at least, resembled the sequence of mental steps that were
involved in the human reasoning process.
Because of the need for large storage capacities and related programs to store the rulebase, most
expert systems have, in the past, been run only on large information handling systems. Recently,
the storage capacity of personal computers has increased to a point where it is becoming possible to
consider running some types of simple expert systems on personal computers.
In some applications of expert systems, the nature of the application and the amount of stored
information necessary to simulate the human reasoning process for that application is just too vast
to store in the active memory of a computer. In other applications of expert systems, the nature of
the application is such that not all of the information is always needed in the reasoning process. An
example of this latter type application would be the use of an expert system to diagnose a data
processing system comprising many separate components, some of which are optional. When that
type of expert system employs a single integrated rulebase to diagnose the minimum system
configuration of the data processing system, much of the rulebase is not required since many of the
components which are optional units of the system will not be present in the system. Nevertheless,
earlier expert systems require the entire rulebase to be stored since all the rules were, in effect,
chained or linked together by the structure of the rulebase.
When the rulebase is segmented, preferably into contextual segments or units, it is then possible to
eliminate portions of the rulebase containing data or knowledge that is not needed in a particular
application. The segmenting of the rulebase also allows the expert system to be run with systems or
on systems having much smaller memory capacities than was possible with earlier arrangements
since each segment of the rulebase can be paged into and out of the system as needed. The
segmenting of the rulebase into contextual segments requires that the expert system manage various
intersegment relationships as segments are paged into and out of memory during execution of the
program. Since the system permits a rulebase segment to be called and executed at any time during
the processing of the first rulebase, provision must be made to store the data that has been
accumulated up to that point so that at some time later in the process, when the system returns to
the first segment, it can proceed from the last point or rule node that was processed. Also, provision
must be made so that data that has been collected by the system up to that point can be passed to the
second segment of the rulebase after it has been paged into the system and data collected during the
processing of the second segment can be passed to the first segment when the system returns to
complete processing that segment.
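A sketch of the paging idea in Python: segments are loaded into memory on demand, while the data collected so far is kept in a shared store so that processing can resume where it stopped (all structures here are invented placeholders):
loaded = {}       # rulebase segments currently paged into memory
collected = {}    # data accumulated so far, shared across segments

def page_in(name):
    # Stand-in for reading a contextual segment of the rulebase from disk.
    return {"name": name, "rules": []}

def run_segment(name):
    if name not in loaded:             # page the segment in only when needed
        loaded[name] = page_in(name)
    # The segment sees everything collected so far and adds its own results,
    # so control can later return to an earlier segment and resume from the
    # last rule node processed.
    collected[name] = "processed"

run_segment("base")
run_segment("optional-unit")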
The user interface and the procedure interface are two important functions in the information
collection process.
6.5 Expert Systems versus Problem-Solving Systems
The principal distinction between expert systems and traditional problem solving programs is the
way in which the problem related expertise is coded. In traditional applications, problem expertise
is encoded in both program and data structures. In the expert system approach all of the problem
related expertise is encoded in data structures only; no problem-specific information is encoded in
the program structure. This organization has several benefits.
An example may help contrast the traditional problem solving program with the expert system
approach. The example is the problem of tax advice. In the traditional approach, data structures
describe the taxpayer and tax tables, and a program contains statements representing an
expert tax consultant's knowledge, such as statements which relate information about the taxpayer
to tax table choices. It is this representation of the tax expert's knowledge that is difficult for the tax
expert to understand or modify.
In the expert system approach, the information about taxpayers and tax computations is again found
in data structures, but now the knowledge describing the relationships between them is encoded in
data structures as well. The programs of an expert system are independent of the problem domain
(taxes) and serve to process the data structures without regard to the nature of the problem area they
describe. For example, there are programs to acquire the described data values through user
interaction, programs to represent and process special organizations of description, and programs to
process the declarations that represent semantic relationships within the problem domain and an
algorithm to control the processing sequence and focus.
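As a toy sketch of this contrast, the small engine below knows nothing about taxes; all of the domain knowledge sits in the rule declarations (the rates and thresholds are invented):
# Domain knowledge lives only in data structures...
tax_rules = [
    {"if_income_over": 50000, "rate": 0.30},
    {"if_income_over": 10000, "rate": 0.20},
    {"if_income_over": 0,     "rate": 0.10},
]

# ...while the program is domain-independent: it merely scans declarations.
def first_matching_rate(rules, income):
    for rule in rules:
        if income > rule["if_income_over"]:
            return rule["rate"]

print(first_matching_rate(tax_rules, 30000))   # 0.2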
The general architecture of an expert system involves two principal components: a problem
dependent set of data declarations called the knowledge base or rule base, and a problem
independent (although highly data structure dependent) program which is called the inference
engine.
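The contrast can be illustrated with a toy fragment of the tax example. The figures and rules below
are invented for illustration; the point is only that, in the second version, the consultant's
knowledge sits in a data structure that a problem-independent engine interprets.

# Traditional approach: the expert's knowledge is buried in program statements.
def tax_bracket_traditional(income):
    if income <= 10000:
        return 0.10
    return 0.25

# Expert-system approach: the same knowledge lives in a data structure,
# and a problem-independent engine interprets it.
TAX_RULES = [
    {"if": lambda t: t["income"] <= 10000, "then": ("bracket", 0.10)},
    {"if": lambda t: t["income"] > 10000,  "then": ("bracket", 0.25)},
]

def infer(rules, facts):
    for rule in rules:            # the engine knows nothing about taxes
        if rule["if"](facts):
            key, value = rule["then"]
            facts[key] = value
    return facts

print(infer(TAX_RULES, {"income": 8000}))   # {'income': 8000, 'bracket': 0.1}

A tax expert who wants to change a threshold edits TAX_RULES; the infer function, like an
inference engine, never needs to change.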
Individuals involved with Expert Systems
There are generally three individuals having an interaction with expert systems.
1. Primary among these is the end-user; the individual who uses the system for its problem
solving assistance.
2. The problem domain expert who builds and supplies the knowledge base providing the
domain expertise, and
3. A knowledge engineer who assists the experts in determining the representation of their
knowledge, enters this knowledge into the knowledge base, and defines the inference
technique required to obtain useful problem solving activity. Usually, the knowledge
engineer will represent the problem solving activity in the form of rules, in which case the
result is referred to as a rule-based expert system. When these rules are created from the
domain expertise, the knowledge base stores the rules of the expert system.
6.6 Inference Rule
An understanding of the "inference rule" concept is important to understand expert systems. An
inference rule is a statement that has two parts, an if clause and a then clause. This rule is what
gives expert systems the ability to find solutions to diagnostic and prescriptive problems. For
example,
If Socrates is a man and all men are mortal,
Then we can infer that Socrates is mortal.
An expert system's rulebase is made up of many such inference rules. They are entered as separate
rules and it is the inference engine that uses them together to draw conclusions. Because each rule
is a unit, rules may be deleted or added without affecting other rules (though it should affect which
conclusions are reached). One advantage of inference rules over traditional programming is that
inference rules use reasoning which more closely resembles human reasoning.
Thus, when a conclusion is drawn, it is possible to understand how this conclusion was reached.
Furthermore, because the expert system uses knowledge in a form similar to the expert, it may be
easier to retrieve this information from the expert.
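As a minimal sketch of how an inference engine uses such rules together, the following Python
fragment forward-chains over a one-rule rulebase built from the Socrates example. Adding or
deleting a rule means editing the list, not the engine.

# A minimal forward-chaining illustration of the if/then inference rule.
rules = [
    ({"Socrates is a man", "all men are mortal"}, "Socrates is mortal"),
]
facts = {"Socrates is a man", "all men are mortal"}

changed = True
while changed:                        # keep applying rules until nothing new
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)     # the engine draws the conclusion
            changed = True

print("Socrates is mortal" in facts)  # True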
Procedure Node Interface
The function of the procedure node interface is to receive information from the procedures
coordinator and create the appropriate procedure call. The ability to call a procedure and receive
information from that procedure can be viewed as simply a generalization of input from the
external world. While in some earlier expert systems external information has been obtained, that
information was obtained only in a predetermined manner so only certain information could
actually be acquired. The expert system disclosed in the cross-referenced application is permitted,
through the knowledge base, to invoke any procedure allowed on its host system. This makes the
expert system useful in a much wider class of knowledge domains than if it had no external access
or only limited external access.
In the area of machine diagnostics using expert systems, particularly self-diagnostic applications, it
is not possible to conclude the current state of "health" of a machine without some information. The
best source of information is the machine itself, for it contains much detailed information that could
not reasonably be provided by the operator.
The knowledge that is represented in the system appears in the rulebase. In the rulebase described
in the cross-referenced applications, there are basically four different types of objects, with
associated information present.
1. Classes — these are questions asked to the user.
2. Parameters — a parameter is a place holder for a character string which may be a variable
that can be inserted into a class question at the point in the question where the parameter is
positioned.
3. Procedures — these are definitions of calls to external procedures.
4. Rule Nodes — The inferencing in the system is done by a tree structure which indicates the
rules or logic which mimics human reasoning. The nodes of these trees are called rule
nodes. There are several different types of rule nodes.
The rulebase comprises a forest of many trees. The top node of the tree is called the goal node, in
that it contains the conclusion. Each tree in the forest has a different goal node. The leaves of the
tree are also referred to as rule nodes, or one of the types of rule nodes. A leaf may be an evidence
node, an external node, or a reference node.
An evidence node functions to obtain information from the operator by asking a specific question.
In responding to a question presented by an evidence node, the operator is generally instructed to
answer "yes" or "no" (represented by the numeric values 1 and 0) or to provide a value between 0
and 1 representing a "maybe."
Questions which require a response from the operator other than yes or no or a value between 0 and
1 are handled in a different manner.
A leaf that is an external node indicates that data will be used which was obtained from a procedure
call.
A reference node functions to refer to another tree or subtree.
A tree may also contain intermediate or minor nodes between the goal node and the leaf node. An
intermediate node can represent logical operations like ‘And’ or ‘Or’.
The inference logic has two functions. It selects a tree to trace and then it traces that tree. Once a
tree has been selected, that tree is traced, depth-first, left to right.
The word "tracing" refers to the action the system takes as it traverses the tree, asking classes
(questions), calling procedures, and calculating confidences as it proceeds.
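The following sketch traces a small rule tree depth-first, left to right, asking evidence questions
and combining confidences as it goes. Treating 'And' as the minimum of its children and 'Or' as the
maximum is a common convention assumed here for illustration; the actual system's confidence
arithmetic may differ, and the questions are invented.

# Hedged sketch: tracing one rule tree depth-first, left to right.
def trace(node, ask):
    kind = node["type"]
    if kind == "evidence":                 # ask the operator a question
        return ask(node["question"])       # expected: 0, 1, or in between
    if kind == "and":                      # AND: weakest child dominates
        return min(trace(c, ask) for c in node["children"])
    if kind == "or":                       # OR: strongest child dominates
        return max(trace(c, ask) for c in node["children"])
    raise ValueError(f"unknown node type: {kind}")

goal = {"type": "and", "children": [      # the goal node sits at the top
    {"type": "evidence", "question": "Does the unit power on? "},
    {"type": "or", "children": [
        {"type": "evidence", "question": "Any error beeps? "},
        {"type": "evidence", "question": "Error light blinking? "},
    ]},
]}

confidence = trace(goal, lambda q: float(input(q)))
print("confidence in goal:", confidence)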
The selection of a tree depends on the ordering of the trees. The original ordering of the trees is the
order in which they appear in the rulebase. This order can be changed, however, by assigning an
evidence node an attribute "initial" which is described in detail in these applications. The first
action taken is to obtain values for all evidence nodes which have been assigned an "initial"
attribute. Using only the answers to these initial evidences, the rules are ordered so that the most
likely to succeed is evaluated first. The trees can be further re-ordered since they are constantly
being updated as a selected tree is being traced.
It has been found that the type of information that is solicited by the system from the user by means
of questions or classes should be tailored to the level of knowledge of the user. In many
applications, the group of prospective users is well defined and the knowledge level can be
estimated, so that the questions can be presented at a level which corresponds generally to the
average user. However, in other applications, knowledge of the specific domain of the expert
system might vary considerably among the group of prospective users.
In many situations the information is already in the system, in a form which permits the correct
answer to a question to be obtained through a process of inductive or deductive reasoning. The data
previously collected by the system could be answers provided by the user to less complex questions
that were asked for a different reason or results returned from test units that were previously run.
User interface
The function of the user interface is to present questions and information to the user and supply the
user's responses to the inference engine.
Any values entered by the user must be received and interpreted by the user interface. Some
responses are restricted to a set of possible legal answers; others are not. The user interface checks
all responses to ensure that they are of the correct data type. Any responses that are restricted to a
legal set of answers are compared against these legal answers. Whenever the user enters an illegal
answer, the user interface informs the user that the answer was invalid and prompts for a
correction.
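A minimal sketch of this checking loop, with hypothetical names, might look as follows.

# Illustrative only: check the data type, then check membership in the legal
# answer set, and re-prompt on an illegal answer.
def ask(question, cast=str, legal=None):
    while True:
        raw = input(question + " ")
        try:
            value = cast(raw)              # enforce the expected data type
        except ValueError:
            print("Invalid type, please try again.")
            continue
        if legal is not None and value not in legal:
            print(f"Answer must be one of {sorted(legal)}.")
            continue                       # prompt the user to correct it
        return value

# Example use: answer = ask("Is the unit plugged in?", legal={"yes", "no"})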
6.7 Applications of Expert Systems
Expert systems are designed and created to facilitate tasks in the fields of accounting, medicine,
process control, financial service, production, human resources etc. Indeed, the foundation of a
successful expert system depends on a series of technical procedures and development that may be
designed by certain technicians and related experts.
A good example of application of expert systems in banking area is expert systems for mortgages.
Loan departments are interested in expert systems for mortgages because of the growing cost of
labour which makes the handling and acceptance of relatively small loans less profitable. They also
see in the application of expert systems a possibility for standardised, efficient handling of
mortgage loans, and appreciate that for the acceptance of mortgages there are hard and fast rules
which do not always exist with other types of loans.
While expert systems distinguished themselves within AI research by finding practical application,
that application has remained limited. Expert systems are notoriously narrow in their domain of
knowledge—as an amusing example, a researcher used the "skin disease" expert system to
diagnose his rustbucket car as likely to have developed measles—and the systems were thus prone
to making errors that humans would easily spot. Additionally, once some of the mystique had worn
off, most programmers realized that simple expert systems were essentially just slightly more
elaborate versions of the decision logic they had already been using. Therefore, some of the
techniques of expert systems can now be found in most complex programs without any fuss about
them.
An example of an expert system used by many people, and a good demonstration of its limitations,
is the Microsoft Windows troubleshooting software located in the "help" section of the taskbar
menu. Obtaining expert/technical operating system support is often difficult for
individuals not closely involved with the development of the operating system. Microsoft has
designed their expert system to provide solutions, advice, and suggestions for common errors
encountered while using the operating system.
Another 1970s and 1980s application of expert systems, which today we would simply call AI, was
in computer games. For example, the computer baseball games Earl Weaver Baseball and Tony La
Russa Baseball each had highly detailed simulations of the game strategies of those two baseball
managers. When a human played the game against the computer, the computer queried the Earl
Weaver or Tony La Russa Expert System for a decision on what strategy to follow. Even those
choices where some randomness was part of the natural system (such as when to throw a surprise
pitch-out to try to trick a runner trying to steal a base) were decided based on probabilities supplied
by Weaver or La Russa. Today we would simply say that "the game's AI provided the opposing
manager's strategy."
6.8 Advantages of Expert Systems:
1. Provides consistent answers for repetitive decisions, processes and tasks
2. Holds and maintains significant levels of information
3. Encourages organizations to clarify the logic of their decision-making
4. Never "forgets" to ask a question, as a human might
5. Can work round the clock
6. Can be used by the user more frequently
7. A multi-user expert system can serve more users at a time
6.9 Disadvantages of Expert Systems:
1. Lacks common sense needed in some decision making
2. Cannot make creative responses as a human expert would in unusual circumstances
3. Domain experts not always able to explain their logic and reasoning
4. Errors may occur in the knowledge base, and lead to wrong decisions
5. Cannot adapt to changing environments, unless knowledge base is changed
Types of Problems solved by Expert Systems
Expert systems are most valuable to organizations that have a high-level of know-how experience
and expertise that cannot be easily transferred to other members. They are designed to carry the
intelligence and information found in the intellect of experts and provide this knowledge to other
members of the organization for problem-solving purposes.
Typically, the problems to be solved are of the sort that would normally be tackled by a medical or
other professional. Real experts in the problem domain (which will typically be very narrow, for
instance "diagnosing skin conditions in human teenagers") are asked to provide "rules of thumb" on
how they evaluate the problems, either explicitly with the aid of experienced systems developers, or
sometimes implicitly, by getting such experts to evaluate test cases and using computer programs to
examine the test data and (in a strictly limited manner) derive rules from that. Generally, expert
systems are used for problems for which there is no single "correct" solution which can be encoded
in a conventional algorithm; one would not write an expert system to find shortest paths through
graphs, or sort data, as there are simply easier ways to do these tasks.
Simple systems use simple true/false logic to evaluate data. More sophisticated systems are capable
of performing at least some evaluation, taking into account real-world uncertainties, using such
methods as fuzzy logic. Such sophistication is difficult to develop and still highly imperfect.
Inference Engine or Expert Systems Shells
An inference engine is a computer program that tries to derive answers from a knowledge base. It is
the "brain" that expert systems use to reason about the information in the knowledge base for the
ultimate purpose of formulating new conclusions. Inference engines are considered to be a special
case of reasoning engines, which can use more general methods of reasoning.
A shell is a complete development environment for building and maintaining knowledge-based
applications. It provides a step-by-step methodology, and ideally a user-friendly interface such as a
graphical interface, for a knowledge engineer that allows the domain experts themselves to be
directly involved in structuring and encoding the knowledge. Many commercial shells are
available, one example being eGanges which aims to remove the need for a knowledge engineer.
Architecture
The separation of the inference engine as a distinct software component stems from the typical
production system architecture. This architecture relies on a data store, or working memory, serving
as a global database of symbols representing facts or assertions about the problem; on a set of rules
which constitute the program, stored in a rule memory or production memory; and on an inference
engine, required to execute the rules. (Executing rules is also referred to as firing rules.) The
inference engine must determine which rules are relevant to a given data store configuration and
choose which one(s) to apply. The control strategy used to select rules is often called conflict
resolution.
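In outline, and with invented contents, those three components can be pictured as plain data plus
one selection function:

# Illustrative contents for the three components named above.
working_memory = {("temperature", 104)}        # facts about the problem

rule_memory = [                                # the program, held as rules
    {"name": "fever",
     "if": lambda wm: ("temperature", 104) in wm,
     "then": ("has_fever", True)},
]

def conflict_resolution(conflict_set):
    return conflict_set[0]                     # simplest strategy: first match

conflict_set = [r for r in rule_memory if r["if"](working_memory)]
print(conflict_resolution(conflict_set)["then"])   # ('has_fever', True)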
An inference engine has three main elements. They are:
1. An interpreter. The interpreter executes the chosen agenda items by applying the
corresponding base rules.
2. A scheduler. The scheduler maintains control over the agenda by estimating the effects of
applying inference rules in light of item priorities or other criteria on the agenda.
3. A consistency enforcer. The consistency enforcer attempts to maintain a consistent
representation of the emerging solution.
The Recognize-Act Cycle
The inference engine can be described as a form of finite state machine with a cycle consisting of
three action states:
1. Match rules,
2. Select rules, and
3. Execute rules.
Rules are represented in the system by a notation called predicate logic.
In the first state, match rules, the inference engine finds all of the rules that are satisfied by the
current contents of the data store. When rules are in the typical condition-action form, this means
testing the conditions against the working memory. The rule matchings that are found are all
candidates for execution: they are collectively referred to as the conflict set. Note that the same rule
may appear several times in the conflict set if it matches different subsets of data items. The pair of
a rule and a subset of matching data items is called an instantiation of the rule.
In many applications, where large volumes of data are concerned and/or when performance time
considerations are critical, the computation of the conflict set is a non-trivial problem. Earlier
research work on inference engines focused on better algorithms for matching rules to data. The
Rete algorithm, developed by Charles Forgy, is an example of such a matching algorithm; it was
used in the OPS series of production system languages. Daniel P. Miranker later improved on Rete
with another algorithm, TREAT, which combined it with optimization techniques derived from
relational database systems.
The inference engine then passes along the conflict set to the second state, select rules. In this state,
the inference engine applies some selection strategy to determine which rules will actually be
executed. The selection strategy can be hard-coded into the engine or may be specified as part of
the model. In the larger context of AI, these selection strategies are often referred to as heuristics,
following Allen Newell's Unified Theories of Cognition.
In OPS5, for instance, a choice of two conflict resolution strategies is presented to the programmer.
The LEX strategy orders instantiations on the basis of recency of the time tags attached to their data
items. Instantiations with data items having recently matched rules in previous cycles are
considered with higher priority. Within this ordering, instantiations are further sorted on the
complexity of the conditions in the rule. The other strategy, Means-Ends Analysis (MEA), puts
special emphasis on the recency of working memory elements that match the first condition of the
rule.
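As a simplification of the LEX idea (real LEX compares the full sorted lists of time tags
lexicographically), the following fragment orders a conflict set by the newest time tag among each
instantiation's matched data items. The facts and tags are invented.

# Illustrative only: order a conflict set by recency of matched data items.
wm_tags = {"a": 1, "b": 3, "c": 2}           # fact -> cycle when it was added

conflict_set = [
    ("rule1", ["a", "b"]),                   # (rule name, matched facts)
    ("rule2", ["c"]),
]

def recency(instantiation):
    _, matched = instantiation
    return max(wm_tags[f] for f in matched)  # newest matched fact decides

ordered = sorted(conflict_set, key=recency, reverse=True)
print([name for name, _ in ordered])         # ['rule1', 'rule2']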
Finally the selected instantiations are passed over to the third state, execute rules. The inference
engine executes or fires the selected rules, with the instantiation's data items as parameters. Usually
the actions in the right-hand side of a rule change the data store, but they may also trigger further
processing outside of the inference engine (interacting with users through a graphical user interface
or calling local or remote programs, for instance). Since the data store is usually updated by firing
rules, a different set of rules will match during the next cycle after these actions are performed.
The inference engine then cycles back to the first state and is ready to start over again. This control
mechanism is referred to as the recognize-act cycle. The inference engine stops either on a given
number of cycles, controlled by the operator, or on a quiescent state of the data store when no rules
match the data.
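Putting the three states together, a toy recognize-act loop (with invented rules about a car that will
not start) might read:

# Self-contained sketch of the recognize-act cycle: match, select, execute,
# repeating until quiescence (no rule matches).
wm = {"engine cranks", "no spark"}           # working memory

rules = [                                    # (conditions, action)
    ({"engine cranks", "no spark"}, "suspect ignition coil"),
    ({"suspect ignition coil"},     "test ignition coil"),
]

while True:
    # 1. Match: build the conflict set of rules whose conditions hold
    conflict_set = [(c, a) for c, a in rules if c <= wm and a not in wm]
    if not conflict_set:
        break                                # quiescent state: stop the cycle
    # 2. Select: apply a conflict-resolution strategy (here: first match)
    conditions, action = conflict_set[0]
    # 3. Execute: fire the rule, changing the data store
    wm.add(action)

print(wm)

Because firing the first rule adds a new fact to working memory, a different rule matches on the
next cycle, exactly as described above.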
Data-driven computation versus procedural control
The inference engine control is based on the frequent reevaluation of the data store states, not on
any static control structure of the program. The computation is often qualified as data-driven or
pattern-directed in contrast to the more traditional procedural control. Rules can communicate with
one another only by way of the data, whereas in traditional programming languages procedures and
functions explicitly call one another. Unlike instructions, rules are not executed sequentially and it
is not always possible to determine through inspection of a set of rules which rule will be executed
first or cause the inference engine to terminate.
In contrast to a procedural computation, in which knowledge about the problem domain is mixed in
with instructions about the flow of control—although object-oriented programming languages
mitigate this entanglement—the inference engine model allows a more complete separation of the
knowledge (in the rules) from the control (the inference engine).
6.10 Knowledge Acquisition
Knowledge acquisition is a method of learning, first proposed by Aristotle in his seminal work
"Organon". Aristotle proposed that the mind at birth is a blank slate, or tabula rasa. As a blank slate
it contains no knowledge of the objective, empirical universe, nor of itself.
It has been suggested that the mind is "hard wired" to begin operating at birth, beginning a lifetime
process of acquisition through abstraction, induction, and conception.
The acquisition of empirical knowledge, which begins the process of filling the tabula rasa, is thus
by means of the experience of sensation and perception.
Sensate data, or sensation, are distinct from perception. Perception is the recognition within the
knowing subject of the event of having had a sensation. The tabula rasa mind must learn the nature
of sensation as the awareness of something which is outside itself. Commonly recognized sensory
systems are those for vision, hearing, somatic sensation (touch), taste and olfaction (smell).
Perception is the retention of a group of sensations transmitted through the sensory system(s),
which gives the knowing subject the ability to be aware, not only of the singularity of stimuli
presented by sensation itself, but of an entity, a thing, an existent.
Retention of percepts allows the human mind to abstract information from the percepts. The
abstraction is considered the extensional definition of the percept. An extension is "every object
that falls under the definition of the concept or term in question." This is the same as a universal
(metaphysics) or genus or denotation, or class (philosophy).
Once a universal (class) has been identified, then the next step in the acquisition of knowledge is
the abstraction of the intension, which is the particular, the species, or the connotation.
Connotation, in its meaning as the particular, is "the assertion that at least one member of one class
of things is either included or excluded as a member of some other class." This means, for example,
that a poodle is
the particular in a class or universal concept called "dog" or "canine".
AI Knowledge Acquisition Techniques for Instructional
Development
Knowledge engineering techniques for developing expert systems may also be useful for
instructional development. A review of knowledge engineering focusing on knowledge
representation and knowledge acquisition suggests ways in which these methods could be adapted
to developing instructional systems. As further work is done on intelligent computer-assisted
instructional systems and other complex instructional development projects, knowledge
engineering skills may become more important for the instructional developer.
Instructional developers usually consult with subject-matter experts to determine what knowledge
is critical for learners to acquire, and the form in which this knowledge might be communicated.
Knowledge engineers working in the relatively new field of expert system development also obtain
knowledge from experts and encode this knowledge in computer programs which solve problems.
There is no definitive approach for extracting expert knowledge for these systems, just as there is
no definitive process for gathering subject expertise in instructional development. However,
knowledge engineering techniques may be applicable to instructional development.
6.11 Case Study:
Mycin:
No course on Expert systems is complete without a discussion of Mycin. MYCIN was an early
expert system developed in the early 1970s at Stanford University. It was written in LISP as the
doctoral dissertation of Edward Shortliffe under the direction of Bruce Buchanan, Stanley N. Cohen
and others. It arose in the laboratory that had created the earlier Dendral expert system. This expert
system was designed to identify bacteria causing severe infections, such as bacteremia and
meningitis, and to recommend antibiotics, with the dosage adjusted for the patient's body weight.
The name 'MYCIN' is derived from the antibiotics themselves, as many antibiotics have the suffix
"-mycin". The Mycin system was also used for the diagnosis of blood clotting diseases.
Method
MYCIN operated using a fairly simple inference engine and a knowledge base of approximately
600 rules. It would query the physician running the program via a long series of simple yes/no or
textual questions. At the end, it provided a list of possible culprit bacteria ranked from high to low
based on the probability of each diagnosis, its confidence in each diagnosis' probability, the
reasoning behind each diagnosis (that is, MYCIN would also list the questions and rules which led
it to rank a diagnosis a particular way), and its recommended course of drug treatment.
Despite MYCIN's success, it sparked debate about the use of its ad hoc, but principled, uncertainty
framework known as "certainty factors". MYCIN's performance was minimally affected by
perturbations in the uncertainty metrics associated with individual rules. The developers have
suggested that the power in the system was related more to its knowledge representation and
reasoning scheme than to the details of its numerical uncertainty model. This problem can be
solved by using a rigorous probabilistic framework, such as Bayesian statistics.
Results
Research conducted at the Stanford Medical School found Mycin to have a correct diagnosis rate of
about 65%, which was better than most physicians or infectious disease experts who were not
specialists in diagnosing bacterial infections. However, it was worse than those physicians who
were themselves experts in the field who had average correct diagnosis rates of about 80% or more.
Practical use
Mycin was developed partly in order to explore how human experts make these rough (but
important) guesses based on partial information. However, the problem is also a potentially
important one in practical terms - there are lots of junior or non-specialised doctors who sometimes
have to make such a rough diagnosis, and if there is an expert tool available to help them then this
might allow more effective treatment to be given.
Mycin was never actually used in practice. This was not because of any weakness in its
performance. It was as much because of ethical and legal issues related to the use of computers in
medicine. If it gives the wrong diagnosis, or recommends the wrong therapy, who can be held
responsible? Questions also arose about whether human experts would find such a system
acceptable to use.
However, the greatest problem, and the reason that MYCIN was not used in routine practice, was
the state of technologies for system integration, especially at the time it was developed. MYCIN
was a stand-alone system that required a user to enter all relevant information about a patient by
typing manually in response to questions that MYCIN would pose. The program ran on a large
time-shared system, available over the early Internet (ARPANet), before personal computers were
developed. In the modern era, such a system would be integrated with medical record systems,
would extract answers to questions from patient databases, and would be much less dependent on
physician entry of information. In the 1970s, a session with MYCIN could easily consume 30
minutes or more which is an unrealistic time commitment for a busy clinician.
MYCIN's greatest influence was accordingly its demonstration of the power of its representation
and reasoning approach. Rule-based systems in many non-medical domains were developed in the
years that followed MYCIN's introduction of the approach. In the 1980s, expert system "shells"
were introduced (including one based on MYCIN, known as E-MYCIN (followed by KEE)) and
supported the development of expert systems in a wide variety of application areas.
Its job was to diagnose and recommend treatment for certain blood infections. The proper diagnosis
involves growing cultures of the infecting organism. Unfortunately this takes around 48 hours, and
if doctors waited until this was complete their patient might be dead. So, doctors have to come up
with quick guesses about likely problems from the available data, and use these guesses to provide
a ‘covering’ treatment where drugs are given which should deal with any possible problem.
Processings/ Methodology:
Mycin represented its knowledge as a set of IF-THEN rules with certainty factors. The
following is an English version of one of Mycin's rules:
IF the infection is primary bacteremia
AND the site of the culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that infection is bacteroid.
The 0.7 is roughly the certainty that the conclusion will be true given the evidence. If the evidence
is uncertain the certainties of the bits of evidence will be combined with the certainty of the rule to
give the certainty of the conclusion.
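Under the usual MYCIN-style conventions (assumed here; the text above does not spell out the
arithmetic), the certainty of a conjunction of premises is their minimum, and it scales the rule's own
certainty factor; two rules supporting the same positive conclusion combine incrementally. A small
worked example, with invented evidence certainties:

# MYCIN-style certainty-factor arithmetic (common convention, assumed here).
def conclusion_cf(rule_cf, *evidence_cfs):
    return rule_cf * min(evidence_cfs)     # weakest premise limits the rule

# The 0.7 rule above, with somewhat uncertain evidence for its three premises:
cf = conclusion_cf(0.7, 1.0, 0.9, 0.8)
print(round(cf, 2))                        # 0.56 = 0.7 * 0.8

# Two rules supporting the same (positive) conclusion combine as:
def combine(cf1, cf2):
    return cf1 + cf2 * (1 - cf1)

print(round(combine(0.56, 0.4), 3))        # 0.736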
Mycin was written in LISP, and its rules are formally represented as LISP expressions. The action
part of the rule could just be a conclusion about the problem being solved, or it could be an
arbitrary LISP expression. This allowed great flexibility, but removed some of the modularity and
clarity of rule-based systems, so the facility had to be used with care.
Mycin is a goal-directed system, using the basic backward chaining reasoning strategy. However,
Mycin used various heuristics to control the search for a solution (or proof of some hypothesis).
These were needed both to make the reasoning efficient and to prevent the user being asked too
many unnecessary questions.
One strategy follows the pattern of human patient-doctor interviews, in which the user is asked a
number of more or less preset questions that are always required and which allow the system to
rule out totally unlikely diagnoses. Once these questions have been asked the system can then focus
on
particular, more specific possible blood disorders, and go into full backward chaining mode to try
and prove each one. This rules out a lot of unnecessary search.
The other strategies relate to the way in which rules are invoked. The first one is simple: given a
possible rule to use, Mycin first checks all the premises of the rule to see if any are known to be
false. If so there's not much point using the rule. Some strategies relate more to the certainty
factors. Mycin will first look at rules that have more certain conclusions, and will abandon a search
once the certainties involved get below 0.2.
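The following is a much-simplified sketch, not Mycin's actual code, of goal-directed search with
those two heuristics: rules with a premise already known to be false are skipped, and a line of
reasoning is abandoned once its certainty falls below 0.2. Facts, rules and certainties are invented.

# Simplified backward chaining with the two heuristics described above.
THRESHOLD = 0.2
known = {"fever": 0.9, "rash": 0.0}        # certainty of already-known facts

rules = [                                  # (premises, conclusion, rule CF)
    (["fever", "rash"], "measles", 0.8),
    (["fever"],         "flu",     0.6),
]

def prove(goal):
    if goal in known:
        return known[goal]
    best = 0.0
    for premises, conclusion, cf in rules:
        if conclusion != goal:
            continue
        if any(known.get(p) == 0.0 for p in premises):
            continue                       # a premise is known false: skip
        evidence = min(prove(p) for p in premises)
        if evidence * cf < THRESHOLD:
            continue                       # abandon: certainty below 0.2
        best = max(best, evidence * cf)
    return best

print(round(prove("flu"), 2))     # 0.54
print(prove("measles"))           # 0.0 (the rash premise is known false)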
A dialogue with Mycin proceeds as a structured interview. There are three main stages to the
dialogue:
In the first stage, initial data about the case is gathered so the system can come up with a very broad
diagnosis.
In the second more directed questions are asked to test specific hypotheses. At the end of this
section a diagnosis is proposed.
In the third section questions are asked to determine an appropriate treatment, given the diagnosis
and facts about the patient. This obviously concludes with a treatment recommendation.
At any stage the user can ask why a question was asked or how a conclusion was reached, and
when treatment is recommended the user can ask for alternative treatments if the first is not viewed
as satisfactory.
Mycin, though pioneering much expert system research, also had a number of problems which were
remedied in later, more sophisticated architectures. One of these was that the rules often mixed
domain knowledge, problem solving knowledge and “screening conditions” (conditions to avoid
asking the user silly or awkward questions - e.g., checking that the patient is not a child before
asking about alcoholism).
A later version called NEOMYCIN attempted to deal with these by having an explicit disease
taxonomy (represented as a frame system) to represent facts about different kinds of diseases. The
basic problem solving strategy was to go down the disease tree, from general classes of diseases to
very specific ones, gathering information to differentiate between two disease subclasses (i.e. if
disease1 has subtypes disease2 and disease3, the patient is known to have disease1, and symptom1
occurs with disease2 but not with disease3, then ask about symptom1).
There were many other developments from the MYCIN project. For example, EMYCIN was really
the first expert system shell, developed from Mycin. A new expert system called PUFF was
developed using EMYCIN in the new domain of pulmonary (lung) disorders. And a system called
NEOMYCIN was developed for training doctors, which would take them through various example
cases, checking their conclusions and explaining where they went wrong.
Check Your Progress:
1. What do you understand with the term “Expert Systems”? Write a note on need and
justification of expert systems.
2. Write short notes on:
i) Knowledge engineering
ii) Knowledge acquisition
iii) Inference engine
iv) Advantages and disadvantages of expert systems
3. What is chaining? Define forward and backward chaining.
4. Discuss Mycin.