
KENYATTA UNIVERSITY

INSTITUTE OF OPEN DISTANCE & e-LEARNING


IN COLLABORATION WITH

SCHOOL OF ENGINEERING AND TECHNOLOGY


DEPARTMENT OF COMPUTING AND INFORMATION
TECHNOLOGY

SIT 410: KNOWLEDGE BASED SYSTEMS

WRITTEN BY:

EDITED BY:
COURSE AUTHORS NAME

Copyright Kenyatta University, 2010


All Rights Reserved
Published By:
KENYATTA UNIVERSITY PRESS

SIT 410 KNOWLEDGE BASED SYSTEMS


Introduction to artificial intelligence systems and their applications. Strategies for state space search, both uninformed and heuristic. Knowledge representation: natural language, propositional and predicate logic, semantic networks, frames, rules. Knowledge-based systems; architectures and applications of expert systems. Inference strategies: goal-driven and data-driven. PROLOG.
Course Schedule

Lecture 1: Overview of Artificial Intelligence
Introduction: definitions and history; branches of AI; applications of AI

Lectures 2 & 3: State space search
Strategies: uninformed and heuristic search (depth-first, breadth-first, uniform-cost search, the A* search algorithm, greedy search, etc.)

Lectures 4, 5 & 6: KBS development and implementation
Knowledge acquisition
Knowledge representation schemes/techniques: propositional and predicate logic, rules, frames, semantic networks, etc.
Examples of shells for ES implementation

Lectures 7 & 8: Knowledge-based systems
Definitions: knowledge, knowledge representation, inference
Components of a KBS
Types of KBS: expert systems, rule-based systems, KB DSS, etc.
Reasoning and inference strategies: forward and backward chaining systems

Lecture 10: PROLOG

References
1. Turban, E., Aronson, J. E., & Liang, T.-P. Decision Support Systems and Intelligent Systems (7th ed.).
2. Gonzalez, A., & Dankel, D. Engineering Knowledge-Based Systems: Theory and Practice.
3. Russell, S., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach (2nd ed.). New Jersey: Prentice Hall.

Lecture One
Introduction to Artificial Intelligence (A.I)
Lecture Overview

Intelligence

Defining A.I

A.I Applications

Intelligence
Dictionary definition.
(1) The ability to learn or understand or to deal with new or trying situations : REASON; also :
the skilled use of reason
(2) The ability to apply knowledge to manipulate one's environment or to think abstractly as
measured by objective criteria (as tests)
Defining A.I
There is no agreed definition of the term artificial intelligence. However, there are various
definitions that have been proposed. These are considered below.

AI is a study in which computer systems are made that think like human beings.
Haugeland, 1985 & Bellman, 1978.

AI is a study in which computer systems are made that act like people. AI is the art of
creating computers that perform functions that require intelligence when performed by
people. Kurzweil, 1990.

AI is the study of how to make computers do things which at the moment people are
better at. Rich & Knight

AI is a study in which computers are made that think rationally. Charniak & McDermott,
1985.

AI is the study of computations that make it possible to perceive, reason and act. Winston,
1992

AI is the study in which systems are made that act rationally. AI is considered to be a
study that seeks to explain and emulate intelligent behaviour in terms of computational
processes. Schalkoff, 1990.

AI is considered to be a branch of computer science that is concerned with the automation
of intelligent behavior. Luger & Stubblefield, 1993.

Views of AI fall into four categories:

Thinking humanly

Thinking rationally

Acting humanly

Acting rationally

The textbook advocates "acting rationally"

Therefore A.I is the part of computer science concerned with designing intelligent computer
systems, that is, computer systems that exhibit the characteristics we associate with intelligence
in human behaviour: understanding language, learning, reasoning and solving problems.

A.I Applications

Autonomous control : e.g. ALVINN

Knowledge-based systems (KBS) e.g.


o Medical Diagnosis - MYCIN (1971): a program that could diagnose blood infections. It had 450 rules.
o Mineral Prospecting - PROSPECTOR (1979): a program that worked with geological data. It recommended exploratory drilling
sites that proved to have substantial molybdenum deposits.

Game Playing e.g. Deep Blue

Data Mining

Problem Solving: complex problems, e.g. puzzles, mathematical problems, logistics planning

Robotics: intelligent systems which can control robots e.g. surgeon systems

A.I agents

Branches of A.I

Machine vision

Speech synthesis and recognition

Machine Learning

Robotics

Natural Language and understanding

Problem solving

Game playing

Knowledge-based systems

A.I agents


Intelligent Techniques
Intelligent techniques may be used for:

Capturing individual and collective knowledge and extending a knowledge base, using
artificial intelligence and database technologies

Capturing tacit knowledge, using expert systems, case-based reasoning, and fuzzy logic

Knowledge discovery, or discovering underlying, hidden patterns in data sets, using
neural networks and data mining

Generating solutions to highly complex problems, using genetic algorithms

Automating routine tasks, using intelligent agents

Artificial intelligence (AI) is the effort to develop computer-based systems (both hardware and
software) that behave as humans, with the ability to learn languages, accomplish physical tasks,
use a perceptual apparatus, and emulate human expertise and decision making.
Expert systems require input both from human experts, who define the knowledge base, and
from knowledge engineers, who translate the knowledge into a set of rules.

Fuzzy logic systems:

Use rule-based logic to represent imprecise or ambiguous values used in human or
linguistic categorization, such as defining and comparing terms such as "hot, warm, cool,
cold" for use in a temperature control system.

Provide solutions to problems requiring expertise that is difficult to represent in the form
of crisp IF-THEN rules

Neural networks:

Find patterns and relationships in massive amounts of data that would be too complicated
and difficult for a human being to analyze.

"Learn" patterns by sifting through data, searching for relationships, building models, and
correcting over and over again the model's own mistakes.

Use a large number of sensing and processing nodes that continuously interact with each
other

May be sensitive and not perform well with too little or too much data

Are used in science, medicine, and business primarily to discriminate patterns in massive
amounts of data.

Genetic algorithms:

Find optimal solutions to a problem by examining a very large number of possible
solutions for that problem.

Use adaptive, evolutionary conceptual models to change and reorganize components of
possible solutions to create viable solutions, test their fitness, and discard unlikely
solutions

Use processes such as fitness, crossover, and mutation to "breed" solutions.

Are useful for dynamic and complex business problems involving hundreds or thousands
of variables, such as problems involving engineering design optimization, product design,
and monitoring industrial systems.

Hybrid AI systems:

Integrate genetic algorithms, fuzzy logic, neural networks, and expert systems to take
advantage of the best features of each technology.

Intelligent agents:

Are software programs that work in the background without direct human intervention

Carry out specific, repetitive, and predictable tasks

Use a limited built-in or learned knowledge base to accomplish tasks or make decisions
on the user's behalf

Are used in agent-based modeling applications used to model or simulate the behavior
of consumers, stock markets, and supply chains and to predict the spread of epidemics

INTELLIGENT AGENTS IN P&G'S SUPPLY CHAIN NETWORK


Intelligent agents are helping Procter & Gamble shorten the replenishment cycles for products
such as a box of Tide.

Lecture 2
State space search

1. Introduction
All AI tasks involve searching.
General idea:

you know the available actions that you could perform to solve your problem

you don't know which ones in particular should be used and in what sequence, in order
to obtain a solution

You can search through all the possibilities to find one particular sequence of actions
that will give a solution.

The scenario:

Initial state

Target (goal) state

A set of intermediate states

A set of operations that move us from one state to another.

The task: Find a sequence of operations that will move us from the initial state to the target
state. This is solved in terms of searching a graph. The set of all states is called the search space.
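The scenario above can be sketched as a tiny Python formalization. This is a minimal illustration only; the state names and operations are invented for demonstration:

```python
# Hypothetical state space: states are labels, operations move between them.
OPERATIONS = {
    "A": ["B", "C"],   # from state A we can move to state B or state C
    "B": ["D"],
    "C": ["D"],
    "D": ["G"],
    "G": [],
}

INITIAL_STATE = "A"
TARGET_STATE = "G"

def is_target(state):
    """Goal test: have we reached the target state?"""
    return state == TARGET_STATE

def successors(state):
    """All states reachable from the given state in one operation."""
    return OPERATIONS.get(state, [])
```

A search then looks for a sequence of operations (a path in this graph) leading from INITIAL_STATE to TARGET_STATE.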


Examples:
Expert systems: Find the sequence of rules that will prove the goal (backward
chaining)
Puzzles: Find a sequence of actions to solve the puzzle
Chess: Find the sequence of moves that will result in winning the game.
Search techniques:
Uninformed search: Exhaustive search (brute force methods: systematically and
exhaustively search all possible paths)

Depth-first

Breadth-first

Informed search: Heuristic search (use rules-of-thumb to guess which paths are likely to
lead to a solution)

Hill climbing

Best-first search

A* algorithm

Evaluating a search technique/algorithm

1. Completeness: Is the strategy guaranteed to find a solution?
2. Time complexity: How long does it take to find a solution?
3. Space complexity: How much memory does it take to perform the search?
4. Optimality: Does the strategy find the optimal solution when there are several solutions?

2. Graphs and trees

A graph consists of a set of nodes (vertices) with links (edges) between them. A link is
usually represented as a pair of nodes connected by it.

Undirected graphs: the links have no orientation

Directed graphs: the links have orientation


Examples:


Path: Sequence of nodes such that each two neighbors represent an edge
Examples: in G1: A B D C A E, in G2: EABDC
Note: the sequence ABEA in G2 is not a path, because the edges have
orientation, and there is no edge BE, the edge is EB
Cycle: a path with the first node equal to the last and no other nodes are repeated
Examples: In G1: A B D C A, In G2: no cycles.
Acyclic graph: a graph without cycles
Tree: undirected acyclic graph, where one node is chosen to be the root
Given a graph and a node:
Out-going edges: all edges that start in that node
In-coming edges : all edges that end up in that node
Successors (Children): the end nodes of all out-going edges
Ancestors (Parents): the nodes that are start points of in-coming edges
In undirected graphs the edges are symmetrical, i.e. the notion of child and parent
depends on how the graph is traversed.


3. Exhaustive search
3.1. Breadth-first search
At step i traverse all nodes at level i.
A. in trees

The order in breadth-first search:
A
B, C
D, E, F, G, H
I, J
B. in graphs: we need to keep a list of visited nodes as there may be cycles
Algorithm: using a queue
1. Queue = [initial_node], FOUND = False
2. While queue not empty and FOUND = False do:
Remove the first node N
If N = target node then FOUND = true
Else find all successor nodes of N and put them into the queue.
In essence this is Dijkstra's algorithm for finding the shortest path between two nodes
in a graph (breadth-first search finds the shortest path when all edges have equal cost).
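The queue-based algorithm can be sketched in Python as follows. This is a minimal sketch, and the example adjacency list is invented for demonstration:

```python
from collections import deque

def breadth_first_search(graph, start, target):
    """Breadth-first search: visit nodes level by level using a queue.
    Keeps a visited set, since graphs (unlike trees) may contain cycles.
    Returns a path from start to target, or None if no path exists."""
    queue = deque([[start]])      # a queue of paths, each ending in a frontier node
    visited = {start}
    while queue:
        path = queue.popleft()    # remove the first (oldest) path
        node = path[-1]
        if node == target:
            return path
        for succ in graph.get(node, []):
            if succ not in visited:
                visited.add(succ)
                queue.append(path + [succ])
    return None

# Hypothetical example graph as an adjacency list:
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["G"], "D": [], "E": ["G"]}
print(breadth_first_search(graph, "A", "G"))  # shortest path by number of edges
```

Because the queue is first-in first-out, shallower nodes are always expanded before deeper ones, which is why breadth-first search finds the path with the fewest edges first.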
3. 2. Depth-first search
Keep going down one path until you get to a dead end. Then back up and try alternatives.
Algorithm: using a stack
1. Stack = [initial_node] , FOUND = False
2. While stack not empty and FOUND = False do:
Remove the top node N
If N = target node then FOUND = true
Else find all successor nodes of N and put them onto the stack.
Search order for the sample tree:
A, B, D (leftmost path)
E (back up through B and explore its middle subtree)
F (back up through B and explore its right subtree)
C, G, I (back up through A, explore its right subtree, and follow the leftmost path)
J (back up through G and explore its right subtree)
H (back up through C and explore its right subtree)

Depth-first search in graphs: keep a list of visited nodes
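The stack-based algorithm with a visited list can be sketched in Python as follows (a minimal sketch; the example graph is invented for demonstration):

```python
def depth_first_search(graph, start, target):
    """Depth-first search: follow one path until a dead end, then backtrack.
    Uses an explicit stack; the visited set guards against cycles in graphs."""
    stack = [[start]]             # a stack of paths
    visited = set()
    while stack:
        path = stack.pop()        # remove the top (most recent) path
        node = path[-1]
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        # push successors in reverse so the leftmost child is explored first
        for succ in reversed(graph.get(node, [])):
            if succ not in visited:
                stack.append(path + [succ])
    return None

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["G"], "D": [], "E": ["G"]}
print(depth_first_search(graph, "A", "G"))
```

The only difference from the breadth-first version is the data structure: a last-in first-out stack instead of a queue, which is what makes the search dive down one path before backtracking.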


3.3. Comparison of depth-first and breadth-first

In breadth-first search, going level by level, the goal is eventually found without
backtracking.

In depth-first search we may reach a dead end; in that case backtracking is performed,
returning to a higher node and exploring its successors.

Length of path: breadth-first finds the shortest path first.

Memory: depth-first uses less memory.

Time: if the solution is on a short path, breadth-first is better; if the path is long, depth-first
is better.


Lecture 3
Heuristic search

4. Heuristic search
Heuristic search is used to reduce the search space.
Basic idea: explore only promising states/paths.
We need an evaluation function to estimate each state/path.
4. 1. Hill climbing
Basic idea: always head towards a state which is better than the current one.
Example: if you are at town A and you can get to town B and town C (and your target is
town D) then you should make a move IF town B or C appear nearer to town D than town A
does.
Algorithm:

Start with current-state = initial-state.

Until current-state = goal-state OR there is no change in current-state do:

Get the successors of the current state and use the evaluation function to
assign a score to each successor.

If one of the successors has a better score than the current-state, then set the
new current-state to be the successor with the best score.
There is no exhaustive search, so no node list is maintained.

There is no problem with loops, as we always move to a better node.

Hill climbing terminates when there are no successors of the current state which are better
than the current state itself. If a solution is found, it is found in a very short time and with
minimum memory requirements. However, it is not guaranteed that a solution will be found:
the local maxima problem.

General hill climbing is only good for a limited class of problems where we have an
evaluation function that fairly accurately predicts the actual distance to a solution.
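The loop above can be sketched in Python. This is a minimal illustration, with an invented toy problem in which a lower score means a state looks closer to the goal:

```python
def hill_climbing(initial_state, successors, score):
    """Basic hill climbing: repeatedly move to the best-scoring successor,
    stopping when no successor beats the current state.
    Here a lower score is better (e.g. estimated distance to the goal)."""
    current = initial_state
    while True:
        candidates = successors(current)
        if not candidates:
            return current
        best = min(candidates, key=score)
        if score(best) >= score(current):
            return current        # no improvement: stop (possibly a local maximum)
        current = best

# Hypothetical illustration: states are integers, the goal is 10,
# and the score is the distance to the goal.
succ = lambda s: [s - 1, s + 1]
score = lambda s: abs(10 - s)
print(hill_climbing(3, succ, score))  # climbs 3 -> 4 -> ... -> 10
```

Note that no node list is kept: only the current state is stored, which is exactly why hill climbing is so cheap in memory and why it can get stuck on a local maximum.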
4.2. Best-first search

The evaluation function scores each successor node.

The node with the best score is chosen to be expanded.

The algorithm works in breadth-first manner, and keeps a data structure (called an agenda,
based on priority queues) of all successors and their scores.
Algorithm:

1. Start with agenda = [initial-state].

2. While agenda is not empty do:

Pick the best node on the agenda.

If it is the goal node then return with success. Otherwise find its
successors.

Assign the successor nodes a score using the evaluation function and add
the scored nodes to the agenda.

If a node that has been chosen does not lead to a solution, the next "best" node is chosen, so
eventually the solution is found.
The algorithm always finds a solution, though it is not guaranteed to be the optimal one.
Comparison with hill-climbing
Similarities: best-first always chooses the best node
Difference: best-first search keeps an agenda as in breadth-first search, and in case of
a dead end it will backtrack, choosing the next-best node.
18

Note: if the evaluation function is very expensive (i.e., it takes a long time to work out a
score) the benefits of cutting down on the amount of search may be outweighed by the costs
of assigning a score.
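The agenda-based algorithm can be sketched in Python with a priority queue. This is a minimal sketch; the graph and heuristic scores are invented to show the backtracking behaviour:

```python
import heapq

def best_first_search(start, target, successors, score):
    """Greedy best-first search: keep an agenda (priority queue) of scored
    paths and always expand the one with the best (lowest) score.
    Unlike hill climbing, the agenda allows backtracking from dead ends."""
    agenda = [(score(start), [start])]
    visited = set()
    while agenda:
        _, path = heapq.heappop(agenda)   # best-scored path so far
        node = path[-1]
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for succ in successors(node):
            if succ not in visited:
                heapq.heappush(agenda, (score(succ), path + [succ]))
    return None

# Hypothetical example: H looks closest to the goal but is a dead end,
# so the agenda lets the search back up and try B instead.
graph = {"A": ["B", "H"], "B": ["G"], "H": []}
h = {"A": 3, "B": 2, "G": 0, "H": 1}
print(best_first_search("A", "G", lambda n: graph[n], lambda n: h[n]))
```

When the promising node H turns out to be a dead end, the next-best node on the agenda (B) is expanded, which is exactly the backtracking behaviour described above.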


4. 3. The A* Algorithm
Best-first search doesn't take into account the cost of the path so far when choosing which
node to search from next. A* attempts to find a solution which minimizes the total length or
cost of the solution path.
A* algorithm uses an evaluation function that accounts for the cost from the initial state to
the current state, and the cost from the current state to the goal state (i.e. the score assigned to
the node in consideration).
F(Node) = g(Node) + h(Node)
g(Node) - the cost from the initial state to the current node
h(Node) - the estimated future cost, i.e. the node's heuristic score
A* always finds the best solution, provided that h(Node) does not overestimate the future
cost.
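The evaluation function F = g + h can be sketched in Python as follows. This is a minimal illustration; the weighted graph and heuristic values are invented, with h chosen so that it never overestimates:

```python
import heapq

def a_star(start, target, successors, h):
    """A* search: expand the path minimizing f(n) = g(n) + h(n), where g is
    the cost accumulated so far and h the estimated remaining cost.
    Finds an optimal path provided h never overestimates (is admissible)."""
    # agenda entries: (f, g, path)
    agenda = [(h(start), 0, [start])]
    best_g = {start: 0}
    while agenda:
        f, g, path = heapq.heappop(agenda)
        node = path[-1]
        if node == target:
            return g, path
        for succ, cost in successors(node):
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                heapq.heappush(agenda, (new_g + h(succ), new_g, path + [succ]))
    return None

# Hypothetical weighted graph: edges are (successor, cost) pairs.
graph = {"A": [("B", 1), ("C", 4)], "B": [("G", 5)], "C": [("G", 1)]}
h = {"A": 2, "B": 5, "C": 1, "G": 0}
print(a_star("A", "G", lambda n: graph.get(n, []), lambda n: h[n]))  # (5, ['A', 'C', 'G'])
```

Here greedy best-first would prefer B (h = 5 only after reaching it via the cheap first edge), but A* correctly returns the cheaper total route A-C-G because it adds the path cost g to the estimate h.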
Thus, in the next example,
Hill climbing will choose node H and will be stuck in node K.
Best-first will choose node H, go to K, backtrack to F and will find a path to G: A H F
G, though not the optimal one.
A* will choose node D as its total score is 12: the sum of g(D) = 2 plus h(D) = 10.
Example:


Start state: A
Goal state: G
1. For each node write down its successors.
2. Draw the search tree that corresponds to the graph.
3. Write the sequence of visited nodes and the cost of the path in:

depth-first search

breadth-first search

hill climbing

best-first search

A*

Revision Questions

Explain how breadth-first and depth-first algorithms work; discuss the advantages and
disadvantages of each.

Explain how hill-climbing and best-first algorithms work.

Explain how the A* algorithm works and compare it with the best-first algorithm.

Given a graph as in the example above, be able to perform the tasks listed in the example.


Lecture 4
Knowledge Acquisition

Introduction
First, what precisely do we mean by knowledge?
We know about data processing and about information processing. The difference is a question of
levels.
First, data means just uninterpreted values, e.g. 46.
Second, information means organized values, which can be regarded as having some sense or
interpretation, e.g. 46 held as the age field in a personal_details record (data + sense).
Third, knowledge means information which is known to be true (data + sense + knowing).
Types of Knowledge
It is recognized that there are different kinds of knowledge:
Declarative knowledge: facts.
Procedural knowledge: how to do things.
Semantic knowledge: use and meanings of words.
Conceptual knowledge: abstract knowledge of concepts and relationships between concepts.
Episodic knowledge: detailed knowledge of particular occurrences or experiences.
Meta-knowledge: knowledge about knowledge, e.g. how experts actually organise and use
their knowledge.
(Of course, not all of these need be involved in every knowledge-based system.)
The main difficulties about knowledge are:
Its overall unstructured nature
Its breadth and complexity
Even where knowledge is clearly structured (as in an encyclopaedia, for example), there may
still be a practical difficulty in identifying and finding relevant knowledge.
Levels of Knowledge

AI workers (and psychologists) have recognised that there are different levels of knowledge:
shallow knowledge
deep knowledge
Shallow knowledge means surface-level information about appearances and behaviour in very
specific situations. For example: IF the petrol tank is empty THEN the car will not start.
Typically such knowledge takes the form of IF ... THEN ... rules.
We might have a lot of shallow knowledge (say about cars) and still have little understanding.
Dealing with complex or unfamiliar situations, or giving explanations, may not be easy just on
the basis of shallow knowledge.
Deep knowledge means knowledge of the internal and causal structure of a context or situation.
For example, knowledge of how a car engine works and of what happens inside it.
Such knowledge is much harder to represent in a computer. It may involve concepts,
relationships, abstractions and analogies.
Sources of Knowledge
Sources of knowledge are extremely varied: books, databases, people. An ES can be built with
appropriate means to search databases. It's people that are the problem!
The knowledge acquisition problem is to elicit and formalise human knowledge and expertise.
Human knowledge is not well-structured. Worse, experts may use their knowledge
unconsciously. Also different experts not only may disagree, but also may have wholly different
approaches and methods by which they apply their knowledge.
There is a range of methods: manual, semi-automatic, automatic.
Knowledge Acquisition
Manual Knowledge Acquisition
Two kinds of approach:
1. via interviews with experts
2. via observation of experts in action
The knowledge engineer elicits the knowledge from the expert and fits it into some chosen
knowledge representation scheme (which the expert will generally not know about).

1. Interviewing is a skill, and much effort has been put into developing interviewing techniques
(not just for knowledge acquisition).
Interviews may be structured: the interviewer may work to a standardized scheme of questioning.
This may be appropriate if the knowledge representation scheme has been previously worked
out.
Or the interviewing may consist of having the expert talk through his approach to certain
particular problems, with prompting from the interviewer.
2. Observation just means noting circumstances which arise and actions taken. The observer
may intrude by asking the expert to give his reasons for particular steps, or to think aloud while
he is working. Tracking is the jargon word for following the expert's train of thought.
The difficulty is that the expert is not generally a knowledge engineer and the knowledge
engineer is not generally an expert. There is a gap to be bridged, since neither will know what is
of significance to the other.
The solution is likely to be to allow the expert to become a knowledge engineer, possibly by
giving him/her computer support.
Knowledge engineering is itself an expert task. It is clearly possible to envisage an expert system
which may assist with it.
Meanwhile, let us consider ways in which machine assistance may be brought into the
knowledge engineering process.
Semi-Automatic Knowledge Acquisition
Our rule-based shell may be regarded as providing computer support for knowledge acquisition
via its Build facility. It does not really elicit knowledge, though.
We look at one technique which can be automated for eliciting knowledge: Repertory Grid
Analysis.
Repertory Grid Analysis
Example: Consider the problem of selecting an appropriate programming language for a
particular programming task. The first stage of RGA involves the following steps:
1. The expert identifies important objects (e.g. Java, LISP, Cobol, Prolog, Perl, Fortran, C).
2. The expert identifies important attributes of these (e.g. availability, ease of use, training time,
orientation).

3. The expert identifies for each attribute a criterion or measure (e.g. for availability
High/Medium/Low, for orientation Symbolic/General/Numeric).
Once these have been established, the expert is prompted by the following indirect means to
impart his expertise:
1. The interviewer (or automatic system) repeatedly asks questions about which attributes
distinguish some objects from others, perhaps by giving three objects, and asking for an attribute
which can distinguish two of them from the third (e.g. for LISP, Prolog and Cobol, two are
Symbolic and one is not).
2. The interviewer builds up a table (grid) containing numerical ratings for the attributes for each
object.
3. The expert may then examine the results, and adjust the table if it appears not to be a correct
representation of the knowledge.
This is a simplified description of the process. Computer systems exist which use this approach
to elicit knowledge in a quite sophisticated way (see Turban).
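The grid produced by this process can be sketched as a simple table of objects against attributes. The ratings below are illustrative only, not authoritative judgements about these languages:

```python
# A sketch of a repertory grid for the programming-language example:
# objects x attributes, rated on the criteria the expert chose.
# (Hypothetical ratings, for illustration only.)
grid = {
    "LISP":    {"availability": "Medium", "orientation": "Symbolic"},
    "Prolog":  {"availability": "Medium", "orientation": "Symbolic"},
    "Cobol":   {"availability": "High",   "orientation": "General"},
    "Fortran": {"availability": "High",   "orientation": "Numeric"},
}

def distinguish(objects, attribute):
    """The triad question: group the given objects by an attribute's value,
    showing which of them the attribute can tell apart."""
    groups = {}
    for obj in objects:
        groups.setdefault(grid[obj][attribute], []).append(obj)
    return groups

# For LISP, Prolog and Cobol, 'orientation' separates two from the third:
print(distinguish(["LISP", "Prolog", "Cobol"], "orientation"))
# {'Symbolic': ['LISP', 'Prolog'], 'General': ['Cobol']}
```

Repeatedly asking such triad questions and recording the answers is what fills in the grid; the expert then reviews and adjusts the resulting table.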
Automatic Knowledge Acquisition
Broadly, this means using a computer program to convert data into knowledge. This process may
also be described as learning.
We may imagine other situations like the choice of programming language, where a given
situation has certain characteristics which will determine a correct decision or action.
The idea is to create general rules from a set of example cases where the correct outcome is
known. These cases may be
Real existing data, or
The record of the program's own experience, or
generated by an expert to represent his/her knowledge.
There are automated systems which do this. A well-known one is ID3 (Turban p146). Given a set
of cases, it orders the various attributes as to relevance to the outcome, and then builds a decision
tree. This tree may then be used to reach a conclusion when we are given a new case, with new
attribute values.


Lecture 5 and 6
Knowledge Representation
Introduction
Knowledge: true, rational belief (philosophy); OR facts, data and relationships
(computational view).
Representation: structure + operations; OR map + operations; OR game layout and rules
of play; OR abstract data types.
Knowledge representation: a framework for storing and manipulating knowledge; OR a
set of syntactic and semantic conventions that makes it possible to describe things.
Bench-Capon, 1990.
The object of KR is to express knowledge in a computer-tractable form, so that it can be
used to help agents perform well.
A KR language is defined by two aspects:
Syntax: describes how to make sentences, i.e. the possible configurations that can
constitute sentences.
Semantics: determines the facts in the world to which the sentences refer, i.e. what the
sentences mean.
Inference:
The terms inference and reasoning are generally used to cover any process by which
conclusions are reached.
Logical inference is also called deduction.

Different Knowledge Representation schemes/formalisms


Natural Language

Frames
Semantic Nets
Rules
Logic
Propositional logic (Boolean Logic)
Predicate logic (First Order Logic)

1. Natural Language
Expressiveness of natural language:
Very expressive: probably everything that can be expressed symbolically can be
expressed in natural language (though pictures, the content of art, and emotions are often hard to express).
Probably the most expressive knowledge representation formalism we have. However,
reasoning in it is very complex and hard to model.
Problems with natural language:
Natural language is often ambiguous.
Syntax and semantics are not fully understood.
There is little uniformity in the structure of sentences.
2. Semantic Networks
Originally developed in the early 1960s to represent the meaning of English words. The term
dates back to Ross Quillian's Ph.D. thesis (1968), in which he first introduced it as a way of
talking about the organization of human semantic memory, or memory for word concepts.
A semantic net is a graph, where the nodes in the graph represent concepts, and the arcs
represent binary relationships between concepts.

Types of relations:
subclass, the link is named is_a
member, the link is named is_instance_of
Other relations used depend on the application. (e.g. has_parts, likes, etc)
Property inheritance is the basic inference mechanism for semantic networks.
Example

This network represents the facts that mammals and reptiles are animals, that mammals have
heads, that an elephant is a mammal, and that Clyde is a particular elephant.
Inferring facts not explicitly represented: Clyde has a head.
Representational adequacy: there are problems with representing quantifiers (such as "every dog in
town has bitten the constable").
Advantages. Easy to translate to predicate calculus.

Disadvantages. Cannot handle quantifiers; nodes may have confusing roles or meanings;
searching may lead to combinatorial explosion; cannot express standard logical connectives;
can represent only binary or unary predicates.

Summary:

Simple way to represent binary relations.

Use inheritance via the is_a and is_instance_of relations to infer implicit facts

Difficult to use semantic networks in a fully consistent and meaningful manner.
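Property inheritance over is_a and is_instance_of links can be sketched in Python, mirroring the Clyde example. This is a minimal illustration of the mechanism, not a full semantic-network implementation:

```python
# A minimal semantic network: binary relations as (node, relation) -> node,
# with properties attached directly to the nodes that define them.
links = {
    ("Mammal", "is_a"): "Animal",
    ("Elephant", "is_a"): "Mammal",
    ("Clyde", "is_instance_of"): "Elephant",
}
properties = {
    "Animal": {"alive": True},
    "Mammal": {"has_part": "head"},
    "Elephant": {"colour": "grey"},
}

def lookup(node, prop):
    """Property inheritance: walk up is_instance_of / is_a links until the
    property is found on some ancestor, or the chain runs out."""
    while node is not None:
        if prop in properties.get(node, {}):
            return properties[node][prop]
        node = links.get((node, "is_instance_of")) or links.get((node, "is_a"))
    return None

print(lookup("Clyde", "has_part"))  # inferred fact: Clyde has a head
```

The call walks Clyde -> Elephant -> Mammal and finds has_part there, which is exactly the inference "Clyde has a head" made above without that fact being stored explicitly.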

3. Frames
Frames capture knowledge about typical objects or events, such as a typical bird, or a typical
restaurant meal. All the information relevant to a particular concept is stored in
a single complex entity, called a frame.
Frames support inheritance.

Example 1

Mammal:
  subclass: Animal
  warm_blooded: yes

Elephant:
  subclass: Mammal
  * colour: grey
  * size: large

Clyde:
  instance: Elephant
  colour: pink
  owner: Fred

Nellie:
  instance: Elephant
  size: small

Components of a frame entity: attribute-value pairs.
Attributes (also called slots) are filled with particular values,
e.g. attribute colour, value grey.
Types of attributes:
Definitive attributes: they define the object, distinguish between the particular object and
other objects in the same class. For example having wings might be a definitive feature for
birds (is it?)
Necessary attributes: necessary for every object in the class, e.g. a necessary feature of a bird
is laying eggs; however it is not definitive - reptiles also lay eggs
Attribute values:
Typical attribute values: for example a typical feature for birds is that they fly. However there
are some birds that do not fly. In the above example ``*'' is used to indicate attributes that are
only true of a typical member of the class, and not necessarily every member.
Default attribute values. Default values help us fill in the blanks of information about a given
object. For example, we assume by default that birds fly, unless stated otherwise.
Overriding values: Some typical features for a given class of objects are not present for
certain members of that class, for example there are birds that do not fly. In such cases we talk
about overriding values
Property inheritance
Simple if single parent-class, single values for slots.
There may be problems in case of multiple values and several parent classes:
Multiple values:
Elephant: has part: trunk
Mammal: has part: head
Clyde is an elephant. What should be the value of the slot has part?
Several parent classes (e.g., Clyde is both an elephant and a circus animal):
which parent should he inherit from first?
Slots and Procedures
Frame representation can use a procedure to compute the value of a given slot if needed, e.g.
the area of a square, given the size
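The ideas above (slots, inheritance, overriding, and a procedure-valued slot) can be sketched in a few lines of Python. This is a toy sketch mirroring the elephant and square examples, not a full frame system:

```python
# A minimal frame sketch: slots, inheritance through a single parent link,
# overriding values, and a procedural attachment (computed slot).
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        """Return the slot value, inheriting up the parent chain;
        if the value is a procedure, call it to compute the value."""
        if slot in self.slots:
            value = self.slots[slot]
            return value(self) if callable(value) else value
        return self.parent.get(slot) if self.parent else None

mammal   = Frame("Mammal", warm_blooded=True)
elephant = Frame("Elephant", parent=mammal, colour="grey", size="large")
clyde    = Frame("Clyde", parent=elephant, colour="pink", owner="Fred")

square = Frame("Square", side=4,
               area=lambda f: f.get("side") ** 2)   # computed on demand

print(clyde.get("colour"))        # pink  (overriding value)
print(clyde.get("warm_blooded"))  # True  (inherited from Mammal)
print(square.get("area"))         # 16    (procedure computes the value)
```

Note that this sketch sidesteps the multiple-inheritance problem raised above by allowing only one parent per frame; with several parents, an explicit search order would have to be chosen.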


Advantages: can cope with missing values; close matches are presented.
Disadvantages: frames have been hard to implement, especially inheritance. Representational
adequacy: certain things are difficult to represent, such as negation, disjunction and quantification.

4. Rules
These are formalizations often used to specify recommendations, give directives or express strategy.
Format:

IF <premises> THEN <conclusion>.

Related ideas: rules and the fact base; the conflict set (the rules eligible to fire); conflict
resolution (deciding which rules to apply).
One of the most popular approaches to knowledge representation is to use production rules,
sometimes called IF-THEN rules. They can take various forms, e.g.:
IF condition THEN action
IF premise THEN conclusion
IF proposition p1 and proposition p2 are true THEN proposition p3 is true


Some of the benefits of IF-THEN rules are that they are modular, each defining a relatively small
and, at least in principle, independent piece of knowledge. New rules may be added and old ones
deleted, usually independently of other rules.
Advantages: easy to use; explanations are possible; they capture heuristics; they can handle
uncertainty to some extent.
Disadvantages: they cannot cope with complex associated knowledge; they can grow to
unmanageable size.

Production Rules

They are conditional statements specifying an action to be taken if a certain
condition is true.

They codify knowledge in the form of premise-action pairs.

Syntax: IF (premise) THEN (action)

Example: IF income is 'standard' AND payment history is 'good', THEN 'approve home
loan'.

In knowledge-based systems, rules are based on heuristics or experiential
reasoning.

Rules can incorporate certain levels of uncertainty.

A certainty factor is synonymous with a confidence level, which is a subjective
quantification of an expert's judgment.

The premise is a Boolean expression that must evaluate to true for the rule to be
applied.

The action part of the rule is separated from the premise by the keyword THEN.

The action clause consists of a statement or a series of statements separated by ANDs or
commas, and is executed if the premise is true.
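Firing premise-action rules against a fact base can be sketched as a tiny forward-chaining loop. This is a minimal sketch; the rules and facts (including a second, invented rule that derives "income is standard") are illustrative only:

```python
# A minimal forward-chaining sketch: each rule is (premises, conclusion);
# the engine fires any rule whose premises are all in the fact base,
# adds the conclusion, and repeats until nothing new can be derived.
rules = [
    ({"income is standard", "payment history is good"}, "approve home loan"),
    ({"employed", "salary above threshold"}, "income is standard"),  # hypothetical
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)        # fire the rule
                changed = True
    return facts

result = forward_chain({"employed", "salary above threshold",
                        "payment history is good"}, rules)
print("approve home loan" in result)  # True: derived in two chained steps
```

Each rule here is an independent item of knowledge: the second rule establishes "income is standard", which then enables the loan-approval rule on the next pass, without either rule knowing about the other.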

A rule-based system will contain global rules and facts about the knowledge domain covered.
During a particular run of the system a database of local knowledge may also be established,
relating to the particular case in hand. One of the most widely used tutorial examples of rule-based
systems is MYCIN, an expert system which was designed to assist doctors with the
diagnosis and treatment of bacterial infections. It uses the rule-based approach and also
demonstrates the way in which uncertainty (both in observations and in the reasoning process)
may be handled.
Mycin was designed to help the doctor to decide whether a patient has a bacterial infection,
which organism is responsible, which drug may be appropriate for this infection, and which may
be used on the specific patient.
The global knowledge base contains facts and rules relating for example symptoms to infections,
and the local database will contain particular observations about the patient being examined. A
typical rule in Mycin is as follows:
IF the identity of the germ is not known with certainty
AND the germ is gram-positive
AND the morphology of the organism is "rod"
AND the germ is aerobic
THEN there is a strong probability (0.8) that the germ is of type enterobacteriacae
Note that a probability or certainty factor (C.F.) is given, reflecting the strength of the original
expert's confidence in the inference made in this rule. In other words, the confidence in the
conclusion assuming the premises are true. The premises are, in fact, established from
observations either in the laboratory or from the patient, and may themselves have an element of
uncertainty associated with them. In the above example it may only be known that the germ is
aerobic with a probability of 0.5.
The certainty factor associated with a conclusion in MYCIN is calculated from the certainty
factor of the premises, the certainty factor of the rule and any existing certainty factors for the
conclusion if it has been obtained already from some other rules.
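The certainty-factor arithmetic described here can be sketched as follows. This is a simplified sketch covering positive certainty factors only; MYCIN's full calculus also handles negative (disconfirming) evidence:

```python
# Sketch of MYCIN-style certainty factor (CF) propagation, positive CFs only.

def cf_conclusion(cf_rule, premise_cfs):
    # confidence in a conclusion: the rule's CF scaled by the weakest premise
    return cf_rule * min(premise_cfs)

def cf_combine(cf1, cf2):
    # merging evidence for the same conclusion obtained from two rules
    return cf1 + cf2 * (1 - cf1)

# the enterobacteriacae rule (CF 0.8), with "the germ is aerobic" known
# only with CF 0.5 and the other premises certain (CF 1.0)
print(round(cf_conclusion(0.8, [1.0, 1.0, 1.0, 0.5]), 2))  # -> 0.4

# the same conclusion later supported by a second rule contributing CF 0.5
print(round(cf_combine(0.4, 0.5), 2))                      # -> 0.7
```

The combination formula has the useful property that adding confirming evidence always increases the CF but never pushes it past 1.0.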
The way in which the knowledge base is used is determined by the inference engine. It is a basic
principle of production systems that each rule should be an independent item of knowledge and


essentially ignorant of other rules. The inference engine can then simply "fire" any rule whose premises are satisfied.
If several rules could all fire at once the inference engine must have a mechanism for "conflict
resolution". This may be achieved, for example, by having some predefined order, perhaps on the
basis of the strength of the conclusion, or alternatively on the basis of frequency of rule usage.
Forward and Backward chaining through the rules may be used. The two systems each have their
advantages and disadvantages and in fact answer different types of question. For example, in
Mycin a forward chaining system might answer the question "what do these symptoms suggest?"
whereas a backward chaining system might answer the question "does this patient suffer from a
pelvic abscess?" In general, rules and goals may need to be constructed differently for forward
and backward chaining systems.
5. Propositions
A proposition is a statement that is either true or false.
For example, here are some propositions:
The file is being printed.
The system is ready.
The red light is on.
It is conventional to represent propositions by lower case letters.
For example:
p: The file is being printed.
q: The system is ready.
r: The red light is on.

If we use a purely verbal specification it is not easy to answer questions about it. However, suppose that we replace the various statements by letters:

t: the alarm is activated by the temperature monitoring interface
f: the alarm is activated by the flow monitor
m: the alarm is activated manually
b: the bell sounds in the chief supervisor's office
w: a warning message appears on all supervisors' screens

Then, using various symbols that we will define shortly, the specification may be rewritten as:

(t ∨ f ∨ m) → (b ∧ w)

It is much easier to see the "structure" of this symbolic statement than the verbal one, for example when checking whether the alarm works according to the specification.

Propositions, Connectives, Compound Propositions

A compound proposition is built from simple propositions using connectives (sometimes also called "operators") including:
NOT (~)
AND (∧)
OR (∨)
IF ... THEN (→)
XOR
Compound Propositions
Examples of compound propositions:
If the system is ready and the red light is on then the file is printed.
(q ∧ r) → p
If the file is not printed then either the red light is not on or the system is not ready.
~p → (~r ∨ ~q)
Either the red light is on and the file is printed, or else the system is not ready.
(r ∧ p) ∨ ~q
Example 1

Define the following propositions:


p: Peter is driving his own car.
a: Andrew is late.
m: Max has caught the bus.

Write the following in symbols:

Either Peter is driving his own car and Andrew is late, or else Max has not caught the bus.

Solution:
(p ∧ a) ∨ ~m
Example 2

Define the following propositions:


p: Peter is driving his own car.
a: Andrew is late.
m: Max has caught the bus.

Translate into simple English:


m ∧ (~p ∨ ~a)
Solution:
Max has caught the bus and either Peter is not driving his own car or Andrew is not late.
Truth Table
Often we want to discuss properties/relations common to all propositions. In such a case, rather than stating them for each individual proposition, we use variables representing an arbitrary proposition and state properties/relations in terms of those variables. Such variables are called propositional variables. Propositional variables are themselves considered propositions, since they represent propositions and behave in the same way. A proposition in general contains a number of variables. For example, (P ∨ Q) contains the variables P and Q, each of which represents an arbitrary proposition. Thus a proposition takes different values depending on the values of its constituent variables. The relationship between the value of a proposition and those of its constituent variables can be represented by a table which tabulates the value of the proposition for all possible values of its variables; such a table is called a truth table.

For example, the following table shows the relationship between the values of P, Q and P ∨ Q:

OR
P | Q | P ∨ Q
F | F | F
F | T | T
T | F | T
T | T | T

In the table, F represents the truth value false and T represents true. This table shows that P ∨ Q is false if P and Q are both false, and it is true in all the other cases.
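A truth table can also be generated mechanically by enumerating every combination of truth values; a small Python sketch (used here for illustration only):

```python
from itertools import product

# Build the truth table of a two-place connective by enumerating all
# four valuations of P and Q.
def truth_table(connective):
    return [(p, q, connective(p, q))
            for p, q in product([False, True], repeat=2)]

for row in truth_table(lambda p, q: p or q):  # P OR Q
    print(row)
```

Passing a different lambda (e.g. `lambda p, q: p and q`) produces the table for any other connective in the same way.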

Meaning of the Connectives


NOT, AND, OR, IMPLIES, IF AND ONLY IF
Let us define the meaning of the five connectives by showing the relationship between the truth value (i.e. true or false) of composite propositions and those of their component propositions, using truth tables. In the tables, P and Q represent arbitrary propositions, and true and false are represented by T and F, respectively.


NOT
P | ~P
T | F
F | T

This table shows that if P is true, then ~P is false, and that if P is false, then ~P is true.

AND
P | Q | P ∧ Q
F | F | F
F | T | F
T | F | F
T | T | T

This table shows that P ∧ Q is true if both P and Q are true, and that it is false in any other case.

IMPLIES
P | Q | P → Q
F | F | T
F | T | T
T | F | F
T | T | T

When P → Q is always true, we express that by P ⇒ Q. That is, P ⇒ Q is used when proposition P always implies proposition Q regardless of the values of the variables in them.

IF AND ONLY IF
P | Q | P ↔ Q
F | F | T
F | T | F
T | F | F
T | T | T

When P ↔ Q is always true, we express that by P ⇔ Q. That is, ⇔ is used when two propositions always take the same value regardless of the values of the variables in them. See Identities for examples of ⇔.

Assignment:
Construct the truth table for p ∨ (q ∧ r)
Tautologies and contradictions

A proposition that is true for every combination of truth values is called a tautology.

For example, the proposition (p ∨ q) ∨ (~p ∧ ~q) is a tautology.

A proposition that is false for every possible combination of truth values is called a contradiction.

For example, the proposition (p ∧ q) ∧ (~p ∨ ~q) is a contradiction.
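Both definitions can be checked by exhaustive evaluation over all valuations; a small sketch:

```python
from itertools import product

# A tautology is true under every valuation; a contradiction under none.
def is_tautology(f):
    return all(f(p, q) for p, q in product([False, True], repeat=2))

def is_contradiction(f):
    return not any(f(p, q) for p, q in product([False, True], repeat=2))

print(is_tautology(lambda p, q: (p or q) or (not p and not q)))       # -> True
print(is_contradiction(lambda p, q: (p and q) and (not p or not q)))  # -> True
```

Most propositions are of course neither: `is_tautology(lambda p, q: p)` returns False, and so does `is_contradiction` on the same proposition.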

Logical equivalence

We often have different but equivalent logical expressions; that is, expressions that look different but have the same meaning.

It is important to be able to determine whether two given propositions merely sound similar or whether they have exactly the same logical meaning.

One way to verify logical equivalence is to use a truth table.

We denote logical equivalence by the symbol ≡.

Example: Show that p ≡ (p ∧ q) ∨ (p ∧ ~q)

The Laws of Propositional Logic

We frequently need to simplify logical expressions or to check whether given


logical expressions are logically equivalent.

One way to do these things is to use the laws of logic.

For our purposes the most important laws are the distributive laws and De Morgan's laws.

Distributive Laws
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)

De Morgan's Laws
~(p ∧ q) ≡ ~p ∨ ~q
~(p ∨ q) ≡ ~p ∧ ~q
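Such laws can be verified mechanically by comparing truth tables, as suggested above; a sketch:

```python
from itertools import product

# Two propositions are logically equivalent if they agree on every valuation.
def equivalent(f, g):
    return all(f(p, q) == g(p, q)
               for p, q in product([False, True], repeat=2))

# De Morgan's laws
print(equivalent(lambda p, q: not (p and q), lambda p, q: not p or not q))  # -> True
print(equivalent(lambda p, q: not (p or q), lambda p, q: not p and not q))  # -> True
```

The same `equivalent` check works for the distributive laws by extending it to three variables.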
Conditional statements
In computing we often use conditional statements of the form "If ... then ..."; in other words, if particular conditions are satisfied then certain consequences should follow. There are many ways of expressing this type of statement in English.

A proposition of the form "If p then q" is called a conditional statement, and is represented by p → q. The symbolic statement is usually read as "if p then q" or perhaps as "p implies q".

Exercise: Construct a truth table for the following proposition: ((p ∧ q) → r) → (p → r)
The contrapositive of a conditional

Given a conditional statement such as p → q, its contrapositive is ~q → ~p.

The contrapositive of a conditional is just another way of saying the same thing as
the conditional.

In other words a conditional statement and its contrapositive are logically


equivalent.

When one is true, then so is the other. If one is false, so is the other.

It is not difficult to think of some examples:

The conditional p → q and its contrapositive ~q → ~p are logically equivalent.

6. Predicates

A predicate is a statement with one or more variables. If we assign values to the


variables then the statement becomes a proposition, and has a truth value.

For example:
"7 > 20" is a proposition, whereas
"x > 20" is a predicate
"Peter owns 3 cats" is a proposition, whereas
"x owns y cats" is a predicate
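In a program, a predicate corresponds naturally to a Boolean-valued function; substituting a value for the variable turns it into a proposition with a definite truth value. A sketch:

```python
# The predicate "x > 20" as a Boolean-valued function of x.
def greater_than_20(x):
    return x > 20

print(greater_than_20(7))   # -> False  (the proposition "7 > 20")
print(greater_than_20(25))  # -> True   (the proposition "25 > 20")
```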

We will denote predicates by capital letters.


Some set notation

A set is a collection of things, usually (but not necessarily) sharing some common attribute.
o E.g. let P = the set of all people in this room
o E.g. let A = the set of all letters in the alphabet

In maths we have some special sets of numbers:
o E.g. R denotes the set of all real numbers (numbers with a decimal point)
o E.g. N denotes the set of all natural numbers = { 1, 2, 3, 4, ... }

We use the symbol ∈ to denote membership of a set. The symbol ∈ is read "is a member of", "is an element of" or "belongs to".
Quantifiers

A predicate has one or more variables, and if we substitute values for the variables the
predicate becomes a proposition and has a truth value.

Instead of substituting particular values for the variables, we may be able to make a more
general statement by using a quantifier.

We will use two quantifiers: the universal quantifier ∀, and the existential quantifier ∃.

Example:

Consider the predicate Q(x) : (x < 5) ∨ (x ≥ 5)

This is true for all real numbers. So we can say: "For all real numbers x, Q(x)". We can use the symbol ∀, which means "for all" or "for every", and write:

∀x ∈ R, Q(x)

Example:

Let P = set of people in this room and define the predicate


C(x) : x likes chocolate

Then ∀x ∈ P, C(x) means "Everybody in this room likes chocolate". And if we define the predicate

D(x) : x likes going to the dentist, then

∀x ∈ P, ~D(x) means "Everybody here dislikes going to the dentist", or "Nobody in this room likes going to the dentist".

The universal quantifier is used for statements of the type

"All do " or "None do "

"All are " or "None are "

The existential quantifier ∃ is used when we are making statements of the type "some do" or "some don't". The symbol ∃ is read as "there exists ...", "there is at least one ..." or "for some ...".

Examples:

If we define I(x) : x speaks Italian, and if


P = the set of people in this room, then:

∃x ∈ P, I(x) means "There is at least one person in this room who speaks Italian".

∃x ∈ P, ~I(x) means "There is at least one person in this room who does not speak Italian".

If we define S(x) : (x > 2) ∧ (x < 7), then

∃x ∈ N, S(x) means "There is at least one natural number that is bigger than 2 and less than 7".

The existential quantifier is used for statements of the type

"Some do " or "Some don't "

"Some are " or "Some are not "
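Over a finite set, the two quantifiers correspond directly to Python's built-in all() and any(); a sketch, using a bounded range as a finite stand-in for the naturals:

```python
# all() plays the role of the universal quantifier,
# any() plays the role of the existential quantifier.
N = range(1, 100)  # a finite stand-in for the natural numbers

def S(x):          # S(x): (x > 2) AND (x < 7)
    return x > 2 and x < 7

print(any(S(x) for x in N))             # "there exists x in N, S(x)" -> True
print(all(x < 5 or x >= 5 for x in N))  # "for all x in N, Q(x)"      -> True
```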


Connections between ∀ and ∃

E.g. "Everybody likes ice cream" means that there is no one who doesn't like ice cream:

∀x likes(x, icecream) ≡ ~∃x ~likes(x, icecream)

Using De Morgan's theorem:

~(∀x ∈ S, P(x)) ≡ ∃x ∈ S, ~P(x)
~(∃x ∈ S, P(x)) ≡ ∀x ∈ S, ~P(x)
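This connection between the quantifiers can be checked directly over any finite set; a sketch:

```python
# De Morgan's laws for quantifiers, checked over a small finite set S.
S = [1, 2, 3, 4]

def P(x):
    return x > 2

print((not all(P(x) for x in S)) == any(not P(x) for x in S))  # -> True
print((not any(P(x) for x in S)) == all(not P(x) for x in S))  # -> True
```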

Predicates with two quantifiers

Now we consider propositions with two quantifiers, for

Example "Every student passed at least one unit" or

"There is at least one song that everybody has heard.

Example:

Let P be a set of people and M a set of movies, and define the predicate S(x, y) to
mean "person x has seen movie y".

Consider the following symbolic statements:


(i) ∀x ∈ P, ∃y ∈ M, S(x, y), which may be translated:
"For each person x, there is some movie y such that x has seen y. or in simpler
English:"Every person has seen at least one movie."
(ii) ∃x ∈ P, ∀y ∈ M, S(x, y), which may be translated:
"There is some person x such that for every movie y, x has seen y. or more simply:
"Some person has seen every movie.

(iii) ∃y ∈ M, ∀x ∈ P, S(x, y), which may be translated:


"There is some movie y such that for every person x, x has seen y." or in simpler English:
"There is a movie that every person has seen.


(iv) ∀x ∈ P, ∀y ∈ M, S(x, y), which may be translated:


"For every person x and for every movie y, x has seen y." or more simply:
"Every person has seen every movie."
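The four statements can be checked directly with nested any()/all() calls; the people, movies and seen-relation below are illustrative only:

```python
# A toy "person x has seen movie y" relation.
people = ["ann", "bob"]
movies = ["m1", "m2"]
seen = {("ann", "m1"), ("bob", "m1"), ("bob", "m2")}

def S(x, y):
    return (x, y) in seen

print(all(any(S(x, y) for y in movies) for x in people))  # (i)   -> True
print(any(all(S(x, y) for y in movies) for x in people))  # (ii)  -> True (bob)
print(any(all(S(x, y) for x in people) for y in movies))  # (iii) -> True (m1)
print(all(S(x, y) for x in people for y in movies))       # (iv)  -> False
```

Note how swapping the nesting order of any() and all() changes the meaning, just as swapping ∃ and ∀ does in (ii) and (iii).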

Lecture 7
Knowledge-based Systems
A knowledge-based system is a computer system that is programmed to imitate human
problem-solving by means of artificial intelligence and reference to a database of
knowledge on a particular subject.
Knowledge-based systems are systems based on the methods and techniques of Artificial
Intelligence. Their core components are the knowledge base and the inference
mechanisms.
KBS is a frequently used abbreviation for knowledge-based system.
Remarks:
1. KBS is often used as a synonym for an expert system (ES) although the two are
not the same in a strict sense. Strictly speaking, a KBS is any system that uses
knowledge in performing its tasks.
2. KBS is a branch of artificial intelligence.
3. Keywords in the definition: "knowledge", "represents", "reasons", "specialist".
4. KBS uses the heuristic method in problem solving.


Characteristics of KBS
1. KBS differs from conventional programs.
It simulates human reasoning about a domain, rather than the domain itself.
It performs reasoning over representations of human knowledge, in addition to doing numerical computations or data retrieval.


It solves problems by heuristic or approximate methods which, unlike the algorithmic
solutions, are not guaranteed to succeed.
2. KBS differs from other AI systems.
It deals with subject matter of realistic complexity that normally requires a considerable
amount of human expertise.
It must exhibit high performance in terms of speed and reliability in order to be a useful
tool.
It must be capable of explaining and justifying solutions or recommendations to convince
the user that its reasoning is in fact correct.

Applications of KBS
Some areas where KBS has been very successful:
1. Medical diagnosis: MYCIN (for blood disorder)
2. Molecular structure analysis: DENDRAL
3. Computer configuration: XCON (R1)
4. Machine fault diagnosis
5. Fraud detection

6. Loan evaluation
7. ... ...
Too many to enumerate.
Major Components of a KBS
A KBS usually consists of four major components:
User interface
converts user queries into an internal representation to be processed by the system, and
converts system's solutions and explanations into a language which the user can
understand.
Knowledge Base
contains expert knowledge about a narrow domain of application.
Inference Engine
manipulates the knowledge base, i.e., deduces new knowledge from the knowledge base,
to answer the user's queries.
Explanation generator
(Sometimes it is also considered as part of the Inference engine.) provides explanations to
the user about how the system arrives at a conclusion so that the user can be convinced.


Figure: Major Components of a KBS/ES.



When to Consider KBS


KBS provides a mechanism to share existing but scarce expertise:

When the expert is unavailable,

When the qualitative performance of nonexperts needs to be enhanced,

When the efficiency and consistency of the expert need to be enhanced, and

When others need to be trained to understand the expert's thought processes.

Economic Considerations
The following criteria should be met before embarking on a KBS project for solving a
highly constrained class of problems:

No known algorithmic solution exists, thereby forcing consideration of the use of


heuristic knowledge.

The expert's solution to the problem is satisfactory (but may suffer from
procedural difficulties such as timeliness).

Decisions made by nonexperts are likely to be different from those of the expert
and to have a significant impact on the organization in terms of

financial cost,

resource consumption,

delay (efficiency), and

risk.

Who Are Involved in Developing a KBS


1. Management
2. End-users
3. Project Champion

4. Domain Experts
5. Knowledge Engineers/Crafters
6. Apprentice Knowledge Engineers/Crafters
Knowledge Engineering and Knowledge Engineers

Knowledge engineers are those who study the problem domain, acquire knowledge
from the expert and represent the knowledge in a structured form in the knowledge
base.

Knowledge engineering is the subfield of AI devoted to knowledge acquisition, representation and inference for KBS.

There are different views on how mature the KBS technology and design process are. Some people regard the technology and the process as not mature enough to be engineered, hence the terms crafting and crafter.

We will not, however, make such distinction in this Unit.


Lecture 8
Rule-Based Systems and Shells
It was noted early on in the history of ES that certain parts of an ES could be re-used for other ES
which dealt with different domains.
So attention has been given to developing frameworks or shells which provide as much as
possible of an ES and into which the context-dependent parts can be fitted. (The idea of a
general problem solver was perhaps not so unrealistic after all.)
A shell may provide:
Knowledge acquisition subsystem.
Inference engine.
User interface.
Explanation subsystem.
Shells
A shell does not provide a knowledge base (though it will provide the structure for a knowledge
base).
In order to build an ES using a shell, it is necessary only to construct and install a knowledge base.
As we shall see, different expert systems may be designed on fundamentally different principles,
containing knowledge bases with completely different structures.
The most significant categories are rule-based, case-based and model-based systems.
We shall consider first rule-based systems ...
Knowledge as Rules
By an IF ... THEN ... rule we mean something like:
IF ID Checked
AND Satisfactory Employment
AND Salary Adequate
THEN Credit Granted
It will be convenient to think of (and write) such rules with the conclusion first:
Credit Granted
IF ID Checked
AND Satisfactory Employment
AND Salary Adequate

The part of a rule after the IF is called the body of the rule. It contains what will be subgoals.
As we shall see, there are several different kinds of things which can appear in the body of a rule.
Rules may contain AND, as above. They may also contain OR.
For example
ID Checked
IF Credit Card Shown
OR Driving Licence Shown
OR Passport Shown
As these two examples show, the rules form a tree structure. Trying to demonstrate the 'truth' of the conclusion of a rule leads to requirements to demonstrate the truth of the premises of the rule, which in turn will lead to further subgoals, and so on.
Each rule is referred to by its conclusion, so the two rules above are called Credit Granted and
ID Checked.
Backward Chaining Systems
So far we have looked at how rule-based systems can be used to draw new conclusions from
existing data, adding these conclusions to a working memory. This approach is most useful when
you know all the initial facts, but don't have much idea what the conclusion might be.
If you DO know what the conclusion might be, or have some specific hypothesis to test, forward
chaining systems may be inefficient. You COULD keep on forward chaining until no more rules
apply or you have added your hypothesis to the working memory. But in the process the system
is likely to do a lot of irrelevant work, adding uninteresting conclusions to working memory. For
example, suppose we are interested in whether Alison is in a bad mood. We could repeatedly fire
rules, updating the working memory, checking each time whether (bad-mood alison) is in the
new working memory. But maybe we had a whole batch of rules for drawing conclusions about
what happens when I'm lecturing, or what happens in February - we really don't care about this,
so would rather only have to draw the conclusions that are relevant to the goal.
This can be done by backward chaining from the goal state (or on some hypothesised state that
we are interested in). This is essentially what Prolog does, so it should be fairly familiar to you
by now. Given a goal state to try and prove (e.g., (bad-mood alison)) the system will first check
to see if the goal matches the initial facts given. If it does, then that goal succeeds. If it doesn't
the system will look for rules whose conclusions (previously referred to as actions) match the

goal. One such rule will be chosen, and the system will then try to prove any facts in the
preconditions of the rule using the same procedure, setting these as new goals to prove. Note that
a backward chaining system does NOT need to update a working memory. Instead it needs to
keep track of what goals it needs to prove to prove its main hypothesis.
In principle we can use the same set of rules for both forward and backward chaining. However,
in practice we may choose to write the rules slightly differently if we are going to be using them
for backward chaining. In backward chaining we are concerned with matching the conclusion of
a rule against some goal that we are trying to prove. So the 'then' part of the rule is usually not
expressed as an action to take (e.g., add/delete), but as a state which will be true if the premises
are true.
So, suppose we have the following rules:
1. IF (lecturing X)
AND (marking-practicals X)
THEN (overworked X)
2. IF (month february)
THEN (lecturing alison)
3. IF (month february)
THEN (marking-practicals alison)
4. IF (overworked X)
THEN (bad-mood X)
5. IF (slept-badly X)
THEN (bad-mood X)
6. IF (month february)
THEN (weather cold)
7. IF (year 1993)
THEN (economy bad)
and initial facts:


(month february)
(year 1993)
and we're trying to prove:
(bad-mood alison)
First we check whether the goal state is in the initial facts. As it isn't there, we try matching it
against the conclusions of the rules. It matches rules 4 and 5. Let us assume that rule 4 is chosen
first - it will try to prove (overworked alison). Rule 1 can be used, and the system will try to
prove (lecturing alison) and (marking practicals alison). Trying to prove the first goal, it will
match rule 2 and try to prove (month february). This is in the set of initial facts. We still have to
prove (marking-practicals alison). Rule 3 can be used, and we have proved the original goal
(bad-mood alison).
One way of implementing this basic mechanism is to use a stack of goals still to satisfy. You repeatedly pop a goal off the stack and try to prove it. If it is in the set of initial facts then it is proved. If it matches a rule which has a set of preconditions then the goals in the precondition
are pushed onto the stack. Of course, this doesn't tell us what to do when there are several rules
which may be used to prove a goal. If we were using Prolog to implement this kind of algorithm
we might rely on its backtracking mechanism - it'll try one rule, and if that results in failure it
will go back and try the other. However, if we use a programming language without a built in
search procedure we need to decide explicitly what to do. One good approach is to use an
agenda, where each item on the agenda represents one alternative path in the search for a
solution. The system should try `expanding' each item on the agenda, systematically trying all
possibilities until it finds a solution (or fails to). The particular method used for selecting items
off the agenda determines the search strategy - in other words, determines how you decide on
which options to try, in what order, when solving your problem. We'll go into this in much more
detail in the section on search.
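The goal-directed procedure just described can be sketched as a short recursive function. The rules below are the lecture's rule set with the variable X instantiated by hand for brevity; a real system would perform pattern matching/unification instead:

```python
# Each rule is (conclusion, [premises]); two rules may share a conclusion,
# in which case they are tried in order (cf. Prolog's backtracking).
rules = [
    ("overworked alison",         ["lecturing alison", "marking-practicals alison"]),
    ("lecturing alison",          ["month february"]),
    ("marking-practicals alison", ["month february"]),
    ("bad-mood alison",           ["overworked alison"]),
    ("bad-mood alison",           ["slept-badly alison"]),
]
facts = {"month february", "year 1993"}

def prove(goal):
    if goal in facts:                    # the goal matches an initial fact
        return True
    for conclusion, premises in rules:   # otherwise try matching rules
        if conclusion == goal and all(prove(p) for p in premises):
            return True
    return False                         # no rule establishes the goal

print(prove("bad-mood alison"))  # -> True
```

Notice that no working memory is updated: the recursion stack plays the role of the "goals still to prove" described above.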

Forward Chaining Systems


In a forward chaining system the facts in the system are represented in a working memory which
is continually updated. Rules in the system represent possible actions to take when specified
conditions hold on items in the working memory - they are sometimes called condition-action
rules. The conditions are usually patterns that must match items in the working memory, while
the actions usually involve adding or deleting items from the working memory.

The interpreter controls the application of the rules, given the working memory, thus controlling
the system's activity. It is based on a cycle of activity sometimes known as a recognise-act cycle.
The system first checks to find all the rules whose conditions hold, given the current state of
working memory. It then selects one and performs the actions in the action part of the rule. (The
selection of a rule to fire is based on fixed strategies, known as conflict resolution strategies.) The
actions will result in a new working memory, and the cycle begins again. This cycle will be
repeated until either no rules fire, or some specified goal state is satisfied.
Rule-based systems vary greatly in their details and syntax, so the following examples are only
illustrative.
First we'll look at a very simple set of rules:
1. IF (lecturing X)
AND (marking-practicals X)
THEN ADD (overworked X)
2. IF (month february)
THEN ADD (lecturing alison)
3. IF (month february)
THEN ADD (marking-practicals alison)
4. IF (overworked X)
OR (slept-badly X)
THEN ADD (bad-mood X)
5. IF (bad-mood X)
THEN DELETE (happy X)
6. IF (lecturing X)
THEN DELETE (researching X)
Here we use capital letters to indicate variables. In other representations variables may be
indicated in different ways, such as by a ? or a ^ (e.g., ?person, ^person).
Let us assume that initially we have a working memory with the following elements:


(month february)
(happy alison)
(researching alison)
Our system will first go through all the rules checking which ones apply given the current
working memory. Rules 2 and 3 both apply, so the system has to choose between them, using its
conflict resolution strategies. Let us say that rule 2 is chosen. So, (lecturing alison) is added to
the working memory, which is now:
(lecturing alison)
(month february)
(happy alison)
(researching alison)
Now the cycle begins again. This time rule 3 and rule 6 have their preconditions satisfied. Lets
say rule 3 is chosen and fires, so (marking-practicals alison) is added to the working memory. On
the third cycle rule 1 fires, so, with X bound to alison, (overworked alison) is added to working
memory which is now:
(overworked alison)
(marking-practicals alison)
(lecturing alison)
(month february)
(happy alison)
(researching alison)
Now rules 4 and 6 can apply. Suppose rule 4 fires, and (bad-mood alison) is added to the working
memory. And in the next cycle rule 5 is chosen and fires, with (happy alison) removed from the
working memory. Finally, rule 6 will fire, and (researching alison) will be removed from working
memory, to leave:
(bad-mood alison)
(overworked alison)
(marking-practicals alison)
(lecturing alison)
(month february)
(This example is not meant to reflect my attitude to lecturing!)
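The trace above can be reproduced with a small recognise-act loop. The rules are ground-instantiated by hand, and conflict resolution here is simply "first applicable rule that has not yet fired"; real systems use the richer strategies discussed below:

```python
# Each rule is (action, working-memory item, [preconditions]).
rules = [
    ("add",    "lecturing alison",          ["month february"]),
    ("add",    "marking-practicals alison", ["month february"]),
    ("add",    "overworked alison",         ["lecturing alison", "marking-practicals alison"]),
    ("add",    "bad-mood alison",           ["overworked alison"]),
    ("delete", "happy alison",              ["bad-mood alison"]),
    ("delete", "researching alison",        ["lecturing alison"]),
]
wm = {"month february", "happy alison", "researching alison"}  # working memory

fired, changed = set(), True
while changed:                  # the recognise-act cycle
    changed = False
    for i, (act, item, conds) in enumerate(rules):
        if i not in fired and all(c in wm for c in conds):
            fired.add(i)        # never fire the same rule twice on the same data
            if act == "add":
                wm.add(item)
            else:
                wm.discard(item)
            changed = True
            break               # re-scan the rules from the top each cycle

print(sorted(wm))
```

With this working memory the loop terminates with exactly the five elements reached at the end of the trace: (bad-mood alison), (overworked alison), (marking-practicals alison), (lecturing alison) and (month february).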

The order that rules fire may be crucial, especially when rules may result in items being deleted
from working memory. (Systems which allow items to be deleted are known as nonmonotonic).
Anyway, suppose we have the following further rule in the rule set:
7. IF (happy X)
THEN ADD (gives-high-marks X)
If this rule fires BEFORE (happy alison) is removed from working memory then the system will
conclude that I'll give high marks. However, if rule 5 fires first then rule 7 will no longer apply.
Of course, if we fire rule 7 and then later remove its preconditions, then it would be nice if its
conclusions could then be automatically removed from working memory. Special systems called
truth maintenance systems have been developed to allow this. A number of conflict resolution
strategies are typically used to decide which rule to fire. These include:

Don't fire a rule twice on the same data. (We don't want to keep on adding (lecturing
alison) to working memory).

Fire rules on more recent working memory elements before older ones. This allows the
system to follow through a single chain of reasoning, rather than keeping on drawing new
conclusions from old data.

Fire rules with more specific preconditions before ones with more general preconditions.
This allows us to deal with non-standard cases. If, for example, we have a rule ``IF (bird
X) THEN ADD (flies X)'' and another rule ``IF (bird X) AND (penguin X) THEN ADD
(swims X)'' and a penguin called tweety, then we would fire the second rule first and start
to draw conclusions from the fact that tweety swims.

These strategies may help in getting reasonable behaviour from a forward chaining system, but
the most important thing is how we write the rules. They should be carefully constructed, with
the preconditions specifying as precisely as possible when different rules should fire. Otherwise
we will have little idea or control of what will happen. Sometimes special working memory
elements are used to help to control the behaviour of the system. For example, we might decide
that there are certain basic stages of processing in doing some task, and certain rules should only
be fired at a given stage - we could have a special working memory element (stage 1) and add
(stage 1) to the preconditions of all the relevant rules, removing the working memory element
when that stage was complete.

Forwards vs Backwards Reasoning


Whether you use forward or backward reasoning to solve a problem depends on the properties of
your rule set and initial facts. Sometimes, if you have some particular goal (to test some
hypothesis), then backward chaining will be much more efficient, as you avoid drawing
conclusions from irrelevant facts. However, sometimes backward chaining can be very wasteful:
there may be many possible ways of trying to prove something, and you may have to try almost
all of them before you find one that works. Forward chaining may be better if you have lots of
things you want to prove (or if you just want to find out in general what new facts are true); when
you have a small set of initial facts; and when there tend to be lots of different rules which allow
you to draw the same conclusion. Backward chaining may be better if you are trying to prove a
single fact, given a large set of initial facts, and where, if you used forward chaining, lots of rules
would be eligible to fire in any cycle.
Case-based reasoning (CBR):

Stores cases (descriptions of past experiences) in a database for later retrieval

Searches the database for cases with similar characteristics to a new case to find and
apply appropriate solutions

Relies on continuous expansion and refinement by users


HOW CASE-BASED REASONING WORKS


Case-based reasoning represents knowledge as a database of past cases and their solutions. The
system uses a six-step process to generate solutions to new problems encountered by the user.
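The retrieval step (searching the case database for the most similar past case) can be sketched as a nearest-neighbour lookup; the cases and similarity measure below are illustrative only:

```python
# A tiny case base: (problem description, stored solution).
cases = [
    ({"symptom": "no-start", "battery": "dead"}, "charge or replace battery"),
    ({"symptom": "no-start", "battery": "ok"},   "check starter motor"),
    ({"symptom": "stalls",   "battery": "ok"},   "check fuel supply"),
]

def similarity(stored, new):
    # number of attribute values the two case descriptions share
    return sum(stored.get(k) == v for k, v in new.items())

def retrieve(new_case):
    # return the solution of the most similar stored case
    return max(cases, key=lambda c: similarity(c[0], new_case))[1]

print(retrieve({"symptom": "no-start", "battery": "dead"}))  # -> charge or replace battery
```

In a full CBR cycle the retrieved solution would then be adapted to the new case, and the (new case, revised solution) pair stored back into the case base.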

Lecture 9
Simple Expert Systems
Expert systems:
- Capture tacit knowledge in a very specific, limited domain of human expertise
- Support highly structured decision making
- Model human knowledge as a set of rules called the knowledge base
- Work by applying a set of IF-THEN-ELSE rules extracted from human experts
- Use an inference engine to search through the knowledge base. In forward chaining, the inference engine begins with information entered by the user and searches the knowledge base for a conclusion. In backward chaining, the system begins with a hypothesis and asks the user questions to confirm or disprove the hypothesis.

The expert systems given below are very basic. These samples should give you an idea of where
to start your coursework. They can all be quickly and easily implemented using Crystal.
Simple Expert System 1: A CAR TROUBLE DIAGNOSTIC SYSTEM

Knowledge Acquisition
The first task is knowledge acquisition. The solutions for this expert system are based wholly on
knowledge on automotive systems from the internet and a local Jua Kali mechanic.
The basic items that were identified to be needed in order to get a vehicle to start are a
combustion chamber, some sort of mechanism to turn the engine, air and fuel to burn, and
something to ignite the air fuel mixture. All the solutions in this illustration deal with how these
elements come together in order to make a vehicle start. Below is an introduction to the basic
systems that were considered.
Battery: This is the part of a vehicle that stores the power that is required to turn the engine and
create a spark.
Battery Cables: This is a set of wires that carry the power from the battery to the starter and the
rest of the engine. These cables usually fail due to corrosion, which interferes with the energy
flow from the battery.
Starter: This is a mechanical device, an electric motor that uses power from the battery to rotate
the engine.
Coil: An electronic component that takes the twelve volts coming from the battery and converts
it to a much larger voltage.
Coil Wire: A wire which carries the voltage from the coil to the distributor or computer-
controlled ignition points, which then distribute the pulse to the correct spark plug wire.
Spark Plug Wires: A set of wires that carries the electronic pulse from the distributor or ignition
points to the appropriate spark plug.

Spark Plugs: A set of electronic components constructed of insulators and conductors. These
spark plugs create a short between a spark point and a conductor. This short creates a spark that
ignites the air fuel mixture.
Fuel: Also referred to as petrol.
Fuel Filter: A filtering device located somewhere between the fuel tank and engine. Used to
eliminate impurities from the fuel.
Knowledge Representation
The second step is to represent the acquired knowledge. This involved coming up with rules that
would later be encoded into the knowledge base of the expert system. The hard part was deciding
which problems should be included in the solution space and which should be dropped. A
decision was made to limit the solution space to problems that can be fixed without any special
knowledge of how a car works. This eliminated many problems, including internal engine
failures.

Below, a decision tree is used to represent the reasoning used in the system.

(Decision-tree figure, summarised: the root question is "Starter turning?". The other decision
nodes are: Lights on?; Car moving?; Got enough fuel?; Battery cable OK?; Terminals clean?;
Coil clicks?; Coil fuse OK?; Filter replaced recently?. The leaf recommendations are: Car is
fine; Buy fuel; Charge battery; Clean terminals; Fix battery cable; Replace starter; Replace
filter; Replace fuse; Replace coil; Call mechanic.)
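One plausible reading of the tree can be sketched as code. The branch layout below is an assumption reconstructed from the node labels, not a transcription of the original figure, so treat the ordering of questions as illustrative only.

```python
# Hedged sketch of the car-trouble diagnostic as a yes/no decision tree.
# The exact branching is an assumption based on the node labels above.

def yes(question):
    """Interactive front end; replaceable for testing."""
    return input(question + " (y/n) ").strip().lower().startswith("y")

def diagnose(ask=yes):
    if not ask("Starter turning?"):
        if ask("Lights on?"):
            return "Replace starter"      # power reaches the lights but not the starter
        if not ask("Terminals clean?"):
            return "Clean terminals"
        if not ask("Battery cable OK?"):
            return "Fix battery cable"
        return "Charge battery"
    if ask("Car moving?"):
        return "Car is fine"
    if not ask("Got enough fuel?"):
        return "Buy fuel"
    if not ask("Filter replaced recently?"):
        return "Replace filter"
    if not ask("Coil fuse OK?"):
        return "Replace fuse"
    if ask("Coil clicks?"):
        return "Replace coil"
    return "Call mechanic"

def scripted(answers):
    """Canned answers, for exercising the tree without user input."""
    it = iter(answers)
    return lambda _question: next(it)

print(diagnose(scripted([False, True])))   # starter not turning, lights on
```

A shell such as Crystal would encode the same questions as rules rather than nested code, but the consultation dialogue it generates follows this kind of path through the tree.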

Simple Expert System 2: A MEDICAL DIAGNOSIS SYSTEM


Knowledge Acquisition Process
For this process, two medical practitioners - one doctor working for an organization and one in
the private sector here in Nairobi - were interviewed with a view to finding out: (1) illnesses that
have nearly similar signs and symptoms to malaria, and (2) the signs and symptoms for each of
these illnesses. The interviews were face-to-face. Other information was obtained from the internet.
The ailments covered are Malaria, Malaria + RTI, RTI, Typhoid, and Meningitis. The signs and
symptoms considered are:
Fever
Chills (and sweating)
Coughing
Headache
Severe headache
Nausea (and vomiting)
Body malaise
Abdominal discomfort
Diarrhoea/Constipation
Loss of appetite
Stiff neck
Photophobia (sensitivity to light)
Each symptom is marked YES (shaded green in the original table) against the ailments it
indicates.
Note: RTI = Respiratory Tract Infection (specifically the common cold, also referred to as the
upper respiratory infection).
Rule-based Knowledge Representation
R1:  IF fever THEN patient_ill
R2:  IF patient_ill AND coughing THEN respiratory_tract_infection
R3:  IF patient_ill AND headache AND chills_sweat AND nausea AND body_malaise THEN [malaria]
R4:  IF patient_ill AND respiratory_tract_infection AND NOT malaria THEN [common_cold]
R5:  IF malaria AND respiratory_tract_infection THEN [malaria_and_respiratory_tract]
R6:  IF patient_ill AND NOT chills_sweat AND headache AND severe_headache THEN non_malaria
R7:  IF non_malaria AND nausea AND stiff_neck AND photophobia THEN [meningitis]
R8:  IF non_malaria AND body_malaise AND diarrhoea_constipation AND appetite_loss AND NOT stiff_neck AND NOT photophobia THEN [typhoid]
R9:  IF patient_ill AND headache AND severe_headache AND body_malaise AND abdominal_discomfort AND stiff_neck THEN [unknown_illness_1]
R10: IF patient_ill AND headache AND NOT severe_headache AND body_malaise AND NOT nausea THEN [unknown_illness_2]
R11: IF patient_ill AND headache AND NOT severe_headache AND NOT body_malaise THEN [unknown_illness_3]
R12: IF patient_ill AND NOT headache AND NOT common_cold THEN [unknown_illness_4]
R13: IF patient_ill AND headache AND severe_headache AND NOT body_malaise AND nausea AND NOT stiff_neck THEN [unknown_illness_5]
R14: IF patient_ill AND headache AND severe_headache AND NOT body_malaise AND NOT nausea THEN [unknown_illness_6]
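As a sketch of how such rules can be executed, the fragment below encodes R1, R2, R3, R6 and R7 as (positive premises, negated premises, conclusion) triples and forward-chains over a patient's reported symptoms. This is an illustration, not the shell encoding.

```python
# Rules R1, R2, R3, R6 and R7 from the list above.
RULES = [
    (["fever"], [], "patient_ill"),                                    # R1
    (["patient_ill", "coughing"], [], "respiratory_tract_infection"),  # R2
    (["patient_ill", "headache", "chills_sweat", "nausea",
      "body_malaise"], [], "malaria"),                                 # R3
    (["patient_ill", "headache", "severe_headache"],
     ["chills_sweat"], "non_malaria"),                                 # R6
    (["non_malaria", "nausea", "stiff_neck", "photophobia"],
     [], "meningitis"),                                                # R7
]

def forward_chain(symptoms):
    """Fire rules repeatedly until no new conclusion is derived."""
    facts = set(symptoms)
    changed = True
    while changed:
        changed = False
        for pos, neg, conclusion in RULES:
            if (all(p in facts for p in pos)
                    and not any(n in facts for n in neg)
                    and conclusion not in facts):
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"fever", "headache", "chills_sweat", "nausea", "body_malaise"})
print(sorted(derived))   # 'malaria' and 'patient_ill' are among the derived facts
```

NOT is evaluated here as simple absence from the input, which assumes the reported symptom list is complete. A rule such as R4, which negates a derived fact (NOT malaria), would additionally require the positive rules to run first, i.e. some ordering or stratification of the rule base.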


Simple Expert System 3: A MEDICAL DIAGNOSIS SYSTEM


Problem
The medical diagnosis system considered here is supposed to be used to diagnose diseases
which may have some common symptoms. The diseases picked are Malaria, Typhoid,
Meningitis, Cholera, Amoebic Dysentery, Lobar Pneumonia and Hepatitis C. The sources of
information were medical reference books and the Internet.
When a patient goes to see a medical expert, the first thing the expert obtains is the symptoms.
He/she then uses his/her knowledge to arrive at a possible conclusion, after which he/she carries
out a confirmatory test. Since to err is human, this knowledge may not always be accurate. The
system considered here will be designed to obtain this possible conclusion for the expert,
suggest a confirmatory test, recommend treatment, and warn the expert of any disease that
manifests similar symptoms.
Knowledge representation
After carrying out the research, the following information was obtained. For each disease, the
table lists its symptoms, the confirmatory test (where one was identified), the diseases it is
commonly misdiagnosed as, and the recommended treatment.

CHOLERA
Symptoms: diarrhoea, high fever, vomiting, dehydration, rice-water stool, no appetite, cold
clammy skin, tachycardia, sunken eyes
Misdiagnosed as: plague, dysentery, food poisoning
Treatment: replace fluids

AMOEBIC DYSENTERY
Symptoms: diarrhoea, fever, vomiting, abdominal pain, joint pains, blood/mucus in stool
Confirmatory test: stool analysis test for cysts
Misdiagnosed as: cholera
Treatment: Metronidazole

LOBAR PNEUMONIA
Symptoms: high fever, cough, chest pain, chills, rapid respiration, fluid-filled lungs, weakness
Misdiagnosed as: flu, tuberculosis
Treatment: antibiotics (penicillin, Erythromycin)

TYPHOID FEVER
Symptoms: fever, diarrhoea or constipation, headache, abdominal pain, no appetite, weakness,
rash, delusions
Confirmatory test: stool/blood test for S. Typhi
Misdiagnosed as: minor disease, flu
Treatment: antibiotics

MENINGITIS
Symptoms: high fever, severe headache, stiff neck, dislike of light, nausea, vomiting,
convulsions, blurred vision, paraparesis
Confirmatory test: spinal tap test
Misdiagnosed as: flu, minor disease
Treatment: intense antibiotic therapy

MALARIA
Symptoms: fever, chills, headache, nausea, vomiting, joint pains, no appetite, anaemia,
convulsions, dizziness
Confirmatory test: blood smear
Misdiagnosed as: minor disease, pneumonia
Treatment: Chloroquine or Quinine; relieve pain and fever

HEPATITIS C
Symptoms: fever, nausea, diarrhoea, headache, abdominal pain, no appetite, fatigue, malaise,
sore throat, swollen glands, yellowing skin, tender liver, dark urine
Confirmatory test: liver function test
Misdiagnosed as: minor disease
Treatment: relieve pain and fever
Notes:
Minor disease: This is the catch-all for a number of minor, debilitating illnesses other than
those listed. Usually this is no more than a head cold or a bad case of the flu. Symptoms vary
widely and are at the discretion of the system user. Infection symptoms are usually fever,
general pain, vomiting, headaches, etc.
Misdiagnosed as: usually another minor disease, such as the incorrect flu bug; sometimes
pneumonia.
Treatment: as for pneumonia, using antibiotics.
From the above, a decision was made to design a system whereby the user will be expected to
key in the symptoms manifested, and the system will compare them with those of the diseases in
the knowledge base. If they are similar, a diagnosis is arrived at; otherwise the disease is
unidentified.
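A minimal sketch of that comparison follows, assuming a small invented subset of the knowledge base and a simple overlap score; the real system's symptom lists and matching criterion may differ.

```python
# Invented subset of the disease knowledge base: disease -> symptom set.
KB = {
    "malaria":    {"fever", "chills", "headache", "nausea", "joint_pains"},
    "cholera":    {"diarrhoea", "vomiting", "dehydration", "sunken_eyes"},
    "meningitis": {"high_fever", "severe_headache", "stiff_neck", "photophobia"},
}

def diagnose(symptoms, threshold=0.5):
    """Score each disease by the fraction of its symptoms the patient
    reports; below the threshold the disease is left unidentified."""
    symptoms = set(symptoms)
    scores = {d: len(s & symptoms) / len(s) for d, s in KB.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unidentified"

print(diagnose({"fever", "chills", "headache"}))   # prints: malaria
print(diagnose({"rash"}))                          # prints: unidentified
```

The threshold is the design knob here: raising it makes the system more cautious (more "unidentified" answers), lowering it makes it more willing to guess.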


Lecture 10
PROLOG
Facts, Rules and Queries
Symbols
Prolog expressions are composed of the following truth-functional symbols, which have the
same interpretation as in the predicate calculus.

English        Predicate Calculus        PROLOG
and            ∧                         ,
or             ∨                         ;
if             -->                       :-
not            ¬                         not
Variables and Names


Variables begin with an uppercase letter. Predicate names, function names, and the names for
objects must begin with a lowercase letter. Rules for forming names are the same as for the
predicate calculus.
mother_of
male
female
greater_than
socrates
Facts
A fact is a predicate expression that makes a declarative statement about the problem domain.
Whenever a variable occurs in a Prolog expression, it is assumed to be universally quantified.
Note that all Prolog sentences must end with a period.
likes(john, susie).                      /* John likes Susie */
likes(X, susie).                         /* Everyone likes Susie */
likes(john, Y).                          /* John likes everybody */
likes(john, Y), likes(Y, john).          /* John likes everybody and everybody likes John */
likes(john, susie); likes(john, mary).   /* John likes Susie or John likes Mary */
not(likes(john, pizza)).                 /* John does not like pizza */
likes(john, susie) :- likes(john, mary). /* John likes Susie if John likes Mary */

Rules
A rule is a predicate expression that uses logical implication (:-) to describe a relationship among
facts. Thus a Prolog rule takes the form
left_hand_side :- right_hand_side.
This sentence is interpreted as: left_hand_side if right_hand_side. The left_hand_side is
restricted to a single positive literal, which means it must consist of a positive atomic
expression. It cannot be negated and it cannot contain logical connectives.
This notation is known as a Horn clause. In Horn clause logic, the left-hand side of the clause is
the conclusion, and must be a single positive literal. The right-hand side contains the premises.
The Horn clause calculus is a subset of the full first-order predicate calculus.
Examples of valid rules:
friends(X,Y) :- likes(X,Y), likes(Y,X).           /* X and Y are friends if they like each other */
hates(X,Y) :- not(likes(X,Y)).                    /* X hates Y if X does not like Y */
enemies(X,Y) :- not(likes(X,Y)), not(likes(Y,X)). /* X and Y are enemies if they don't like each other */
Examples of invalid rules:
left_of(X,Y) :- right_of(Y,X)             /* Missing a period */
likes(X,Y), likes(Y,X) :- friends(X,Y).   /* LHS is not a single literal */
not(likes(X,Y)) :- hates(X,Y).            /* LHS cannot be negated */

Queries
The Prolog interpreter responds to queries about the facts and rules represented in its database.
The database is assumed to represent what is true about a particular problem domain. In making
a query you are asking Prolog whether it can prove that your query is true. If so, it answers "yes"
and displays any variable bindings that it made in coming up with the answer. If it fails to prove
the query true, it answers "no".
Whenever you run the Prolog interpreter, it will prompt you with ?-. For example, suppose our
database consists of the following facts about a fictitious family.
father_of(joe,paul).
father_of(joe,mary).
mother_of(jane,paul).
mother_of(jane,mary).
male(paul).
male(joe).
female(mary).
female(jane).
We get the following results when we make queries about this database:
| ?- father_of(joe,paul).
true ?
yes
| ?- father_of(paul,mary).
no
| ?- father_of(X,mary).
X = joe
yes
| ?-

Closed World Assumption
The Prolog interpreter assumes that the database is a closed world - that is, if it cannot prove
something is true, it assumes that it is false. This is also known as negation as failure: something
is false if PROLOG cannot prove it true given the facts and rules in its database. In this case, it
may well be (in the real world) that Paul is the father of Mary, but since this cannot be proved
given the current family database, Prolog concludes that it is false. So PROLOG assumes that its
database contains complete knowledge of the domain it is being asked about.
Prolog's Proof Procedure
In responding to queries, the Prolog interpreter uses a backtracking search, similar to the one we
study in Chapter 3 of Luger. To see how this works, let's add the following rules to our database:
parent_of(X,Y) :- father_of(X,Y).   /* Rule #1 */
parent_of(X,Y) :- mother_of(X,Y).   /* Rule #2 */

And let's trace how PROLOG would process the query. Suppose the facts and rules of this
database are arranged in the order in which they were input. This trace assumes you know how
unification works.
?- parent_of(jane,mary).

parent_of(jane,mary)    /* Prolog starts here and searches for a matching fact or rule. */

parent_of(X,Y)          /* Prolog unifies the query with rule #1 using {jane/X, mary/Y}, giving
                           parent_of(jane,mary) :- father_of(jane,mary) */

father_of(jane,mary)    /* Prolog replaces the LHS with the RHS and searches. */
                        /* This fails to match father_of(joe,paul) and father_of(joe,mary),
                           so this FAILS. */
                        /* Prolog BACKTRACKS to rule #2 and unifies with {jane/X, mary/Y},
                           matching parent_of(jane,mary) :- mother_of(jane,mary) */

mother_of(jane,mary)    /* Prolog replaces the LHS with the RHS and searches. */

YES.                    /* Prolog finds a match with a literal and so succeeds. */
Here's a trace of this query using Prolog's trace predicate:


| ?- trace,parent_of(jane,mary).
{The debugger will first creep -- showing everything (trace)}
1 1 Call: parent_of(jane,mary) ?
2 2 Call: father_of(jane,mary) ?
2 2 Fail: father_of(jane,mary) ?
2 2 Call: mother_of(jane,mary) ?
2 2 Exit: mother_of(jane,mary) ?
1 1 Exit: parent_of(jane,mary) ?
yes
{trace}
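The proof procedure just traced (unify, substitute, search depth-first, backtrack on failure) can be imitated in miniature. This is a toy sketch, not real Prolog: variable renaming between rule applications is omitted, so it only handles simple queries like the ones above.

```python
# Toy backward-chaining prover over the family database above.
# Terms are tuples; a string starting with an uppercase letter is a variable.

FACTS = [("father_of", "joe", "paul"), ("father_of", "joe", "mary"),
         ("mother_of", "jane", "paul"), ("mother_of", "jane", "mary")]
RULES = [  # (head, body) pairs, in the same order as rules #1 and #2
    (("parent_of", "X", "Y"), [("father_of", "X", "Y")]),
    (("parent_of", "X", "Y"), [("mother_of", "X", "Y")]),
]

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(a, b, subst):
    """Return an extended substitution unifying a and b, or None on failure."""
    a, b = subst.get(a, a), subst.get(b, b)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

def resolve(term, subst):
    """Apply the substitution to every argument of a term."""
    return tuple(subst.get(t, t) for t in term)

def prove(goals, subst):
    """Yield every substitution proving all goals, depth-first with backtracking."""
    if not goals:
        yield subst
        return
    goal, rest = resolve(goals[0], subst), goals[1:]
    for fact in FACTS:                  # try matching facts first
        s = unify(goal, fact, dict(subst))
        if s is not None:
            yield from prove(rest, s)
    for head, body in RULES:            # then rules: each is a backtracking point
        s = unify(goal, head, dict(subst))
        if s is not None:
            yield from prove(body + rest, s)

print(next(prove([("parent_of", "jane", "mary")], {}), "no"))
```

Just as in the trace, the query parent_of(jane,mary) first tries rule #1, fails against both father_of facts, backtracks, and succeeds through rule #2.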
Exercises
1. Add a male() rule that includes all fathers as males.
2. Add a female() rule that includes all mothers as females.
3. Add the following rules to the family database:
   son_of(X,Y)
   daughter_of(X,Y)
   sibling_of(X,Y)
   brother_of(X,Y)
   sister_of(X,Y)
4. Given the addition of the sibling_of rule, and assuming the above order for the facts and
   rules, show the PROLOG trace for the query sibling_of(paul,mary).

