TOPICS
Overview of Artificial Intelligence: introduction (definitions and history), branches of AI, applications of AI
State space search: uninformed and heuristic search strategies (depth-first, breadth-first, uniform-cost search, A* search, greedy search, etc.)
KBS development and implementation
Knowledge acquisition
Components of a KBS
PROLOG
References
1. Turban, E., Aronson, J. E., & Liang, T.-P. Decision Support Systems and Intelligent Systems (7th ed.).
2. Gonzalez, A., & Dankel, D. II. Engineering Knowledge-Based Systems: Theory and Practice.
3. Russell, S., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach (2nd ed.). New Jersey: Prentice Hall.
Lecture One
Introduction to Artificial Intelligence (A.I)
Lecture Overview
Intelligence
Defining A.I
A.I Applications
Intelligence
Dictionary definition.
(1) The ability to learn or understand or to deal with new or trying situations : REASON; also :
the skilled use of reason
(2) The ability to apply knowledge to manipulate one's environment or to think abstractly as
measured by objective criteria (as tests)
Defining A.I
There is no agreed definition of the term artificial intelligence. However, there are various
definitions that have been proposed. These are considered below.
AI is the study of how to make computer systems that think like human beings (Haugeland, 1985; Bellman, 1978).
AI is the study of how to make computer systems that act like people; it is the art of creating machines that perform functions that require intelligence when performed by people (Kurzweil, 1990).
AI is the study of how to make computers do things which, at the moment, people are better at (Rich & Knight).
AI is the study of making computers that think rationally (Charniak & McDermott, 1985).
AI is the study of computations that make it possible to perceive, reason and act (Winston, 1992).
AI is the study of making systems that act rationally; it seeks to explain and emulate intelligent behaviour in terms of computational processes (Schalkoff, 1990).
These definitions fall into four categories:
Thinking humanly
Thinking rationally
Acting humanly
Acting rationally
Therefore A.I is the part of computer science concerned with designing intelligent computer systems, that is, computer systems that exhibit the characteristics we associate with intelligence in human behaviour: understanding language, learning, reasoning and solving problems.
A.I Applications
Data Mining
Problem solving: complex problems, e.g. puzzles, mathematical problems, logistics planning
Robotics: intelligent systems which can control robots e.g. surgeon systems
A.I agents
Branches of A.I
Machine vision
Machine Learning
Robotics
Problem solving
Game playing
Knowledge-based systems
A.I agents
Intelligent Techniques
Intelligent techniques may be used for:
Capturing individual and collective knowledge and extending a knowledge base, using
artificial intelligence and database technologies
Capturing tacit knowledge, using expert systems, case-based reasoning, and fuzzy logic
Artificial intelligence (AI) is the effort to develop computer-based systems (both hardware and
software) that behave as humans, with the ability to learn languages, accomplish physical tasks,
use a perceptual apparatus, and emulate human expertise and decision making.
Expert systems:
Require input both from human experts, who define the knowledge base, and from knowledge engineers, who translate the knowledge into a set of rules
Fuzzy logic:
Provides solutions to problems requiring expertise that is difficult to represent in the form of crisp IF-THEN rules
Neural networks:
Find patterns and relationships in massive amounts of data that would be too complicated
and difficult for a human being to analyze.
"Learn" patterns by sifting through data, searching for relationships, building models, and
correcting over and over again the model's own mistakes.
Use a large number of sensing and processing nodes that continuously interact with each
other
May be sensitive and not perform well with too little or too much data
Are used in science, medicine, and business primarily to discriminate patterns in massive
amounts of data.
Genetic algorithms:
Use processes such as fitness, crossover, and mutation to "breed" solutions.
Are useful for dynamic and complex business problems involving hundreds or thousands
of variables, such as problems involving engineering design optimization, product design,
and monitoring industrial systems.
Hybrid AI systems:
Integrate genetic algorithms, fuzzy logic, neural networks, and expert systems to take advantage of the best features of each technology.
Intelligent agents:
Are software programs that work in the background without direct human intervention
Use a limited built-in or learned knowledge base to accomplish tasks or make decisions
on the user's behalf
Are used in agent-based modeling applications to model or simulate the behavior of consumers, stock markets, and supply chains, and to predict the spread of epidemics
Lecture 2
State space search
1. Introduction
All AI tasks involve searching.
General idea:
You know the available actions that you could perform to solve your problem.
You don't know which ones in particular should be used, and in what sequence, in order to obtain a solution.
You can search through all the possibilities to find one particular sequence of actions that will give a solution.
The scenario:
Initial state
Target (goal) state
A set of operations that transform one state into another
The task: find a sequence of operations that will move us from the initial state to the target state. This is solved in terms of searching a graph. The set of all states is called the search space.
Examples:
Expert systems: Find the sequence of rules that will prove the goal (backward
chaining)
Puzzles: Find a sequence of actions to solve the puzzle
Chess: Find the sequence of moves that will result in winning the game.
Search techniques:
Uninformed search: Exhaustive search (brute force methods: systematically and
exhaustively search all possible paths)
Depth-first
Breadth-first
Informed search: Heuristic search (use rules-of-thumb to guess which paths are likely to
lead to a solution)
Hill climbing
Best-first search
A* algorithm
Path: a sequence of nodes such that each two neighbours are connected by an edge
Examples: in G1: A B D C A E; in G2: E A B D C
Note: the sequence A B E A in G2 is not a path, because the edges have orientation, and there is no edge B→E; the edge is E→B
Cycle: a path with the first node equal to the last, where no other nodes are repeated
Examples: in G1: A B D C A; in G2: no cycles
Acyclic graph: a graph without cycles
Tree: a connected, undirected acyclic graph, where one node is chosen to be the root
Given a graph and a node:
Out-going edges: all edges that start in that node
In-coming edges : all edges that end up in that node
Successors (Children): the end nodes of all out-going edges
Ancestors (Parents): the nodes that are start points of in-coming edges
In undirected graphs the edges are symmetrical, i.e. the notion of child and parent
depends on how the graph is traversed.
3. Exhaustive search
3. 1. Breadth-first search
At step i traverse all nodes at level i.
A. In trees
Algorithm: using a queue
1. Queue = [initial_node], FOUND = False
2. While queue not empty and FOUND = False do:
Remove the front node N
If N = target node then FOUND = true
Else find all successor nodes of N and put them into the queue.
When all edges have equal cost, this is in essence Dijkstra's algorithm for finding the shortest path between two nodes in a graph.
3. 2. Depth-first search
Keep going down one path until you get to a dead end. Then back up and try alternatives.
Algorithm: using a stack
1. Stack = [initial_node] , FOUND = False
2. While stack not empty and FOUND = False do:
Remove the top node N
If N = target node then FOUND = true
Else find all successor nodes of N and put them onto the stack.
Search order for the sample tree:
A, B, D (leftmost path)
E
C, G, I (back up through A - explore its right subtree, and follow the leftmost path)
J
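Both strategies can be sketched in Python; the only difference between them is the data structure used (a FIFO queue for breadth-first, a LIFO stack for depth-first). The small graph at the bottom is a stand-in for illustration, since the lecture's sample tree is not reproduced here:

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first: expand the search level by level using a FIFO queue."""
    queue = deque([[start]])              # the queue holds whole paths
    visited = {start}
    while queue:
        path = queue.popleft()            # oldest path first
        node = path[-1]
        if node == goal:
            return path
        for succ in graph.get(node, []):
            if succ not in visited:
                visited.add(succ)
                queue.append(path + [succ])
    return None                           # goal not reachable

def dfs(graph, start, goal):
    """Depth-first: follow one path to a dead end, then back up (LIFO stack)."""
    stack = [[start]]
    visited = set()
    while stack:
        path = stack.pop()                # newest path first
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for succ in reversed(graph.get(node, [])):   # keep left-to-right order
            stack.append(path + [succ])
    return None

# Hypothetical graph, for illustration only:
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
```

On this graph both calls return the path ["A", "C", "F"], but they visit nodes in different orders: bfs expands A, B, C, ... level by level, while dfs exhausts the B subtree (D, then E) before backing up to try C.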
Lecture 3
Heuristic search
4. Heuristic search
Heuristic search is used to reduce the search space.
Basic idea: explore only promising states/paths.
We need an evaluation function to estimate each state/path.
4. 1. Hill climbing
Basic idea: always head towards a state which is better than the current one.
Example: if you are at town A and you can get to town B and town C (and your target is
town D) then you should make a move IF town B or C appear nearer to town D than town A
does.
Algorithm:
1. Get the successors of the current state and use the evaluation function to assign a score to each successor.
2. If one of the successors has a better score than the current state, then set the new current state to be the successor with the best score; otherwise stop.
Hill climbing keeps only the current state, so it has minimum memory requirements. However, it is not guaranteed that a solution will be found
- the local maxima problem.
General hill climbing is only good for a limited class of problems where we have an
evaluation function that fairly accurately predicts the actual distance to a solution.
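The algorithm above can be sketched as follows. Here the convention is that a lower score means the state is judged closer to the goal; the toy successor and score functions in the usage example are invented for illustration:

```python
def hill_climb(start, successors, score):
    """Greedy local search. `score` is the evaluation function
    (lower score = judged closer to the goal). Only the current state
    is kept, hence the minimal memory use - and the local-optimum risk."""
    current = start
    while True:
        candidates = successors(current)
        if not candidates:
            return current
        best = min(candidates, key=score)
        if score(best) >= score(current):
            return current                # no successor improves: stop here
        current = best
```

For instance, with successors(x) = [x - 1, x + 1] and score(x) = (x - 7)**2 on the integers, hill_climb(0, ...) walks step by step up to 7 and stops. A score function with local optima would trap it short of the goal, exactly as described above.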
4. 2. Best-first search
The algorithm works in breadth-first manner, keeps a data structure (called agenda,
based on priority queues) of all successors and their scores.
Algorithm:
1. Put the initial node on the agenda.
2. Remove the best-scored node from the agenda. If it is the goal node then return with success. Otherwise find its successors.
3. Assign the successor nodes a score using the evaluation function, add the scored nodes to the agenda, and repeat from step 2.
If a node that has been chosen does not lead to a solution, the next "best" node is chosen, so
eventually the solution is found.
The algorithm always finds a solution if one exists, though it is not guaranteed to be the optimal one.
Comparison with hill-climbing
Similarities: best-first always chooses the best node
Difference: best-first search keeps an agenda as in breadth-first search, and in case of
a dead end it will backtrack, choosing the next-best node.
Note: if the evaluation function is very expensive (i.e., it takes a long time to work out a
score) the benefits of cutting down on the amount of search may be outweighed by the costs
of assigning a score.
4. 3. The A* Algorithm
Best-first search doesn't take into account the cost of the path so far when choosing which
node to search from next. A* attempts to find a solution which minimizes the total length or
cost of the solution path.
A* algorithm uses an evaluation function that accounts for the cost from the initial state to
the current state, and the cost from the current state to the goal state (i.e. the score assigned to
the node in consideration).
f(Node) = g(Node) + h(Node)
g(Node) - the cost of the path from the initial state to the current node
h(Node) - the estimated future cost from the current node to the goal (the node's heuristic score)
A* always finds the best solution, provided that h(Node) does not overestimate the true future cost (such a heuristic is called admissible).
Thus, in the next example,
Hill climbing will choose node H and will be stuck in node K.
Best-first will choose node H, go to K, backtrack to F and will find a path to G: A H F
G, though not the optimal one.
A* will choose node D as its total score is 12: the sum of g(D) = 2 plus h(D) = 10.
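A* can be sketched in Python using a priority queue (the heapq module). The graph and heuristic values below are invented for illustration, since the lecture's example graph is not reproduced here; the heuristic never overestimates, so the returned path is optimal:

```python
import heapq

def a_star(graph, h, start, goal):
    """A*: always expand the agenda node minimizing f(n) = g(n) + h(n).
    `graph` maps a node to (successor, edge_cost) pairs; `h` estimates
    the remaining cost to the goal and must not overestimate it."""
    agenda = [(h(start), 0, start, [start])]      # entries: (f, g, node, path)
    best_g = {start: 0}
    while agenda:
        f, g, node, path = heapq.heappop(agenda)
        if node == goal:
            return path, g
        for succ, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):   # found a better route
                best_g[succ] = g2
                heapq.heappush(agenda, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")

# Invented example graph and (admissible) heuristic values:
graph = {"A": [("B", 1), ("C", 4)], "B": [("G", 10)], "C": [("G", 2)]}
heuristic = lambda n: {"A": 5, "B": 6, "C": 2, "G": 0}[n]
```

Here a_star(graph, heuristic, "A", "G") returns (["A", "C", "G"], 6): node B looks cheap by g alone, but its large h value steers the search toward the cheaper route through C.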
Example:
Start state: A
Goal state: G
d. For each node write down its successors
e. Draw the search tree, that corresponds to the graph
f. Write the sequence of visited nodes and the cost of the path in:
depth-first search
breadth-first search
hill climbing
best-first search
A*
Revision Questions
Explain how breadth-first and depth first algorithms work, discuss the advantages and
disadvantages of each of them.
Given a graph as in the example above, be able to perform the tasks listed in the example.
Lecture 4
Knowledge Acquisition
Introduction
First, what precisely do we mean by knowledge?
We know about data processing and about information processing. The difference is a question of
levels.
First, data means just uninterpreted values, e.g. 46.
Second, information means organized values, which can be regarded as having some sense or
interpretation, e.g. 46 held as the age field in a personal_details record (data + sense).
Third, knowledge means information which is known to be true (data + sense + knowing).
Types of Knowledge
It is recognized that there are different kinds of knowledge:
Declarative knowledge: facts.
Procedural knowledge: how to do things.
Semantic knowledge: use and meanings of words.
Conceptual knowledge: abstract knowledge of concepts and relationships between concepts.
Episodic knowledge: detailed knowledge of particular occurrences or experiences.
Meta-knowledge: knowledge about knowledge, e.g. how experts actually organise and use
their knowledge.
(Of course, not all of these need be involved in every knowledge-based system.)
The main difficulties about knowledge are:
Its overall unstructured nature
Its breadth and complexity
Even where knowledge is clearly structured (as in an encyclopaedia, for example), there may
still be a practical difficulty in identifying and finding relevant knowledge.
Levels of Knowledge
AI workers (and psychologists) have recognised that there are different levels of knowledge:
shallow knowledge
deep knowledge
Shallow knowledge means surface-level information about appearances and behaviour in very
specific situations. For example: if the petrol tank is empty then the car will not start.
Typically such knowledge could be in the form of IF ... THEN rules.
We might have a lot of shallow knowledge (say about cars) and still have little understanding.
Dealing with complex or unfamiliar situations, or giving explanations, may not be easy just on
the basis of shallow knowledge.
Deep knowledge means knowledge of the internal and causal structure of a context or situation.
For example, knowledge of how a car engine works and of what happens inside it.
Such knowledge is much harder to represent in a computer. It may involve concepts,
relationships, abstractions and analogies.
Sources of Knowledge
Sources of knowledge are extremely varied: books, databases, people. An ES can be built with
appropriate means to search databases. It is people that are the problem!
The knowledge acquisition problem is to elicit and formalise human knowledge and expertise.
Human knowledge is not well-structured. Worse, experts may use their knowledge
unconsciously. Also, different experts not only may disagree, but may have wholly different
approaches and methods by which they apply their knowledge.
There is a range of methods: manual, semi-automatic, automatic.
Knowledge Acquisition
Manual Knowledge Acquisition
Two kinds of approach:
1. via interviews with experts
2. via observation of experts in action
The knowledge engineer elicits the knowledge from the expert and fits it into some chosen
knowledge representation scheme (which the expert will generally not know about).
1. Interviewing is a skill, and much effort has been put into developing interviewing techniques
(not just for knowledge acquisition).
Interviews may be structured: the interviewer may work to a standardized scheme of questioning.
This may be appropriate if the knowledge representation scheme has been previously worked
out.
Or the interviewing may consist of having the expert talk through his approach to certain
particular problems, with prompting from the interviewer.
2. Observation just means noting circumstances which arise and actions taken. The observer
may intrude by asking the expert to give his reasons for particular steps, or to think aloud while
he is working. Tracking is the jargon word for following the expert's train of thought.
The difficulty is that the expert is not generally a knowledge engineer and the knowledge
engineer is not generally an expert. There is a gap to be bridged, since neither will know what is
of significance to the other.
The solution is likely to be to allow the expert to become a knowledge engineer, possibly by
giving him/her computer support.
Knowledge engineering is itself an expert task. It is clearly possible to envisage an expert system
which may assist with it.
Meanwhile, let us consider ways in which machine assistance may be brought into the
knowledge engineering process.
Semi-Automatic Knowledge Acquisition
Our rule-based shell may be regarded as providing computer support for knowledge acquisition
via its Build facility. It does not really elicit knowledge, though.
We look at one technique which can be automated for eliciting knowledge: Repertory Grid
Analysis.
Repertory Grid Analysis
Example: Consider the problem of selecting an appropriate programming language for a
particular programming task. The first stage of RGA involves the following steps:
1. The expert identifies important objects (e.g. Java, LISP, Cobol, Prolog, Perl, Fortran, C).
2. The expert identifies important attributes of these (e.g. availability, ease of use, training time,
orientation).
3. The expert identifies for each attribute a criterion or measure (e.g. for availability
High/Medium/Low, for orientation Symbolic/General/Numeric).
Once these have been established, the expert is prompted by the following indirect means to
impart his expertise:
1. The interviewer (or automatic system) repeatedly asks questions about which attributes
distinguish some objects from others, perhaps by giving three objects, and asking for an attribute
which can distinguish two of them from the third (e.g. for LISP, Prolog and Cobol, two are
Symbolic and one is not).
2. The interviewer builds up a table (grid) containing numerical ratings for the attributes for each
object.
3. The expert may then examine the results, and adjust the table if it appears not to be a correct
representation of the knowledge.
This is a simplified description of the process. Computer systems exist which use this approach
to elicit knowledge in a quite sophisticated way (see Turban).
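As a rough sketch of the grid idea (real RGA tools are considerably more sophisticated), suppose the expert's ratings are held in a table; a program can then hunt for attributes that separate two objects of a triad from the third. The ratings and the separation threshold below are invented for illustration:

```python
# Invented ratings on a 1-5 scale (for orientation, 5 = symbolic, 1 = numeric).
grid = {
    "LISP":   {"availability": 2, "ease_of_use": 3, "orientation": 5},
    "Prolog": {"availability": 2, "ease_of_use": 3, "orientation": 5},
    "Cobol":  {"availability": 4, "ease_of_use": 2, "orientation": 1},
}

def distinguishing_attributes(grid, triad, gap=2):
    """Return (attribute, odd_one_out) pairs where two objects of the triad
    rate close together and the third is at least `gap` points away."""
    a, b, c = triad
    found = []
    for attr in grid[a]:
        ra, rb, rc = grid[a][attr], grid[b][attr], grid[c][attr]
        if abs(ra - rb) < gap <= min(abs(ra - rc), abs(rb - rc)):
            found.append((attr, c))            # c is the odd one out
        elif abs(ra - rc) < gap <= min(abs(ra - rb), abs(rc - rb)):
            found.append((attr, b))
        elif abs(rb - rc) < gap <= min(abs(rb - ra), abs(rc - ra)):
            found.append((attr, a))
    return found
```

For the triad (LISP, Prolog, Cobol) this picks out orientation and availability as attributes separating Cobol from the other two, mirroring the "two are Symbolic and one is not" question above.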
Automatic Knowledge Acquisition
Broadly, this means using a computer program to convert data into knowledge. This process may
also be described as learning.
We may imagine other situations like the choice of programming language, where a given
situation has certain characteristics which will determine a correct decision or action.
The idea is to create general rules from a set of example cases where the correct outcome is
known. These cases may be
Real existing data, or
The record of the program's own experience, or
generated by an expert to represent his/her knowledge.
There are automated systems which do this. A well-known one is ID3 (Turban p146). Given a set
of cases, it orders the various attributes as to relevance to the outcome, and then builds a decision
tree. This tree may then be used to reach a conclusion when we are given a new case, with new
attribute values.
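The core of ID3 - ranking attributes by how much they reduce uncertainty about the outcome - can be sketched as follows. The loan-style cases are invented for illustration; a full ID3 would recurse on each subset to grow the whole decision tree:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(cases, attribute):
    """Entropy reduction obtained by splitting the cases on `attribute`."""
    base = entropy([c["outcome"] for c in cases])
    remainder = 0.0
    for value in {c[attribute] for c in cases}:
        subset = [c["outcome"] for c in cases if c[attribute] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return base - remainder

def best_attribute(cases, attributes):
    """ID3's core step: choose the most informative attribute to split on."""
    return max(attributes, key=lambda a: information_gain(cases, a))

# Invented loan-style cases:
cases = [
    {"income": "high", "history": "good", "outcome": "approve"},
    {"income": "high", "history": "bad",  "outcome": "approve"},
    {"income": "low",  "history": "good", "outcome": "reject"},
    {"income": "low",  "history": "bad",  "outcome": "reject"},
]
```

In this toy data income predicts the outcome perfectly (gain of 1 bit) while history carries no information (gain 0), so ID3 would place income at the root of the decision tree.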
Lecture 5 and 6
Knowledge Representation
Introduction
Knowledge: true rational belief (philosophy); OR facts, data and relationships
(computational view).
Representation: structure + operations; OR map + operations; OR game layout and rules
of play; OR abstract data types.
Knowledge representation: a framework for storing and manipulating knowledge;
OR a set of syntactic and semantic conventions that makes it possible to
describe things (Bench-Capon, 1990).
The object of KR is to express knowledge in a computer-tractable form, so that it can be
used to help agents perform well.
A KR language is defined by two aspects:
Syntax: describes how to make sentences, i.e. the possible
configurations that can constitute sentences.
Semantics: determines the facts in the world to which the sentences refer,
i.e. what the sentences mean.
Inference:
The terms inference and reasoning are generally used to cover any process by which
conclusions are reached. Logical inference (deduction) is one such process.
The knowledge representation schemes considered below are:
Frames
Semantic Nets
Rules
Logic:
Propositional logic (Boolean Logic)
Predicate logic (First-Order Logic)
1. Natural Language
Expressiveness of natural language:
Very expressive: probably everything that can be expressed symbolically can be
expressed in natural language (though pictures, the content of art, and emotions are often hard to express).
Probably the most expressive knowledge representation formalism we have. However, reasoning
over natural language is very complex and hard to model.
Problems with natural language:
Natural language is often ambiguous.
Syntax and semantics are not fully understood.
There is little uniformity in the structure of sentences.
2. Semantic Networks
Originally developed in the early 1960s to represent the meaning of English words. The term
dates back to Ross Quillian's Ph.D. thesis (1968), in which he first introduced it as a way of
talking about the organization of human semantic memory, or memory for word concepts.
A semantic net is a graph, where the nodes in the graph represent concepts, and the arcs
represent binary relationships between concepts.
Types of relations:
subclass, the link is named is_a
member, the link is named is_instance_of
Other relations used depend on the application. (e.g. has_parts, likes, etc)
Property inheritance is the basic inference mechanism for semantic networks.
Example
This network represents the facts that mammals and reptiles are animals, that mammals have
heads, that an elephant is a mammal, and that Clyde is a particular elephant.
Inferring facts not explicitly represented: Clyde has a head.
Representational adequacy - problems with representing quantifiers (such as "every dog in
town has bitten the constable")
Advantages. Easy to translate to predicate calculus.
Disadvantages. Cannot handle quantifiers; nodes may have confusing roles or meanings;
searching may lead to combinatorial explosion; cannot express standard logical connectives;
can represent only binary or unary predicates.
Summary:
Use inheritance via the is_a and is_instance_of relations to infer implicit facts
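This inheritance mechanism can be sketched directly from the example above, representing the net as a set of (subject, relation, object) triples. The has_part fact is attached to Mammal, matching the statement that mammals have heads:

```python
# The example net as (subject, relation, object) triples:
net = {
    ("Mammal",   "is_a",           "Animal"),
    ("Reptile",  "is_a",           "Animal"),
    ("Elephant", "is_a",           "Mammal"),
    ("Clyde",    "is_instance_of", "Elephant"),
    ("Mammal",   "has_part",       "Head"),
}

def holds(net, subject, relation, obj):
    """True if the relation is stored explicitly, or can be inherited
    by climbing the is_a / is_instance_of links to a parent concept."""
    if (subject, relation, obj) in net:
        return True
    for s, r, parent in net:
        if s == subject and r in ("is_a", "is_instance_of"):
            if holds(net, parent, relation, obj):
                return True
    return False
```

holds(net, "Clyde", "has_part", "Head") is True even though that fact is never stored: it is inferred by inheritance via Elephant and Mammal, exactly the "Clyde has a head" inference described above.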
3. Frames
Frames capture knowledge about typical objects or events, such as a typical bird, or a typical
restaurant meal. All the information relevant to a particular concept is stored in
a single complex entity, called a frame.
Frames support inheritance.
Example 1
Mammal
  subclass: Animal
  warm_blooded: yes
Elephant
  subclass: Mammal
  * colour: grey
  * size: large
Clyde
  instance: Elephant
  colour: pink
  owner: Fred
Nellie
  instance: Elephant
  size: small
(The starred slots hold default values, which an instance may override.)
Clyde is an elephant. What would be the value of the slot has_part?
A frame may have several parent classes (e.g., Clyde is both an elephant and a circus animal).
Which parent should it inherit from first?
Slots and Procedures
Frame representation can use a procedure to compute the value of a given slot if needed, e.g.
the area of a square, given the size
Advantages: can cope with missing values - close matches can be presented.
Disadvantages: have been hard to implement, especially inheritance. Representational adequacy:
certain things are difficult to represent, e.g. negation, disjunction, quantification.
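The frames of Example 1 and slot lookup with inheritance can be sketched as follows; a locally stored value (Clyde's pink colour) overrides the inherited default (grey):

```python
# The frames of Example 1 as dictionaries; "subclass"/"instance" name the parent.
frames = {
    "Mammal":   {"subclass": "Animal", "warm_blooded": "yes"},
    "Elephant": {"subclass": "Mammal", "colour": "grey", "size": "large"},
    "Clyde":    {"instance": "Elephant", "colour": "pink", "owner": "Fred"},
    "Nellie":   {"instance": "Elephant", "size": "small"},
}

def get_slot(frames, name, slot):
    """Look the slot up in the frame itself; if absent, climb to the parent.
    A local value (Clyde's pink) overrides an inherited default (grey)."""
    while name is not None:
        frame = frames.get(name, {})
        if slot in frame:
            return frame[slot]
        name = frame.get("instance") or frame.get("subclass")
    return None
```

get_slot(frames, "Nellie", "warm_blooded") climbs Nellie → Elephant → Mammal and returns "yes", while get_slot(frames, "Clyde", "colour") returns the local "pink" rather than the inherited "grey".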
4. Rules
These are formalisms often used to specify recommendations, give directives or state strategy.
Related ideas: rules and a fact base; the conflict set - the set of rules whose conditions are
currently satisfied; conflict resolution - deciding which of those rules to apply.
One of the most popular approaches to knowledge representation is to use production rules,
sometimes called IF-THEN rules. They can take various forms, e.g.
IF condition THEN action
IF premise THEN conclusion
IF proposition p1 and proposition p2 are true THEN proposition p3 is true
Production Rules
Example: IF income is `standard' and payment history is `good', THEN `approve home
loan'.
The premise is a Boolean expression that should evaluate to be true for the rule to be
applied.
The action part of the rule is separated from the premise by the keyword THEN.
A rule based system will contain global rules and facts about the knowledge domain covered.
During a particular run of the system a database of local knowledge may also be established,
relating to the particular case in hand. One of the most widely used tutorial examples of rule
based systems is Mycin, an expert system which was designed to assist doctors with the
diagnosis and treatment of bacterial infection. It uses the rule based approach and also
demonstrates the way in which uncertainty (both in observations and in the reasoning process)
may be handled.
Mycin was designed to help the doctor to decide whether a patient has a bacterial infection,
which organism is responsible, which drug may be appropriate for this infection, and which may
be used on the specific patient.
The global knowledge base contains facts and rules relating for example symptoms to infections,
and the local database will contain particular observations about the patient being examined. A
typical rule in Mycin is as follows:
IF the identity of the germ is not known with certainty
AND the germ is gram-positive
AND the morphology of the organism is "rod"
AND the germ is aerobic
THEN there is a strong probability (0.8) that the germ is of type enterobacteriacae
Note that a probability or certainty factor (C.F.) is given, reflecting the strength of the original
expert's confidence in the inference made in this rule. In other words, the confidence in the
conclusion assuming the premises are true. The premises are, in fact, established from
observations either in the laboratory or from the patient, and may themselves have an element of
uncertainty associated with them. In the above example it may only be known that the germ is
aerobic with a probability of 0.5.
The certainty factor associated with a conclusion in MYCIN is calculated from the certainty
factor of the premises, the certainty factor of the rule and any existing certainty factors for the
conclusion if it has been obtained already from some other rules.
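As a sketch, the usual MYCIN-style calculation described above can be written as follows. The real MYCIN combination rules are more elaborate (they also handle negative evidence), so treat this as a simplified illustration of the positive case:

```python
def rule_cf(cf_rule, premise_cfs):
    """CF of a conclusion drawn by one rule: the rule's own CF scaled by
    its weakest premise (the minimum of the premise CFs)."""
    return cf_rule * min(premise_cfs)

def combine_cf(cf1, cf2):
    """Combine two positive CFs for the same conclusion obtained from
    different rules: cf1 + cf2 * (1 - cf1)."""
    return cf1 + cf2 * (1 - cf1)
```

For the rule above (CF 0.8), if the first three premises are certain but "the germ is aerobic" is known only with CF 0.5, the conclusion gets 0.8 × min(1, 1, 1, 0.5) = 0.4; a second rule supporting the same conclusion with CF 0.4 would raise it to 0.4 + 0.4 × 0.6 = 0.64.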
The way in which the knowledge base is used is determined by the inference engine. It is a basic
principle of production systems that each rule should be an independent item of knowledge and
essentially ignorant of other rules. The inference engine could then simply "fire" rules at any
time when its premises are satisfied.
If several rules could all fire at once the inference engine must have a mechanism for "conflict
resolution". This may be achieved, for example, by having some predefined order, perhaps on the
basis of the strength of the conclusion, or alternatively on the basis of frequency of rule usage.
Forward and Backward chaining through the rules may be used. The two systems each have their
advantages and disadvantages and in fact answer different types of question. For example, in
Mycin a forward chaining system might answer the question "what do these symptoms suggest?"
whereas a backward chaining system might answer the question "does this patient suffer from a
pelvic abscess?" In general, rules and goals may need to be constructed differently for forward
and backward chaining systems.
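A minimal forward-chaining engine can be sketched as below. The rules shown are invented stand-ins (the drug name is hypothetical), and no conflict resolution is needed here because the firing order does not affect the final fact set:

```python
def forward_chain(rules, facts):
    """Fire every rule whose premises are all established facts, add its
    conclusion, and repeat until no rule can add anything new.
    `rules` is a list of (premises, conclusion) pairs."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)      # the rule "fires"
                changed = True
    return facts

# Invented, Mycin-flavoured rules (the drug name is hypothetical):
rules = [
    (("gram_positive", "rod", "aerobic"), "enterobacteriaceae"),
    (("enterobacteriaceae",), "consider_drug_X"),
]
```

Starting from the observations {gram_positive, rod, aerobic}, the first rule fires, its conclusion enables the second, and the returned fact set contains both conclusions - the "what do these symptoms suggest?" direction. A backward chainer would instead start from a goal and work back through rule premises.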
5. Propositions
A proposition is a statement that is either true or false.
For example, here are some propositions:
The file is being printed.
The system is ready.
The red light is on.
It is conventional to represent propositions by lower case letters.
For example:
p: The file is being printed.
q: The system is ready.
r: The red light is on.
Then, using various symbols that we will define shortly, such statements may be rewritten
symbolically. It is much easier to see the "structure" of a symbolic statement than the verbal one.
For example, "Either the red light is on and the file is printed, or else the system is not ready" becomes:
(r ∧ p) ∨ ¬q
Example 1
A proposition such as (P ∨ Q) contains variables P and Q, each of which represents an arbitrary
proposition. Thus a proposition takes different values depending on the values of the constituent
variables. This relationship between the value of a proposition and those of its constituent
variables can be represented by a table. It tabulates the value of the proposition for all possible
values of its variables and is called a truth table.
For example the following table shows the relationship between the values of P, Q and P ∨ Q:

OR
P | Q | P ∨ Q
F | F | F
F | T | T
T | F | T
T | T | T

In the table, F represents the truth value false and T true.
This table shows that P ∨ Q is false if P and Q are both false, and it is true in all the other cases.
NOT
P | ¬P
F | T
T | F

AND
P | Q | P ∧ Q
F | F | F
F | T | F
T | F | F
T | T | T

This table shows that (P ∧ Q) is true if both P and Q are true, and that it is false in any other case.

IMPLIES
P | Q | P → Q
F | F | T
F | T | T
T | F | F
T | T | T

When P is false, (P → Q) is true regardless of Q. When P is true, (P → Q) takes the same value as Q.
Some propositions always take the same value regardless of the values of the variables in them. See Identities for
examples of such propositions.
Assignment:
Construct the truth table for p ∨ (q ∧ r).
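The construction of truth tables can be mechanized. The sketch below enumerates every combination of truth values for the variables; it is applied to p ∨ (q ∧ r), assuming those are the assignment's connectives:

```python
from itertools import product

def truth_table(variables, expression):
    """Print (and return) the truth table of `expression`, a function
    taking one boolean argument per variable name."""
    print(" ".join(variables + ["result"]))
    rows = []
    for values in product([False, True], repeat=len(variables)):
        row = values + (expression(*values),)   # inputs plus the result
        rows.append(row)
        print(" ".join("T" if v else "F" for v in row))
    return rows

# The assignment's proposition, assuming the connectives are OR and AND:
rows = truth_table(["p", "q", "r"], lambda p, q, r: p or (q and r))
```

A proposition is a tautology exactly when every row ends in T, and a contradiction when every row ends in F; for instance, truth_table(["p"], lambda p: p or not p) produces T in both rows.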
Tautologies and contradictions
A proposition that is true for every combination of truth values is called a tautology.
A proposition that is false for every possible combination of truth values is called a
contradiction.
Logical equivalence
We often have different, but equivalent, logical expressions; that is, expressions that look
different but have the same meaning.
For our purposes the most important laws are the distributive laws and de Morgan's
laws.
Distributive Laws
p ∧ (q ∨ r) = (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) = (p ∨ q) ∧ (p ∨ r)
De Morgan's Laws
¬(p ∧ q) = ¬p ∨ ¬q
¬(p ∨ q) = ¬p ∧ ¬q
Conditional statements
In computing we often use conditional statements of the form "If ... then ..."; in other
words, if particular conditions are satisfied then certain consequences should follow.
There are many ways of expressing this type of statement in English.
A proposition of the form "If p then q" is called a conditional statement, and is represented
by p → q.
The symbolic statement is usually read as "if p then q" or
perhaps as "p implies q".
Construct truth tables for the following proposition:
((p → q) ∧ r) → (p ∧ r)
The contrapositive of a conditional
The contrapositive of the conditional p → q is ¬q → ¬p; it is just another way of saying the same
thing as the conditional.
When one is true, then so is the other. If one is false, so is the other.
6. Predicates
For example:
"7 > 20" is a proposition, whereas
"x > 20" is a predicate
"Peter owns 3 cats" is a proposition, whereas
"x owns y cats" is a predicate
A set is a collection of things, usually (but not necessarily) sharing some common
attribute.
o E.g. let P = the set of all people in this room
o E.g. let A = the set of all letters in the alphabet
o E.g. R denotes the set of all real numbers (numbers with a decimal point)
A predicate has one or more variables, and if we substitute values for the variables the
predicate becomes a proposition and has a truth value.
Instead of substituting particular values for the variables, we may be able to make a more
general statement by using a quantifier: the universal quantifier ∀ or the existential quantifier ∃.
The symbol ∀ means "for all" or "for every".
Example: ∀x ∈ R, Q(x) means "for all real numbers x, Q(x)".
Example: if we define the predicate C(x) to mean "x likes chocolate", then ∀x ∈ P, C(x) means
"everybody in this room likes chocolate".
The existential quantifier ∃ is used when we are making statements of the type "some do" or
"some don't". The symbol ∃ is read as "there exists", "there is at least one", or "for some".
Example:
Let P be a set of people and M a set of movies, and define the predicate S(x, y) to
mean "person x has seen movie y".
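Over finite sets the quantifiers correspond directly to Python's all() and any(), which makes the movie example easy to check mechanically. The particular people, movies, and sightings below are invented to populate the example:

```python
# Invented sets matching the movie example:
people = {"Ann", "Bob", "Cal"}                     # the set P
movies = {"Up", "Jaws"}                            # the set M
seen = {("Ann", "Up"), ("Bob", "Up"), ("Bob", "Jaws"), ("Cal", "Up")}

def S(x, y):
    """S(x, y): person x has seen movie y."""
    return (x, y) in seen

# ∀x ∈ P, S(x, "Up"): everybody has seen "Up"
everyone_saw_up = all(S(x, "Up") for x in people)

# ∃x ∈ P, S(x, "Jaws"): somebody has seen "Jaws"
someone_saw_jaws = any(S(x, "Jaws") for x in people)

# ∀x ∈ P, ∃y ∈ M, S(x, y): everybody has seen at least one movie
all_saw_something = all(any(S(x, y) for y in movies) for x in people)

# ∀x ∈ P, S(x, "Jaws") fails here: Ann and Cal have not seen "Jaws"
everyone_saw_jaws = all(S(x, "Jaws") for x in people)
```

With this data the first three statements come out true and the last false, illustrating how substituting values into a quantified predicate yields a proposition with a definite truth value.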
Lecture 7
Knowledge-based Systems
A knowledge-based system is a computer system that is programmed to imitate human
problem-solving by means of artificial intelligence and reference to a database of
knowledge on a particular subject.
Knowledge-based systems are systems based on the methods and techniques of Artificial
Intelligence. Their core components are the knowledge base and the inference
mechanisms.
KBS is a frequently used abbreviation for knowledge-based system.
Remarks:
1. KBS is often used as a synonym for an expert system (ES) although the two are
not the same in a strict sense. Strictly speaking, a KBS is any system that uses
knowledge in performing its tasks.
2. KBS is a branch of artificial intelligence.
3. Keywords in the definition: "knowledge", "represents", "reasons", "specialist".
4. KBS uses the heuristic method in problem solving.
Characteristics of KBS
1. A KBS differs from conventional programs:
It simulates human reasoning about a domain, rather than simulating the domain itself.
It performs reasoning over representations of human knowledge, in addition to
performing calculations.
Applications of KBS
Some areas where KBS has been very successful:
1. Medical diagnosis: MYCIN (for bacterial blood infections)
2. Molecular structure analysis: DENDRAL
3. Computer configuration: XCON (R1)
4. Machine fault diagnosis
5. Fraud detection
6. Loan evaluation
7. ... ...
Too many to enumerate.
Major Components of a KBS
A KBS usually consists of four major components:
User interface
converts user queries into an internal representation to be processed by the system, and
converts system's solutions and explanations into a language which the user can
understand.
Knowledge Base
contains expert knowledge about a narrow domain of application.
Inference Engine
manipulates the knowledge base, i.e., deduces new knowledge from the knowledge base,
to give answers to the user's queries.
Explanation generator
(Sometimes it is also considered as part of the Inference engine.) provides explanations to
the user about how the system arrives at a conclusion so that the user can be convinced.
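The interplay of the four components can be sketched as follows. Everything in this sketch (the rule format, the sample facts, the function names) is a hypothetical illustration under stated assumptions, not a real KBS implementation:

```python
# Hypothetical sketch of the four KBS components wired together.
# The knowledge base holds (premises, conclusion) rules; the inference
# engine forward-chains over them; the user interface translates a query
# into internal facts; the explanation generator reports fired rules.

KNOWLEDGE_BASE = [
    ({"fever", "chills"}, "suspect_malaria"),
    ({"suspect_malaria"}, "order_blood_smear"),
]

def inference_engine(facts):
    """Forward-chain over the knowledge base; return facts and fired rules."""
    facts = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in KNOWLEDGE_BASE:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                fired.append((premises, conclusion))
                changed = True
    return facts, fired

def user_interface(query_text):
    """Convert a user's comma-separated symptom list into internal facts."""
    return {w.strip() for w in query_text.split(",")}

def explanation_generator(fired):
    """Explain how the system arrived at its conclusions."""
    return [f"concluded {c} because {sorted(p)} held" for p, c in fired]

facts, fired = inference_engine(user_interface("fever, chills"))
print("order_blood_smear" in facts)
for line in explanation_generator(fired):
    print(line)
```

The point of the sketch is the separation of concerns: the knowledge base can change (a new domain) without touching the inference engine, which is exactly the property that shells exploit.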
A KBS is worth considering when the efficiency and consistency of the expert need to be enhanced.
Economic Considerations
The following criteria should be met before embarking on a KBS project for solving a
highly constrained class of problems:
The expert's solution to the problem is satisfactory (but may suffer from
procedural difficulties such as timeliness).
Decisions made by nonexperts are likely to be different from those of the expert
and to have a significant impact on the organization in terms of
financial cost,
resource consumption,
risk.
The people involved in building a KBS include:
Domain Experts
Knowledge Engineers/Crafters
Apprentice Knowledge Engineers/Crafters
Knowledge Engineering and Knowledge Engineers
Knowledge engineers are those who study the problem domain, acquire knowledge
from the expert and represent the knowledge in a structured form in the knowledge
base.
There are different views on how mature the KBS technology and design
process are. Some people regard the technology and the process as not mature
enough to be engineered; hence the terms crafting and crafter.
Lecture 8
Rule-Based Systems and Shells
It was noted early on in the history of ES that certain parts of an ES could be re-used for other ES
which dealt with different domains.
So attention has been given to developing frameworks or shells which provide as much as
possible of an ES and into which the context-dependent parts can be fitted. (The idea of a
general problem solver was perhaps not so unrealistic after all.)
A shell may provide:
Knowledge acquisition subsystem.
Inference engine.
User interface.
Explanation subsystem.
Shells
A shell does not provide a knowledge base (though it will provide the structure for a knowledge
base).
In order to build an ES using a shell, it is necessary only to construct and install a knowledge base.
As we shall see, different expert systems may be designed on fundamentally different principles,
containing knowledge bases with completely different structures.
The most significant categories are rule-based, case-based and model-based systems.
We shall consider first rule-based systems ...
Knowledge as Rules
By an IF ... THEN ... rule we mean something like:
IF ID Checked
AND Satisfactory Employment
AND Salary Adequate
THEN Credit Granted
It will be convenient to think of (and write) such rules with the conclusion first:
Credit Granted
IF ID Checked
AND Satisfactory Employment
AND Salary Adequate
The part of a rule after the IF is called the body of the rule. It contains what will be subgoals.
As we shall see, there are several different kinds of things which can appear in the body of a rule.
Rules may contain AND, as above. They may also contain OR.
For example
ID Checked
IF Credit Card Shown
OR Driving Licence Shown
OR Passport Shown
As these two examples show, the rules will form a tree structure. Trying to demonstrate the
'truth' of the conclusion of a rule will lead to requirements to demonstrate the truth of the premises
of the rule, which in turn will lead to further subgoals, and so on.
Each rule is referred to by its conclusion, so the two rules above are called Credit Granted and
ID Checked.
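The two credit rules above can be sketched as a goal tree in Python, with an AND body written as one list of subgoals and OR alternatives written as several bodies for the same conclusion. This is a minimal sketch, not a full rule engine:

```python
# Each rule is keyed by its conclusion. An AND body is one list of
# subgoals; an OR body is several alternative lists.
RULES = {
    "Credit Granted": [["ID Checked", "Satisfactory Employment", "Salary Adequate"]],
    "ID Checked": [["Credit Card Shown"],
                   ["Driving Licence Shown"],
                   ["Passport Shown"]],
}

def prove(goal, facts):
    """A goal holds if it is a known fact, or if all subgoals of some
    alternative body of its rule hold (recursively)."""
    if goal in facts:
        return True
    return any(all(prove(g, facts) for g in body)
               for body in RULES.get(goal, []))

facts = {"Passport Shown", "Satisfactory Employment", "Salary Adequate"}
print(prove("Credit Granted", facts))   # True: ID Checked via Passport Shown
```

Referring to each rule by its conclusion, as the text suggests, is exactly what the dictionary keys do here.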
Backward Chaining Systems
So far we have looked at how rule-based systems can be used to draw new conclusions from
existing data, adding these conclusions to a working memory. This approach is most useful when
you know all the initial facts, but don't have much idea what the conclusion might be.
If you DO know what the conclusion might be, or have some specific hypothesis to test, forward
chaining systems may be inefficient. You COULD keep on forward chaining until no more rules
apply or you have added your hypothesis to the working memory. But in the process the system
is likely to do a lot of irrelevant work, adding uninteresting conclusions to working memory. For
example, suppose we are interested in whether Alison is in a bad mood. We could repeatedly fire
rules, updating the working memory, checking each time whether (bad-mood alison) is in the
new working memory. But maybe we had a whole batch of rules for drawing conclusions about
what happens when I'm lecturing, or what happens in February - we really don't care about this,
so would rather only have to draw the conclusions that are relevant to the goal.
This can be done by backward chaining from the goal state (or on some hypothesised state that
we are interested in). This is essentially what Prolog does, so it should be fairly familiar to you
by now. Given a goal state to try and prove (e.g., (bad-mood alison)) the system will first check
to see if the goal matches the initial facts given. If it does, then that goal succeeds. If it doesn't
the system will look for rules whose conclusions (previously referred to as actions) match the
goal. One such rule will be chosen, and the system will then try to prove any facts in the
preconditions of the rule using the same procedure, setting these as new goals to prove. Note that
a backward chaining system does NOT need to update a working memory. Instead it needs to
keep track of what goals it needs to prove to prove its main hypothesis.
In principle we can use the same set of rules for both forward and backward chaining. However,
in practice we may choose to write the rules slightly differently if we are going to be using them
for backward chaining. In backward chaining we are concerned with matching the conclusion of
a rule against some goal that we are trying to prove. So the 'then' part of the rule is usually not
expressed as an action to take (e.g., add/delete), but as a state which will be true if the premises
are true.
So, suppose we have the following rules:
1. IF (lecturing X)
AND (marking-practicals X)
THEN (overworked X)
2. IF (month february)
THEN (lecturing alison)
3. IF (month february)
THEN (marking-practicals alison)
4. IF (overworked X)
THEN (bad-mood X)
5. IF (slept-badly X)
THEN (bad-mood X)
6. IF (month february)
THEN (weather cold)
7. IF (year 1993)
THEN (economy bad)
and initial facts:
(month february)
(year 1993)
and we're trying to prove:
(bad-mood alison)
First we check whether the goal state is in the initial facts. As it isn't there, we try matching it
against the conclusions of the rules. It matches rules 4 and 5. Let us assume that rule 4 is chosen
first - it will try to prove (overworked alison). Rule 1 can be used, and the system will try to
prove (lecturing alison) and (marking-practicals alison). Trying to prove the first goal, it will
match rule 2 and try to prove (month february). This is in the set of initial facts. We still have to
prove (marking-practicals alison). Rule 3 can be used, and we have proved the original goal
(bad-mood alison).
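The backward-chaining procedure just traced can be sketched in Python. To keep the sketch minimal, Prolog-style unification is side-stepped by writing rules 1-7 as ground instances for the single individual "alison" (an assumption of this sketch, not of the rules themselves):

```python
# Initial facts and ground instances of rules 1-7 from the text.
FACTS = {("month", "february"), ("year", "1993")}

RULES = [
    ([("lecturing", "alison"), ("marking-practicals", "alison")], ("overworked", "alison")),  # 1
    ([("month", "february")], ("lecturing", "alison")),                                       # 2
    ([("month", "february")], ("marking-practicals", "alison")),                              # 3
    ([("overworked", "alison")], ("bad-mood", "alison")),                                     # 4
    ([("slept-badly", "alison")], ("bad-mood", "alison")),                                    # 5
    ([("month", "february")], ("weather", "cold")),                                           # 6
    ([("year", "1993")], ("economy", "bad")),                                                 # 7
]

def prove(goal):
    """Check the initial facts first, then try every rule whose conclusion
    matches the goal, recursively proving its preconditions as subgoals."""
    if goal in FACTS:
        return True
    return any(all(prove(p) for p in premises)
               for premises, conclusion in RULES if conclusion == goal)

print(prove(("bad-mood", "alison")))   # True, via rules 4, 1, 2 and 3
print(prove(("bad-mood", "bob")))      # False: no rule concludes it
```

Notice that rules 6 and 7 are never touched: backward chaining only explores rules relevant to the goal, which is precisely the efficiency argument made above.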
One way of implementing this basic mechanism is to use a stack of goals still to satisfy. You
should repeatedly pop a goal off the stack and try to prove it. If it's in the set of initial facts then
it's proved. If it matches a rule which has a set of preconditions then the goals in the precondition
are pushed onto the stack. Of course, this doesn't tell us what to do when there are several rules
which may be used to prove a goal. If we were using Prolog to implement this kind of algorithm
we might rely on its backtracking mechanism - it'll try one rule, and if that results in failure it
will go back and try the other. However, if we use a programming language without a built in
search procedure we need to decide explicitly what to do. One good approach is to use an
agenda, where each item on the agenda represents one alternative path in the search for a
solution. The system should try `expanding' each item on the agenda, systematically trying all
possibilities until it finds a solution (or fails to). The particular method used for selecting items
off the agenda determines the search strategy - in other words, determines how you decide on
which options to try, in what order, when solving your problem. We'll go into this in much more
detail in the section on search.
The interpreter controls the application of the rules, given the working memory, thus controlling
the system's activity. It is based on a cycle of activity sometimes known as a recognise-act cycle.
The system first checks to find all the rules whose conditions hold, given the current state of
working memory. It then selects one and performs the actions in the action part of the rule. (The
selection of a rule to fire is based on fixed strategies, known as conflict resolution strategies.) The
actions will result in a new working memory, and the cycle begins again. This cycle will be
repeated until either no rules fire, or some specified goal state is satisfied.
Rule-based systems vary greatly in their details and syntax, so the following examples are only
illustrative.
First we'll look at a very simple set of rules:
1. IF (lecturing X)
AND (marking-practicals X)
THEN ADD (overworked X)
2. IF (month february)
THEN ADD (lecturing alison)
3. IF (month february)
THEN ADD (marking-practicals alison)
4. IF (overworked X)
OR (slept-badly X)
THEN ADD (bad-mood X)
5. IF (bad-mood X)
THEN DELETE (happy X)
6. IF (lecturing X)
THEN DELETE (researching X)
Here we use capital letters to indicate variables. In other representations variables may be
indicated in different ways, such as by a ? or a ^ (e.g., ?person, ^person).
Let us assume that initially we have a working memory with the following elements:
(month february)
(happy alison)
(researching alison)
Our system will first go through all the rules checking which ones apply given the current
working memory. Rules 2 and 3 both apply, so the system has to choose between them, using its
conflict resolution strategies. Let us say that rule 2 is chosen. So, (lecturing alison) is added to
the working memory, which is now:
(lecturing alison)
(month february)
(happy alison)
(researching alison)
Now the cycle begins again. This time rule 3 and rule 6 have their preconditions satisfied. Let's
say rule 3 is chosen and fires, so (marking-practicals alison) is added to the working memory. On
the third cycle rule 1 fires, so, with X bound to alison, (overworked alison) is added to working
memory which is now:
(overworked alison)
(marking-practicals alison)
(lecturing alison)
(month february)
(happy alison)
(researching alison)
Now rules 4 and 6 can apply. Suppose rule 4 fires, and (bad-mood alison) is added to the working
memory. And in the next cycle rule 5 is chosen and fires, with (happy alison) removed from the
working memory. Finally, rule 6 will fire, and (researching alison) will be removed from working
memory, to leave:
(bad-mood alison)
(overworked alison)
(marking-practicals alison)
(lecturing alison)
(month february)
(This example is not meant to reflect my attitude to lecturing!)
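The recognise-act cycle on these six rules can be sketched in Python. Two simplifications are assumed in this sketch: the OR in rule 4 is collapsed to its (overworked X) branch, and conflict resolution is simply "first applicable rule wins, never fire twice on the same data":

```python
# Ground instances of the six ADD/DELETE rules, as (number, preconditions,
# action, item). Rule 4's OR is collapsed to its overworked branch.
RULES = [
    (1, [("lecturing", "alison"), ("marking-practicals", "alison")], "ADD", ("overworked", "alison")),
    (2, [("month", "february")], "ADD", ("lecturing", "alison")),
    (3, [("month", "february")], "ADD", ("marking-practicals", "alison")),
    (4, [("overworked", "alison")], "ADD", ("bad-mood", "alison")),
    (5, [("bad-mood", "alison")], "DELETE", ("happy", "alison")),
    (6, [("lecturing", "alison")], "DELETE", ("researching", "alison")),
]

memory = {("month", "february"), ("happy", "alison"), ("researching", "alison")}
fired = set()

while True:
    # Recognise: collect rules whose preconditions hold and that have not fired.
    applicable = [r for r in RULES
                  if r[0] not in fired and all(p in memory for p in r[1])]
    if not applicable:
        break
    # Act: fire the first applicable rule (a fixed conflict-resolution strategy).
    number, preconditions, action, item = applicable[0]
    fired.add(number)
    if action == "ADD":
        memory.add(item)
    else:
        memory.discard(item)

print(sorted(memory))
```

With this strategy the run fires rules 2, 3, 1, 4, 5 and 6 in turn, and the final working memory contains bad-mood, lecturing, marking-practicals, month and overworked, matching the hand trace above.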
The order that rules fire may be crucial, especially when rules may result in items being deleted
from working memory. (Systems which allow items to be deleted are known as nonmonotonic).
Anyway, suppose we have the following further rule in the rule set:
7. IF (happy X)
THEN ADD (gives-high-marks X)
If this rule fires BEFORE (happy alison) is removed from working memory then the system will
conclude that I'll give high marks. However, if rule 5 fires first then rule 7 will no longer apply.
Of course, if we fire rule 7 and then later remove its preconditions, then it would be nice if its
conclusions could then be automatically removed from working memory. Special systems called
truth maintenance systems have been developed to allow this. A number of conflict resolution
strategies are typically used to decide which rule to fire. These include:
Don't fire a rule twice on the same data. (We don't want to keep on adding (lecturing
alison) to working memory).
Fire rules on more recent working memory elements before older ones. This allows the
system to follow through a single chain of reasoning, rather than keeping on drawing new
conclusions from old data.
Fire rules with more specific preconditions before ones with more general preconditions.
This allows us to deal with non-standard cases. If, for example, we have a rule ``IF (bird
X) THEN ADD (flies X)'' and another rule ``IF (bird X) AND (penguin X) THEN ADD
(swims X)'' and a penguin called tweety, then we would fire the second rule first and start
to draw conclusions from the fact that tweety swims.
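The specificity strategy can be sketched by ordering applicable rules on their number of preconditions, most specific first. The tweety rules below are a reconstruction of the example in the text (the ADD actions are omitted to keep the sketch to the selection step alone):

```python
# Two rules as (preconditions, conclusion); the penguin rule is more specific.
RULES = [
    ([("bird", "tweety")], ("flies", "tweety")),
    ([("bird", "tweety"), ("penguin", "tweety")], ("swims", "tweety")),
]

facts = {("bird", "tweety"), ("penguin", "tweety")}

# Recognise: both rules are applicable for tweety.
applicable = [r for r in RULES if all(p in facts for p in r[0])]

# Conflict resolution by specificity: more preconditions first.
applicable.sort(key=lambda r: len(r[0]), reverse=True)
chosen = applicable[0]
print(chosen[1])   # ('swims', 'tweety'): the penguin-specific rule fires first
```

The sort key is the whole strategy: the non-standard (penguin) case wins over the general bird case whenever both match.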
These strategies may help in getting reasonable behaviour from a forward chaining system, but
the most important thing is how we write the rules. They should be carefully constructed, with
the preconditions specifying as precisely as possible when different rules should fire. Otherwise
we will have little idea or control of what will happen. Sometimes special working memory
elements are used to help to control the behaviour of the system. For example, we might decide
that there are certain basic stages of processing in doing some task, and certain rules should only
be fired at a given stage - we could have a special working memory element (stage 1) and add
(stage 1) to the preconditions of all the relevant rules, removing the working memory element
when that stage was complete.
A case-based system, by contrast, searches the database for cases with similar characteristics to a
new case in order to find and apply appropriate solutions.
Lecture 9
Simple Expert Systems
Expert systems:
Use an inference engine to search through the knowledge base. In forward chaining, the
inference engine begins with information entered by the user to search the knowledge
base for a conclusion. In backward chaining, the system begins with a hypothesis and
asks the user questions to confirm or disprove the hypothesis.
The expert systems given below are very basic. These samples should give you an idea as to
where to start your coursework. They can all be quickly and easily implemented using Crystal.
Simple Expert System 1: A CAR TROUBLE DIAGNOSTIC SYSTEM
Knowledge Acquisition
The first task is knowledge acquisition. The solutions for this expert system are based wholly on
knowledge on automotive systems from the internet and a local Jua Kali mechanic.
The basic items that were identified to be needed in order to get a vehicle to start are a
combustion chamber, some sort of mechanism to turn the engine, air and fuel to burn, and
something to ignite the air fuel mixture. All the solutions in this illustration deal with how these
elements come together in order to make a vehicle start. Below is an introduction to the basic
systems that were considered.
Battery: This is the part of a vehicle that stores the power that is required to turn the engine and
create a spark.
Battery Cables: This is a set of wires that carry the power from the battery to the starter and the
rest of the engine. These cables usually fail due to corrosion, which interferes with the energy
flow from the battery.
Starter: This is a mechanical device, an electric motor that uses power from the battery to rotate
the engine.
Coil: An electronic component that takes the twelve volts coming from the battery and converts
it to a much larger voltage.
Coil Wire: A wire which carries the voltage from the coil to the distributor or computer-controlled
ignition points, which then distribute the pulse to the correct spark plug wire.
Spark Plug Wires: A set of wires that carries the electronic pulse from the distributor or ignition
points to the appropriate spark plug.
Spark Plugs: A set of electronic components constructed of insulators and conductors. A spark
plug produces a spark across the gap between its electrodes, and this spark ignites the air-fuel
mixture.
Fuel: Also referred to as petrol.
Fuel Filter: A filtering device located somewhere between the fuel tank and engine. Used to
eliminate impurities from the fuel.
Knowledge Representation
The second step is to represent the acquired knowledge. This involved coming up with rules that
would later be encoded into the knowledge base of the expert system. The hard part was deciding
which problems should be included in the solution space and which should be dropped. A
decision was made to limit the solution space to problems that can be fixed without any special
knowledge of how a car works. This eliminated many problems including internal engine
failures.
Below, a decision tree is used to represent the reasoning used in the system.
[Decision tree: the root asks "Starter Turning?". The question nodes include "Lights on?",
"Car Moving?", "Got Enough Fuel?", "Cable OK?", "Coil Clicks?", "Terminals Clean?",
"Filter Replaced Recently?" and "Coil Fuse OK?". The leaf actions are: Car is Fine,
Charge Battery, Buy Fuel, Clean Terminals, Replace Starter, Replace Filter, Replace Fuse,
Replace Coil, and Call Mechanic.]
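One way to realize this diagnostic flow is as an explicit decision tree walked question by question. Since the original figure is only partly recoverable, the tree shape and the wiring of questions to answers below are assumptions of this sketch:

```python
# Each internal node is (question, yes_branch, no_branch); each leaf is advice.
# The exact shape of the original figure is partly an assumption.
TREE = ("Starter turning?",
        ("Got enough fuel?",
         ("Coil clicks?",
          "Car is fine",
          ("Coil fuse OK?", "Replace coil", "Replace fuse")),
         "Buy fuel"),
        ("Lights on?",
         ("Cable OK?", "Replace starter", "Clean terminals"),
         "Charge battery"))

def diagnose(node, answers):
    """Walk the tree, answering each question from the `answers` dict,
    until a leaf (a piece of advice) is reached."""
    if isinstance(node, str):
        return node
    question, yes_branch, no_branch = node
    return diagnose(yes_branch if answers[question] else no_branch, answers)

print(diagnose(TREE, {"Starter turning?": False, "Lights on?": False}))
# Charge battery
```

Only the questions along one root-to-leaf path are ever asked, which is why such systems feel like a focused interview rather than a questionnaire.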
[Table: signs and symptoms distinguishing Malaria, Malaria + RTI, RTI, Typhoid and Meningitis.]
Note: RTI = Respiratory Tract Infection (specifically the common cold, also referred to as an
upper respiratory infection).
Note: Cells shaded in green indicate YES for the given sign/symptom.
Rule-based Knowledge Representation
[Rules R1–R14; only the following are legible:]
R4: IF patient_ill AND respiratory_track_infection AND NOT malaria THEN [common_cold]
R6: IF patient_ill AND NOT chills_sweat AND headache AND severe_headache THEN non_malaria
R7: IF non_malaria AND nausea AND stiff_neck AND photophobia THEN [meningitis]
R8: IF non_malaria AND body_malaise AND diarrhoea_constipation AND …
R13: IF patient_ill AND headache AND severe_headache AND NOT body_malaise THEN …
[The remaining rule bodies, and the conclusions THEN [malaria], THEN
[malaria_and_respiratory_tract] and THEN [unknown_illness_4], are not recoverable.]
[Table: for each of Lobar Pneumonia, Cholera, Dysentery, Typhoid Fever, Meningitis, Malaria
and Hepatitis C, the table lists: symptoms (fever, diarrhoea, vomiting, nausea, headache,
abdominal pain, chest pain, joint pains, chills, dehydration, stiff neck, convulsions, no appetite,
fatigue, and others specific to each disease); a confirming test where applicable (stool analysis
for cysts, stool/blood test for S. Typhi, blood smear, spinal tap, liver function test); common
misdiagnoses (plague, flu, cholera, dysentery, pneumonia, tuberculosis, food poisoning, minor
disease); and treatments (chloroquine, quinine, antibiotics such as penicillin and erythromycin,
metronidazole, intense antibiotic therapy, fluid replacement, pain and fever relief).]
Notes:
Minor disease: This is the catch-all for a number of debilitating minor illnesses other than those
listed. Usually this is no more than a head cold or a bad case of the flu. Symptoms vary widely
and are at the discretion of the system user. Infection symptoms: usually fever, general pain,
vomiting, headaches, etc.
Misdiagnosed as: usually another minor disease, such as the incorrect flu bug; sometimes
pneumonia.
Treatment: as for pneumonia, using antibiotics.
From the above, a decision was made to design a system whereby the user is expected to
key in the symptoms manifested, and the system compares them with those of the diseases in
the knowledge base. If they are similar then a diagnosis is arrived at; otherwise the disease is
unidentified.
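The matching scheme just described can be sketched as symptom-set overlap in Python. The disease profiles and the 0.75 match threshold below are illustrative assumptions, not the coursework's actual rules:

```python
# Hypothetical symptom profiles; disease names follow the table above.
DISEASES = {
    "cholera": {"diarrhoea", "vomiting", "dehydration", "rice-water stool"},
    "typhoid": {"fever", "headache", "abdominal pain", "constipation"},
    "malaria": {"fever", "chills", "headache", "joint pains"},
}

def diagnose(symptoms, threshold=0.75):
    """Return the disease whose profile best overlaps the entered symptoms,
    or None ("unidentified") when no profile matches well enough."""
    best, best_score = None, 0.0
    for disease, profile in DISEASES.items():
        score = len(symptoms & profile) / len(profile)
        if score > best_score:
            best, best_score = disease, score
    return best if best_score >= threshold else None

print(diagnose({"fever", "chills", "headache", "joint pains"}))  # malaria
print(diagnose({"cough"}))                                       # None
```

The threshold is what implements "otherwise the disease is unidentified": a partial overlap alone is not treated as a diagnosis.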
Lecture 10
PROLOG
Facts, Rules and Queries
Symbols
Prolog expressions are comprised of the following truth-functional symbols, which have the
same interpretation as in the predicate calculus.
English          Predicate Calculus    PROLOG
and              ∧                     ,
or               ∨                     ;
if               -->                   :-
not              ¬                     not
Facts
Facts are statements about the problem domain, each ending with a period. For example:
likes(X, susie).                         /* Everyone likes Susie */
likes(john, Y).                          /* John likes everybody */
likes(john, susie); likes(john, mary).   /* John likes Susie or John likes Mary */
not(likes(john, pizza)).                 /* John does not like pizza */
Rules
A rule is a predicate expression that uses logical implication (:-) to describe a relationship among
facts. Thus a Prolog rule takes the form
left_hand_side :- right_hand_side .
This sentence is interpreted as: left_hand_side if right_hand_side. The left_hand_side is
restricted to a single, positive, literal, which means it must consist of a positive atomic
expression. It cannot be negated and it cannot contain logical connectives.
This notation is known as a Horn clause. In Horn clause logic, the left hand side of the clause is
the conclusion, and must be a single positive literal. The right hand side contains the premises.
The Horn clause calculus is a restricted subset of the first-order predicate calculus; this restriction is what makes an efficient proof procedure possible.
Examples of valid rules:
friends(X,Y) :- likes(X,Y), likes(Y,X).  /* X and Y are friends if they like each other */
hates(X,Y) :- not(likes(X,Y)).           /* X hates Y if X does not like Y */
Examples of invalid rules:
friends(X,Y) :- likes(X,Y), likes(Y,X)   /* Missing a period */
likes(X,Y),likes(Y,X) :- friends(X,Y).   /* Left hand side must be a single literal */
not(likes(X,Y)) :- hates(X,Y).           /* Left hand side cannot be negated */
Queries
The Prolog interpreter responds to queries about the facts and rules represented in its database.
The database is assumed to represent what is true about a particular problem domain. In making
a query you are asking Prolog whether it can prove that your query is true. If so, it answers "yes"
and displays any variable bindings that it made in coming up with the answer. If it fails to prove
the query true, it answers "No".
Whenever you run the Prolog interpreter, it will prompt you with ?-. For example, suppose our
database consists of the following facts about a fictitious family.
father_of(joe,paul).
father_of(joe,mary).
mother_of(jane,paul).
mother_of(jane,mary).
male(paul).
male(joe).
female(mary).
female(jane).
We get the following results when we make queries about this database
| ?- father_of(joe,paul).
true ?
yes
| ?- father_of(paul,mary).
no
| ?- father_of(X,mary).
X = joe
yes
| ?-
Closed World Assumption
The Prolog interpreter assumes that the database is a closed world
-- that is, if it cannot prove something is true, it assumes that it is false. This is also known as
negation as failure -- that is, something is false if PROLOG cannot prove it true given the facts
and rules in its database. In this case, it may well be (in the real world) that Paul is the father of
Mary, but since this cannot be proved given the current family database, Prolog concludes that it
is false. So PROLOG assumes that its database contains complete knowledge of the domain it is
being asked about.
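The closed world assumption can be sketched in a few lines of Python: membership in the fact database is the only notion of truth, and negation succeeds exactly when proof fails. This is a toy model, not how a real Prolog engine works:

```python
# Toy closed-world database: true = provable = present in the set.
DATABASE = {("father_of", "joe", "paul"), ("father_of", "joe", "mary")}

def prove(fact):
    """A fact is 'true' only if it is in the database."""
    return fact in DATABASE

def negation_as_failure(fact):
    """not(fact) succeeds exactly when prove(fact) fails."""
    return not prove(fact)

print(prove(("father_of", "joe", "paul")))                  # True
print(negation_as_failure(("father_of", "paul", "mary")))   # True: unprovable, so assumed false
```

Whether Paul really is Mary's father in the world is irrelevant to the sketch; only the database contents decide.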
Prolog's Proof Procedure
In responding to queries, the Prolog interpreter uses a backtracking search, similar to the one we
study in Chapter 3 of Luger. To see how this works, let's add the following rules to our database:
parent_of(X,Y) :- father_of(X,Y).   /* Rule #1 */
parent_of(X,Y) :- mother_of(X,Y).   /* Rule #2 */
And let's trace how PROLOG would process the query. Suppose the facts and rules of this
database are arranged in the order in which they were input. This trace assumes you know how
unification works.
?- parent_of(jane,mary).
1 1 Call: parent_of(jane,mary) ?
2 2 Call: father_of(jane,mary) ?
2 2 Fail: father_of(jane,mary) ?
2 2 Call: mother_of(jane,mary) ?
2 2 Exit: mother_of(jane,mary) ?
1 1 Exit: parent_of(jane,mary) ?
yes
{trace}
| ?-
Exercises
1. Add a male() rule that includes all fathers as males.
2. Add a female() rule that includes all mothers as females.
3. Add the following rules to the family database:
son_of(X,Y)
daughter_of(X,Y)
sibling_of(X,Y)
brother_of(X,Y)
sister_of(X,Y)
4.
Given the addition of the sibling_of rule, and assuming the above order for the facts and
rules, show the PROLOG trace for the query sibling_of(paul,mary).