
Uncertainty

Russell & Norvig - AIMA2e


Ioan Alfred Letia
http://cs-gw.utcluj.ro/˜letia
letia@cs.utcluj.ro

Artificial Intelligence 13, 2004 – 1


Outline
Acting under uncertainty
Basic probability notation
The axioms of probability
Inference using full joint distributions
Independence
Bayes’ rule and its use
The wumpus world revisited
Summary

Artificial Intelligence 13, 2004 – 2


Uncertainty
Let action A_t = leave for airport t minutes before flight

Will A_t get me there on time?


Problems:
1. partial observability (road state, other drivers' plans, ...)
2. noisy sensors (KCBS traffic reports)
3. uncertainty in action outcomes (flat tire, ...)
4. immense complexity of modelling and predicting traffic
Hence a purely logical approach either
1) risks falsehood: "A_25 will get me there on time", or
2) leads to conclusions that are too weak for decision making:
"A_25 will get me there on time if there's no accident on the bridge
and it doesn't rain and my tires remain intact ..."

(A_1440 might reasonably be said to get me there on time,
but I'd have to stay overnight in the airport.)



Artificial Intelligence 13, 2004 – 3



Handling Uncertain Knowledge
∀p Symptom(p,Toothache) ⇒ Disease(p,Cavity)

Not all patients with toothaches have cavities:
∀p Symptom(p,Toothache) ⇒ Disease(p,Cavity)
                           ∨ Disease(p,GumDisease)
                           ∨ Disease(p,Abscess) ...

We have to add an almost unlimited list of possible causes

∀p Disease(p,Cavity) ⇒ Symptom(p,Toothache)

Not all cavities cause pain

Artificial Intelligence 13, 2004 – 4


Probability
Probabilistic assertions summarize effects of
laziness: failure to enumerate exceptions,
qualifications, etc.
ignorance: lack of relevant facts, initial conditions,
etc.
Subjective or Bayesian probability:
Probabilities relate propositions to one's own state of knowledge
e.g., P(A_25 | no reported accidents) = 0.06
These are not claims of some probabilistic tendency in
the current situation (but might be learned from past
experience of similar situations)
Probabilities of propositions change with new evidence:
e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15
(Analogous to logical entailment status KB ⊨ α, not truth.)

Artificial Intelligence 13, 2004 – 5
Making Decisions under Uncertainty
Suppose I believe the following:

P(A_25 gets me there on time | ...)   = 0.04
P(A_90 gets me there on time | ...)   = 0.70
P(A_120 gets me there on time | ...)  = 0.95
P(A_1440 gets me there on time | ...) = 0.9999
Which action to choose?
Depends on my preferences for missing flight vs. airport
cuisine, etc.
Utility theory is used to represent and infer preferences
Decision theory = probability theory + utility theory

Artificial Intelligence 13, 2004 – 6


Probability Basics
Begin with a set Ω — the sample space
e.g., 6 possible rolls of a die
ω ∈ Ω is a sample point / possible world / atomic event

A probability space or probability model is a sample space
with an assignment P(ω) for every ω ∈ Ω s.t.
    0 ≤ P(ω) ≤ 1
    Σ_ω P(ω) = 1
e.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

An event A is any subset of Ω:
    P(A) = Σ_{ω ∈ A} P(ω)
E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
Artificial Intelligence 13, 2004 – 7


Random Variables
A random variable is a function from sample points to some range,
e.g., the reals or Booleans; e.g., Odd(1) = true

P induces a probability distribution for any r.v. X:
    P(X = x_i) = Σ_{ω : X(ω) = x_i} P(ω)

e.g., P(Odd = true) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2
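As a concrete illustration (added here; not part of the original slides), a minimal Python sketch of the die example from the last two slides: the sample space, the probability model, an event as a subset of Ω, and the random variable Odd as a function of sample points.

    # A minimal sketch of the die example, assuming a fair die.
    omega = {1, 2, 3, 4, 5, 6}                 # sample space: the six possible rolls
    P = {w: 1 / 6 for w in omega}              # probability model: P(w) for every sample point

    def prob(event):
        """P(A) = sum of P(w) over the sample points in the event A (a subset of omega)."""
        return sum(P[w] for w in event)

    print(prob({w for w in omega if w < 4}))       # P(die roll < 4) = 1/2

    odd = lambda w: w % 2 == 1                     # random variable Odd: sample point -> Boolean
    print(prob({w for w in omega if odd(w)}))      # P(Odd = true) = 1/2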

Artificial Intelligence 13, 2004 – 8


Propositions
Think of a proposition as the event (set of sample
points) where the proposition is true
Given Boolean random variables A and B:
    event a  = set of sample points where A(ω) = true
    event ¬a = set of sample points where A(ω) = false
    event a ∧ b = points where A(ω) = true and B(ω) = true

Often in AI applications, the sample points are defined by the values of a
set of random variables, i.e., the sample space is the Cartesian product of
the ranges of the variables

With Boolean variables, sample point = propositional logic model
e.g., A = true, B = false, or a ∧ ¬b

Proposition = disjunction of atomic events in which it is true
e.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Artificial Intelligence 13, 2004 – 9


Why Use Probability?
[Venn diagram: events A and B, with overlap A ∧ B, inside the set True of all sample points]
The definitions imply that certain logically related events
must have related probabilities
E.g., P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
de Finetti (1931): an agent who bets according to
probabilities that violate these axioms can be forced to
bet so as to lose money regardless of outcome
Artificial Intelligence 13, 2004 – 10
Syntax for Propositions
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Discrete random variables (finite or infinite)
e.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩
Weather = rain is a proposition
Values must be exhaustive and mutually exclusive
Continuous random variables (bounded or unbounded)
e.g., Temp = 21.6; also allow, e.g., Temp < 22.0
Arbitrary Boolean combinations of basic propositions

Artificial Intelligence 13, 2004 – 11


Prior Probability
Prior or unconditional probabilities of propositions, e.g.,
    P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72
correspond to belief prior to arrival of new evidence
Probability distribution gives values for all possible assignments:
    P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩
(normalized, i.e., sums to 1)
Joint probability distribution for a set of r.v.s gives the probability of
every atomic event on those r.v.s (i.e., every sample point)
P(Weather, Cavity) = a 4 × 2 matrix of values:

    Weather =        sunny   rain   cloudy   snow
    Cavity = true    0.144   0.02   0.016    0.02
    Cavity = false   0.576   0.08   0.064    0.08
Every question about a domain can be answered by the joint distribution
because every event is a sum of sample points

Artificial Intelligence 13, 2004 – 12
Probability for Continuous Variables

[plot: uniform density of height 0.125 on the interval from 18 to 26, with a strip of width dx]

Express distribution as a parameterized function of value:
    P(X = x) = U[18, 26](x) = uniform density between 18 and 26
Here P is a density; integrates to 1.
P(X = 20.5) = 0.125 really means
    lim_{dx→0} P(20.5 ≤ X ≤ 20.5 + dx) / dx = 0.125
Artificial Intelligence 13, 2004 – 13


Gaussian Density

P(x) = (1 / (σ√(2π))) e^{−(x−μ)² / (2σ²)}

[plot: Gaussian (normal) density curve with mean μ and standard deviation σ]
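A short Python sketch (an added illustration, not from the slides) of the two densities just shown: the uniform density U[18, 26] and the Gaussian density with mean μ and standard deviation σ.

    import math

    def uniform_density(x, a=18.0, b=26.0):
        """Uniform density on [a, b]: constant 1/(b - a) inside the interval, 0 outside."""
        return 1.0 / (b - a) if a <= x <= b else 0.0

    def gaussian_density(x, mu=0.0, sigma=1.0):
        """P(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    print(uniform_density(20.5))    # 0.125, the height of the uniform plot
    print(gaussian_density(0.0))    # ~0.3989 for the standard normal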

Artificial Intelligence 13, 2004 – 14


Conditional Probability
Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know
NOT "if toothache then 80% chance of cavity"
Notation for conditional distributions:
P(Cavity | Toothache) = 2-element vector of 2-element vectors
If we know more, e.g., cavity is also given, then we have
    P(cavity | toothache, cavity) = 1
Note: the less specific belief remains valid after more


evidence arrives, but is not always useful
New evidence may be irrelevant, allowing simplification, e.g.,
    P(cavity | toothache, 49ersWin) = P(cavity | toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is crucial

Artificial Intelligence 13, 2004 – 15
Conditional Probability 2
Definition of conditional probability:
    P(a | b) = P(a ∧ b) / P(b)   if P(b) ≠ 0
Product rule gives an alternative formulation:
    P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
A general version holds for whole distributions, e.g.,
    P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a set of 4 × 2 equations, not matrix mult.)
Chain rule is derived by successive application of the product rule:
    P(X_1, ..., X_n) = P(X_1, ..., X_{n−1}) P(X_n | X_1, ..., X_{n−1})
                     = P(X_1, ..., X_{n−2}) P(X_{n−1} | X_1, ..., X_{n−2}) P(X_n | X_1, ..., X_{n−1})
                     = ...
                     = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})
Artificial Intelligence 13, 2004 – 16


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

Start with the joint distribution


For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
Artificial Intelligence 13, 2004 – 17


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
Artificial Intelligence 13, 2004 – 17


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
Artificial Intelligence 13, 2004 – 17


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

Can also compute conditional probabilities:

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                       = 0.4
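A small Python sketch (added for illustration; not the lecture's code) of inference by enumeration over this joint table. The dictionary keys are (toothache, catch, cavity) truth values; the numbers are exactly the eight table entries.

    joint = {
        (True,  True,  True):  0.108, (True,  False, True):  0.012,
        (False, True,  True):  0.072, (False, False, True):  0.008,
        (True,  True,  False): 0.016, (True,  False, False): 0.064,
        (False, True,  False): 0.144, (False, False, False): 0.576,
    }

    def prob(event):
        """P(phi): sum the atomic events (sample points) where the proposition phi holds."""
        return sum(p for point, p in joint.items() if event(*point))

    p_toothache = prob(lambda t, c, cav: t)                # 0.2
    p_cav_or_tooth = prob(lambda t, c, cav: cav or t)      # 0.28
    p_not_cav_given_tooth = prob(lambda t, c, cav: t and not cav) / p_toothache   # 0.4
    print(p_toothache, p_cav_or_tooth, p_not_cav_given_tooth)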


Artificial Intelligence 13, 2004 – 17


Normalization
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

Denominator can be viewed as normalization constant α:

P(Cavity | toothache) = α P(Cavity, toothache)
    = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
    = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
General idea: compute distribution on query variable by fixing evidence
variables and summing over hidden variables

Artificial Intelligence 13, 2004 – 18
Inference by Enumeration 2
Typically, we are interested in the posterior joint
distribution of the query variables Y given specific values
e for the evidence variables E
Let the hidden variables be H = X − Y − E

Then the required summation of joint entries is done by summing out the
hidden variables:
    P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
The terms in the summation are joint entries because Y,
E, and H together exhaust the set of random variables
Obvious problems:
1. Worst-case time complexity O(dⁿ), where d is the largest arity
2. Space complexity O(dⁿ) to store the joint distribution
3. How to find the numbers for O(dⁿ) entries???
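A generic Python sketch of this summation (my own illustration; the function name, the Boolean-only variables, and the tuple-keyed joint table are assumptions). It fixes the evidence, sums out the hidden variables by iterating over all sample points, and normalizes at the end.

    from itertools import product

    def enumerate_query(joint, variables, query_var, evidence):
        """Return P(query_var | evidence) from a joint table keyed by value tuples
        ordered as in `variables`, by summing matching entries and normalizing."""
        dist = {}
        for values in product([True, False], repeat=len(variables)):
            point = dict(zip(variables, values))
            if all(point[v] == val for v, val in evidence.items()):
                dist[point[query_var]] = dist.get(point[query_var], 0.0) + joint[values]
        alpha = 1.0 / sum(dist.values())
        return {val: alpha * p for val, p in dist.items()}

    # With the toothache/catch/cavity table from the earlier sketch:
    # enumerate_query(joint, ["toothache", "catch", "cavity"], "cavity",
    #                 {"toothache": True})   ->   {True: 0.6, False: 0.4}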

Artificial Intelligence 13, 2004 – 19


Independence
[diagram: the full model {Cavity, Toothache, Catch, Weather} decomposes into
 two independent pieces, {Cavity, Toothache, Catch} and {Weather}]

A and B are independent iff
    P(A | B) = P(A)   or   P(B | A) = P(B)   or   P(A, B) = P(A) P(B)

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
32 entries reduced to 12; for n independent biased coins, 2ⁿ → n
Absolute independence powerful but rare


Dentistry is a large field with hundreds of variables, none of which are
independent. What to do?

Artificial Intelligence 13, 2004 – 20
Conditional Independence
P(Toothache, Catch, Cavity) has 2³ − 1 = 7 independent entries
If I have a cavity, the probability that the probe catches
in it doesn’t depend on whether I have a toothache:
    (1)  P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I haven’t got a cavity:
    (2)  P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity:
    P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Equivalent statements:
    P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
    P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Artificial Intelligence 13, 2004 – 21


Conditional Independence 2
Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
    = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
I.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2)

In most cases, the use of conditional independence reduces the size of the
representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust
form of knowledge about uncertain environments.
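A brief Python sketch (added here) of the five-number factored representation. The parameter values are the ones implied by the toothache/catch/cavity joint table used earlier in these slides; the check confirms that the product of the three factors reproduces the joint entries.

    p_cavity = 0.2                                  # P(cavity), from the joint table
    p_toothache_given = {True: 0.6, False: 0.1}     # P(toothache | Cavity)
    p_catch_given     = {True: 0.9, False: 0.2}     # P(catch | Cavity)

    def joint(toothache, catch, cavity):
        """P(toothache, catch, cavity) = P(toothache | Cavity) P(catch | Cavity) P(Cavity)."""
        pc = p_cavity if cavity else 1 - p_cavity
        pt = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
        pk = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
        return pt * pk * pc

    print(round(joint(True, True, True), 3))        # 0.108, matching the table
    print(round(joint(False, False, False), 3))     # 0.576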

Artificial Intelligence 13, 2004 – 22


Bayes’ Rule
Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
or in distribution form
    P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
Useful for assessing diagnostic probability from causal probability:
    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
E.g., let M be meningitis, S be stiff neck:
    P(m | s) = P(s | m) P(m) / P(s) = (0.8 × 0.0001) / 0.1 = 0.0008
Note: posterior probability of meningitis still very small!
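A one-line Python check of the calculation, assuming the values on this slide (P(s | m) = 0.8, P(m) = 0.0001, P(s) = 0.1):

    p_s_given_m, p_m, p_s = 0.8, 0.0001, 0.1
    p_m_given_s = p_s_given_m * p_m / p_s       # Bayes' rule in the diagnostic direction
    print(round(p_m_given_s, 4))                # 0.0008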
Artificial Intelligence 13, 2004 – 23
Bayes’ Rule - Cond. Independence
[diagram: Cavity with children Toothache and Catch; generally, Cause with children Effect_1 ... Effect_n]

P(Cavity | toothache ∧ catch)
    = α P(toothache ∧ catch | Cavity) P(Cavity)
    = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
This is an example of a naive Bayes model:
    P(Cause, Effect_1, ..., Effect_n) = P(Cause) ∏_i P(Effect_i | Cause)

Total number of parameters is linear in n
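A small Python sketch of the naive Bayes computation above (an added illustration). The conditional probabilities are the ones implied by the toothache/catch/cavity joint table shown earlier; they are not stated explicitly on this slide.

    p_cause = {True: 0.2, False: 0.8}               # P(Cavity)
    p_toothache_given = {True: 0.6, False: 0.1}     # P(toothache | Cavity)
    p_catch_given     = {True: 0.9, False: 0.2}     # P(catch | Cavity)

    # P(Cavity | toothache, catch) ∝ P(Cavity) P(toothache | Cavity) P(catch | Cavity)
    unnormalized = {c: p_cause[c] * p_toothache_given[c] * p_catch_given[c]
                    for c in (True, False)}
    alpha = 1.0 / sum(unnormalized.values())
    posterior = {c: round(alpha * v, 3) for c, v in unnormalized.items()}
    print(posterior)    # {True: 0.871, False: 0.129}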


Artificial Intelligence 13, 2004 – 24
Wumpus World
[4×4 grid of squares [1,1] ... [4,4]; squares [1,1], [1,2], [2,1] are visited (OK),
 with breezes (B) observed in [1,2] and [2,1]]

P_ij = true iff [i, j] contains a pit
B_ij = true iff [i, j] is breezy

Include only B_{1,1}, B_{1,2}, B_{2,1} in the probability model
Artificial Intelligence 13, 2004 – 25


Specifying the Probability Model
The full joint distribution is P(P_{1,1}, ..., P_{4,4}, B_{1,1}, B_{1,2}, B_{2,1})
Apply product rule:
    P(B_{1,1}, B_{1,2}, B_{2,1} | P_{1,1}, ..., P_{4,4}) P(P_{1,1}, ..., P_{4,4})

(Do it this way to get P(Effect | Cause).)
First term: 1 if pits are adjacent to breezes, 0 otherwise
Second term: pits are placed randomly, probability 0.2 per square:
    P(P_{1,1}, ..., P_{4,4}) = ∏_{i,j=1,1}^{4,4} P(P_{i,j}) = 0.2ⁿ × 0.8^{16−n}
for n pits
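A tiny Python sketch of the second term (an added illustration): with pits placed independently with probability 0.2 per square, any particular configuration with n pits has the prior below.

    def pit_config_prior(n_pits, n_squares=16, p_pit=0.2):
        """Prior of one specific pit configuration: 0.2**n * 0.8**(16 - n)."""
        return p_pit ** n_pits * (1 - p_pit) ** (n_squares - n_pits)

    print(pit_config_prior(3))    # prior of any particular configuration with 3 pits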

Artificial Intelligence 13, 2004 – 26


Observations and Query
We know the following facts:
    b = ¬b_{1,1} ∧ b_{1,2} ∧ b_{2,1}
    known = ¬p_{1,1} ∧ ¬p_{1,2} ∧ ¬p_{2,1}
Query is P(P_{1,3} | known, b)

Define Unknown = the P_{ij}s other than P_{1,3} and Known
For inference by enumeration, we have
    P(P_{1,3} | known, b) = α Σ_{unknown} P(P_{1,3}, unknown, known, b)

Grows exponentially with number of squares!

Artificial Intelligence 13, 2004 – 27


Using Conditional Independence
[4×4 grid partitioned into KNOWN (the visited squares), FRINGE (unknown squares
 adjacent to the known ones), QUERY ([1,3]), and OTHER (all remaining squares)]

Basic insight: observations are conditionally independent of other hidden
squares, given neighbouring hidden squares
Define Unknown = Fringe ∪ Other
    P(b | P_{1,3}, Known, Unknown) = P(b | P_{1,3}, Known, Fringe)
Manipulate query into a form where we can use this!
Artificial Intelligence 13, 2004 – 28
Using Conditional Independence 2

P(P_{1,3} | known, b)
  = α Σ_{unknown} P(P_{1,3}, unknown, known, b)
  = α Σ_{unknown} P(b | P_{1,3}, known, unknown) P(P_{1,3}, known, unknown)
  = α Σ_{fringe} Σ_{other} P(b | known, P_{1,3}, fringe, other) P(P_{1,3}, known, fringe, other)
  = α Σ_{fringe} Σ_{other} P(b | known, P_{1,3}, fringe) P(P_{1,3}, known, fringe, other)
  = α Σ_{fringe} P(b | known, P_{1,3}, fringe) Σ_{other} P(P_{1,3}, known, fringe, other)
  = α Σ_{fringe} P(b | known, P_{1,3}, fringe) Σ_{other} P(P_{1,3}) P(known) P(fringe) P(other)
  = α P(known) P(P_{1,3}) Σ_{fringe} P(b | known, P_{1,3}, fringe) P(fringe) Σ_{other} P(other)
  = α′ P(P_{1,3}) Σ_{fringe} P(b | known, P_{1,3}, fringe) P(fringe)
Artificial Intelligence 13, 2004 – 29
Using Conditional Independence 3
[figure: the five fringe models over squares [2,2] and [3,1] that are consistent
 with the observed breezes; the first three have a pit in [1,3], the last two do not.
 Their fringe priors are
 0.2 × 0.2 = 0.04,  0.2 × 0.8 = 0.16,  0.8 × 0.2 = 0.16,  0.2 × 0.2 = 0.04,  0.2 × 0.8 = 0.16]
P(P_{1,3} | known, b) = α′ ⟨0.2 (0.04 + 0.16 + 0.16), 0.8 (0.04 + 0.16)⟩
                      ≈ ⟨0.31, 0.69⟩

P(P_{2,2} | known, b) ≈ ⟨0.86, 0.14⟩
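A Python sketch of this fringe calculation (my own illustration; the square names and the consistent() helper are assumptions, not the lecture's code). It enumerates the fringe squares [2,2] and [3,1], keeps the configurations that explain the observed breezes, and normalizes, reproducing ⟨0.31, 0.69⟩.

    from itertools import product

    P_PIT = 0.2                              # prior probability of a pit in any square

    def consistent(p13, p22, p31):
        """True iff the pit configuration explains the observations:
        breezes in [1,2] and [2,1], with the known squares pit-free."""
        breeze_12 = p22 or p13               # neighbours of [1,2]: [1,1] (no pit), [2,2], [1,3]
        breeze_21 = p22 or p31               # neighbours of [2,1]: [1,1] (no pit), [2,2], [3,1]
        return breeze_12 and breeze_21

    def fringe_prior(assignment):
        """Prior probability of a truth assignment to the fringe squares."""
        prior = 1.0
        for pit in assignment:
            prior *= P_PIT if pit else 1.0 - P_PIT
        return prior

    scores = {}
    for p13 in (True, False):
        total = sum(fringe_prior((p22, p31))
                    for p22, p31 in product((True, False), repeat=2)
                    if consistent(p13, p22, p31))
        scores[p13] = (P_PIT if p13 else 1.0 - P_PIT) * total

    alpha = 1.0 / sum(scores.values())
    print({k: round(alpha * v, 2) for k, v in scores.items()})    # {True: 0.31, False: 0.69}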


Artificial Intelligence 13, 2004 – 30


Summary
Basic probability statements include prior probabilities
and conditional probabilities over simple and complex
propositions
The full joint probability distribution specifies the
probability of each complete assignment of values to
random variables
Queries answered by summing over atomic events
Absolute independence between subsets of random
variables might allow the full joint distribution to be
factored into smaller joint distributions
Bayes’ rule allows unknown probabilities to be
computed from known conditional probabilities, usually
in the causal direction
Conditional independence brought about by the direct
causal relationships in the domain might allow the full
joint distribution to be factored into smaller, conditional distributions

Artificial Intelligence 13, 2004 – 31
