
Uncertainty

Russell & Norvig - AIMA2e


Ioan Alfred Letia
http://cs-gw.utcluj.ro/˜letia
letia@cs.utcluj.ro

Artificial Intelligence 13, 2004 – 1


Outline
Acting under uncertainty
Basic probability notation
The axioms of probability
Inference using full joint distributions
Independence
Bayes’ rule and its use
The wumpus world revisited
Summary

Artificial Intelligence 13, 2004 – 2


Uncertainty
Let action A_t = leave for airport t minutes before flight

Will A_t get me there on time?


Problems:
1. partial observability (road state, other drivers' plans, ...)
2. noisy sensors (KCBS traffic reports)
3. uncertainty in action outcomes (flat tire, ...)
4. immense complexity of modelling and predicting traffic
Hence a purely logical approach either
1) risks falsehood: "A_25 will get me there on time", or
2) leads to conclusions that are too weak for decision making:
"A_25 will get me there on time if there's no accident on the bridge
and it doesn't rain and my tires remain intact ..."

(A_1440 might reasonably be said to get me there on time,
but I'd have to stay overnight in the airport.)



Artificial Intelligence 13, 2004 – 3



Handling Uncertain Knowledge
∀p Symptom(p,Toothache) ⇒ Disease(p,Cavity)

Not all patients with toothaches have cavities:
∀p Symptom(p,Toothache) ⇒ Disease(p,Cavity)
                           ∨ Disease(p,GumDisease)
                           ∨ Disease(p,Abscess) ...

We have to add an almost unlimited list of possible causes

∀p Disease(p,Cavity) ⇒ Symptom(p,Toothache)

Not all cavities cause pain

Artificial Intelligence 13, 2004 – 4


Probability
Probabilistic assertions summarize effects of
laziness: failure to enumerate exceptions,
qualifications, etc.
ignorance: lack of relevant facts, initial conditions,
etc.
Subjective or Bayesian probability:
Probabilities relate propositions to one's own state of knowledge
e.g., P(A_25 | no reported accidents) = 0.06
These are not claims of some probabilistic tendency in
the current situation (but might be learned from past
experience of similar situations)
Probabilities of propositions change with new evidence:
e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15
(Analogous to logical entailment status KB ⊨ α, not truth.)

Artificial Intelligence 13, 2004 – 5
Making Decisions under Uncertainty
Suppose I believe the following:

P(A_25 gets me there on time | ...)   = 0.04
P(A_90 gets me there on time | ...)   = 0.70
P(A_120 gets me there on time | ...)  = 0.95
P(A_1440 gets me there on time | ...) = 0.9999
Which action to choose?
Depends on my preferences for missing flight vs. airport
cuisine, etc.
Utility theory is used to represent and infer preferences
Decision theory = probability theory + utility theory

Artificial Intelligence 13, 2004 – 6


Probability Basics
Begin with a set Ω — the sample space
e.g., 6 possible rolls of a die
ω ∈ Ω is a sample point / possible world / atomic event

A probability space or probability model is a sample space
with an assignment P(ω) for every ω ∈ Ω s.t.
    0 ≤ P(ω) ≤ 1
    Σ_ω P(ω) = 1
e.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

An event A is any subset of Ω:
    P(A) = Σ_{ω ∈ A} P(ω)
E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
Artificial Intelligence 13, 2004 – 7


Random Variables
A random variable is a function from sample points to some range,
e.g., the reals or Booleans; e.g., Odd(1) = true

P induces a probability distribution for any r.v. X:
    P(X = x_i) = Σ_{ω : X(ω) = x_i} P(ω)

e.g., P(Odd = true) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2
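As a concrete illustration (added here; not part of the original slides), a minimal Python sketch of the die example from the last two slides: the sample space, the probability model, an event as a subset of Ω, and the random variable Odd as a function of sample points.

    # A minimal sketch of the die example, assuming a fair die.
    omega = {1, 2, 3, 4, 5, 6}                 # sample space: the six possible rolls
    P = {w: 1 / 6 for w in omega}              # probability model: P(w) for every sample point

    def prob(event):
        """P(A) = sum of P(w) over the sample points in the event A (a subset of omega)."""
        return sum(P[w] for w in event)

    print(prob({w for w in omega if w < 4}))       # P(die roll < 4) = 1/2

    odd = lambda w: w % 2 == 1                     # random variable Odd: sample point -> Boolean
    print(prob({w for w in omega if odd(w)}))      # P(Odd = true) = 1/2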

Artificial Intelligence 13, 2004 – 8


Propositions
Think of a proposition as the event (set of sample
points) where the proposition is true
Given Boolean random variables A and B:
    event a  = set of sample points where A(ω) = true
    event ¬a = set of sample points where A(ω) = false
    event a ∧ b = points where A(ω) = true and B(ω) = true

Often in AI applications, the sample points are defined by the values of a
set of random variables, i.e., the sample space is the Cartesian product of
the ranges of the variables

With Boolean variables, sample point = propositional logic model
e.g., A = true, B = false, or a ∧ ¬b

Proposition = disjunction of atomic events in which it is true
e.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Artificial Intelligence 13, 2004 – 9


Why Use Probability?
[Venn diagram: events A and B, with overlap A ∧ B, inside the set True of all sample points]
The definitions imply that certain logically related events
must have related probabilities
E.g., P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
de Finetti (1931): an agent who bets according to
probabilities that violate these axioms can be forced to
bet so as to lose money regardless of outcome
Artificial Intelligence 13, 2004 – 10
Syntax for Propositions
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Discrete random variables (finite or infinite)
e.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩
Weather = rain is a proposition
Values must be exhaustive and mutually exclusive
Continuous random variables (bounded or unbounded)
e.g., Temp = 21.6; also allow, e.g., Temp < 22.0
Arbitrary Boolean combinations of basic propositions

Artificial Intelligence 13, 2004 – 11


Prior Probability
Prior or unconditional probabilities of propositions, e.g.,
    P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72
correspond to belief prior to arrival of new evidence
Probability distribution gives values for all possible assignments:
    P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩
(normalized, i.e., sums to 1)
Joint probability distribution for a set of r.v.s gives the probability of
every atomic event on those r.v.s (i.e., every sample point)
P(Weather, Cavity) = a 4 × 2 matrix of values:

    Weather =        sunny   rain   cloudy   snow
    Cavity = true    0.144   0.02   0.016    0.02
    Cavity = false   0.576   0.08   0.064    0.08
Every question about a domain can be answered by the joint distribution
because every event is a sum of sample points

Artificial Intelligence 13, 2004 – 12
Probability for Continuous Variables

[plot: uniform density of height 0.125 on the interval from 18 to 26, with a strip of width dx]

Express distribution as a parameterized function of value:
    P(X = x) = U[18, 26](x) = uniform density between 18 and 26
Here P is a density; integrates to 1.
P(X = 20.5) = 0.125 really means
    lim_{dx→0} P(20.5 ≤ X ≤ 20.5 + dx) / dx = 0.125
Artificial Intelligence 13, 2004 – 13


Gaussian Density

P(x) = (1 / (σ√(2π))) e^{−(x−μ)² / (2σ²)}

[plot: Gaussian (normal) density curve with mean μ and standard deviation σ]
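A short Python sketch (an added illustration, not from the slides) of the two densities just shown: the uniform density U[18, 26] and the Gaussian density with mean μ and standard deviation σ.

    import math

    def uniform_density(x, a=18.0, b=26.0):
        """Uniform density on [a, b]: constant 1/(b - a) inside the interval, 0 outside."""
        return 1.0 / (b - a) if a <= x <= b else 0.0

    def gaussian_density(x, mu=0.0, sigma=1.0):
        """P(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    print(uniform_density(20.5))    # 0.125, the height of the uniform plot
    print(gaussian_density(0.0))    # ~0.3989 for the standard normal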

Artificial Intelligence 13, 2004 – 14


Conditional Probability
Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know
NOT "if toothache then 80% chance of cavity"
Notation for conditional distributions:
P(Cavity | Toothache) = 2-element vector of 2-element vectors
If we know more, e.g., cavity is also given, then we have
    P(cavity | toothache, cavity) = 1
Note: the less specific belief remains valid after more


evidence arrives, but is not always useful
New evidence may be irrelevant, allowing simplification, e.g.,
    P(cavity | toothache, 49ersWin) = P(cavity | toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is crucial

Artificial Intelligence 13, 2004 – 15
Conditional Probability 2
Definition of conditional probability:
    P(a | b) = P(a ∧ b) / P(b)   if P(b) ≠ 0
Product rule gives an alternative formulation:
    P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
A general version holds for whole distributions, e.g.,
    P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a set of 4 × 2 equations, not matrix mult.)
Chain rule is derived by successive application of the product rule:
    P(X_1, ..., X_n) = P(X_1, ..., X_{n−1}) P(X_n | X_1, ..., X_{n−1})
                     = P(X_1, ..., X_{n−2}) P(X_{n−1} | X_1, ..., X_{n−2}) P(X_n | X_1, ..., X_{n−1})
                     = ...
                     = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})
Artificial Intelligence 13, 2004 – 16


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

Start with the joint distribution


For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
Artificial Intelligence 13, 2004 – 17


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
Artificial Intelligence 13, 2004 – 17


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
Artificial Intelligence 13, 2004 – 17


Inference by Enumeration
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

Can also compute conditional probabilities:

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                       = 0.4
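A small Python sketch (added for illustration; not the lecture's code) of inference by enumeration over this joint table. The dictionary keys are (toothache, catch, cavity) truth values; the numbers are exactly the eight table entries.

    joint = {
        (True,  True,  True):  0.108, (True,  False, True):  0.012,
        (False, True,  True):  0.072, (False, False, True):  0.008,
        (True,  True,  False): 0.016, (True,  False, False): 0.064,
        (False, True,  False): 0.144, (False, False, False): 0.576,
    }

    def prob(event):
        """P(phi): sum the atomic events (sample points) where the proposition phi holds."""
        return sum(p for point, p in joint.items() if event(*point))

    p_toothache = prob(lambda t, c, cav: t)                # 0.2
    p_cav_or_tooth = prob(lambda t, c, cav: cav or t)      # 0.28
    p_not_cav_given_tooth = prob(lambda t, c, cav: t and not cav) / p_toothache   # 0.4
    print(p_toothache, p_cav_or_tooth, p_not_cav_given_tooth)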


Artificial Intelligence 13, 2004 – 17


Normalization
              toothache              ¬toothache
              catch     ¬catch       catch     ¬catch
cavity        .108      .012         .072      .008
¬cavity       .016      .064         .144      .576

Denominator can be viewed as normalization constant α:

P(Cavity | toothache) = α P(Cavity, toothache)
    = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
    = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
General idea: compute distribution on query variable by fixing evidence
variables and summing over hidden variables

Artificial Intelligence 13, 2004 – 18
Inference by Enumeration 2
Typically, we are interested in the posterior joint
distribution of the query variables Y given specific values
e for the evidence variables E
Let the hidden variables be H = X − Y − E

Then the required summation of joint entries is done by summing out the
hidden variables:
    P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
The terms in the summation are joint entries because Y,
E, and H together exhaust the set of random variables
Obvious problems:
1. Worst-case time complexity O(dⁿ), where d is the largest arity
2. Space complexity O(dⁿ) to store the joint distribution
3. How to find the numbers for O(dⁿ) entries???
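A generic Python sketch of this summation (my own illustration; the function name, the Boolean-only variables, and the tuple-keyed joint table are assumptions). It fixes the evidence, sums out the hidden variables by iterating over all sample points, and normalizes at the end.

    from itertools import product

    def enumerate_query(joint, variables, query_var, evidence):
        """Return P(query_var | evidence) from a joint table keyed by value tuples
        ordered as in `variables`, by summing matching entries and normalizing."""
        dist = {}
        for values in product([True, False], repeat=len(variables)):
            point = dict(zip(variables, values))
            if all(point[v] == val for v, val in evidence.items()):
                dist[point[query_var]] = dist.get(point[query_var], 0.0) + joint[values]
        alpha = 1.0 / sum(dist.values())
        return {val: alpha * p for val, p in dist.items()}

    # With the toothache/catch/cavity table from the earlier sketch:
    # enumerate_query(joint, ["toothache", "catch", "cavity"], "cavity",
    #                 {"toothache": True})   ->   {True: 0.6, False: 0.4}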

Artificial Intelligence 13, 2004 – 19


Independence
[diagram: the full model {Cavity, Toothache, Catch, Weather} decomposes into
 two independent pieces, {Cavity, Toothache, Catch} and {Weather}]

A and B are independent iff
    P(A | B) = P(A)   or   P(B | A) = P(B)   or   P(A, B) = P(A) P(B)

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
32 entries reduced to 12; for n independent biased coins, 2ⁿ → n
Absolute independence powerful but rare


Dentistry is a large field with hundreds of variables, none of which are
independent. What to do?

Artificial Intelligence 13, 2004 – 20
Conditional Independence
P(Toothache, Catch, Cavity) has 2³ − 1 = 7 independent entries
If I have a cavity, the probability that the probe catches
in it doesn’t depend on whether I have a toothache:
    (1)  P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I haven’t got a cavity:
    (2)  P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity:
    P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Equivalent statements:
    P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
    P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Artificial Intelligence 13, 2004 – 21


Conditional Independence 2
Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
    = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
I.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2)

In most cases, the use of conditional independence reduces the size of the
representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust
form of knowledge about uncertain environments.
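A brief Python sketch (added here) of the five-number factored representation. The parameter values are the ones implied by the toothache/catch/cavity joint table used earlier in these slides; the check confirms that the product of the three factors reproduces the joint entries.

    p_cavity = 0.2                                  # P(cavity), from the joint table
    p_toothache_given = {True: 0.6, False: 0.1}     # P(toothache | Cavity)
    p_catch_given     = {True: 0.9, False: 0.2}     # P(catch | Cavity)

    def joint(toothache, catch, cavity):
        """P(toothache, catch, cavity) = P(toothache | Cavity) P(catch | Cavity) P(Cavity)."""
        pc = p_cavity if cavity else 1 - p_cavity
        pt = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
        pk = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
        return pt * pk * pc

    print(round(joint(True, True, True), 3))        # 0.108, matching the table
    print(round(joint(False, False, False), 3))     # 0.576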

Artificial Intelligence 13, 2004 – 22


Bayes’ Rule
Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
or in distribution form
    P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
Useful for assessing diagnostic probability from causal probability:
    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
E.g., let M be meningitis, S be stiff neck:
    P(m | s) = P(s | m) P(m) / P(s) = (0.8 × 0.0001) / 0.1 = 0.0008
Note: posterior probability of meningitis still very small!
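A one-line Python check of the calculation, assuming the values on this slide (P(s | m) = 0.8, P(m) = 0.0001, P(s) = 0.1):

    p_s_given_m, p_m, p_s = 0.8, 0.0001, 0.1
    p_m_given_s = p_s_given_m * p_m / p_s       # Bayes' rule in the diagnostic direction
    print(round(p_m_given_s, 4))                # 0.0008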
Artificial Intelligence 13, 2004 – 23
Bayes’ Rule - Cond. Independence
[diagram: Cavity with children Toothache and Catch; generally, Cause with children Effect_1 ... Effect_n]

P(Cavity | toothache ∧ catch)
    = α P(toothache ∧ catch | Cavity) P(Cavity)
    = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
This is an example of a naive Bayes model:
    P(Cause, Effect_1, ..., Effect_n) = P(Cause) ∏_i P(Effect_i | Cause)

Total number of parameters is linear in n
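A small Python sketch of the naive Bayes computation above (an added illustration). The conditional probabilities are the ones implied by the toothache/catch/cavity joint table shown earlier; they are not stated explicitly on this slide.

    p_cause = {True: 0.2, False: 0.8}               # P(Cavity)
    p_toothache_given = {True: 0.6, False: 0.1}     # P(toothache | Cavity)
    p_catch_given     = {True: 0.9, False: 0.2}     # P(catch | Cavity)

    # P(Cavity | toothache, catch) ∝ P(Cavity) P(toothache | Cavity) P(catch | Cavity)
    unnormalized = {c: p_cause[c] * p_toothache_given[c] * p_catch_given[c]
                    for c in (True, False)}
    alpha = 1.0 / sum(unnormalized.values())
    posterior = {c: round(alpha * v, 3) for c, v in unnormalized.items()}
    print(posterior)    # {True: 0.871, False: 0.129}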


Artificial Intelligence 13, 2004 – 24
Wumpus World
[4×4 grid of squares [1,1] ... [4,4]; squares [1,1], [1,2], [2,1] are visited (OK),
 with breezes (B) observed in [1,2] and [2,1]]

P_ij = true iff [i, j] contains a pit
B_ij = true iff [i, j] is breezy

Include only B_{1,1}, B_{1,2}, B_{2,1} in the probability model
Artificial Intelligence 13, 2004 – 25


Specifying the Probability Model
The full joint distribution is P(P_{1,1}, ..., P_{4,4}, B_{1,1}, B_{1,2}, B_{2,1})
Apply product rule:
    P(B_{1,1}, B_{1,2}, B_{2,1} | P_{1,1}, ..., P_{4,4}) P(P_{1,1}, ..., P_{4,4})

(Do it this way to get P(Effect | Cause).)
First term: 1 if pits are adjacent to breezes, 0 otherwise
Second term: pits are placed randomly, probability 0.2 per square:
    P(P_{1,1}, ..., P_{4,4}) = ∏_{i,j=1,1}^{4,4} P(P_{i,j}) = 0.2ⁿ × 0.8^{16−n}
for n pits
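A tiny Python sketch of the second term (an added illustration): with pits placed independently with probability 0.2 per square, any particular configuration with n pits has the prior below.

    def pit_config_prior(n_pits, n_squares=16, p_pit=0.2):
        """Prior of one specific pit configuration: 0.2**n * 0.8**(16 - n)."""
        return p_pit ** n_pits * (1 - p_pit) ** (n_squares - n_pits)

    print(pit_config_prior(3))    # prior of any particular configuration with 3 pits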

Artificial Intelligence 13, 2004 – 26


Observations and Query
We know the following facts:
    b = ¬b_{1,1} ∧ b_{1,2} ∧ b_{2,1}
    known = ¬p_{1,1} ∧ ¬p_{1,2} ∧ ¬p_{2,1}
Query is P(P_{1,3} | known, b)

Define Unknown = the P_{ij}s other than P_{1,3} and Known
For inference by enumeration, we have
    P(P_{1,3} | known, b) = α Σ_{unknown} P(P_{1,3}, unknown, known, b)

Grows exponentially with number of squares!

Artificial Intelligence 13, 2004 – 27


Using Conditional Independence
[4×4 grid partitioned into KNOWN (the visited squares), FRINGE (unknown squares
 adjacent to the known ones), QUERY ([1,3]), and OTHER (all remaining squares)]

Basic insight: observations are conditionally independent of other hidden
squares, given neighbouring hidden squares
Define Unknown = Fringe ∪ Other
    P(b | P_{1,3}, Known, Unknown) = P(b | P_{1,3}, Known, Fringe)
Manipulate query into a form where we can use this!
Artificial Intelligence 13, 2004 – 28
Using Conditional Independence 2

P(P_{1,3} | known, b)
  = α Σ_{unknown} P(P_{1,3}, unknown, known, b)
  = α Σ_{unknown} P(b | P_{1,3}, known, unknown) P(P_{1,3}, known, unknown)
  = α Σ_{fringe} Σ_{other} P(b | known, P_{1,3}, fringe, other) P(P_{1,3}, known, fringe, other)
  = α Σ_{fringe} Σ_{other} P(b | known, P_{1,3}, fringe) P(P_{1,3}, known, fringe, other)
  = α Σ_{fringe} P(b | known, P_{1,3}, fringe) Σ_{other} P(P_{1,3}, known, fringe, other)
  = α Σ_{fringe} P(b | known, P_{1,3}, fringe) Σ_{other} P(P_{1,3}) P(known) P(fringe) P(other)
  = α P(known) P(P_{1,3}) Σ_{fringe} P(b | known, P_{1,3}, fringe) P(fringe) Σ_{other} P(other)
  = α′ P(P_{1,3}) Σ_{fringe} P(b | known, P_{1,3}, fringe) P(fringe)
Artificial Intelligence 13, 2004 – 29
Using Conditional Independence 3
[figure: the five fringe models over squares [2,2] and [3,1] that are consistent
 with the observed breezes; the first three have a pit in [1,3], the last two do not.
 Their fringe priors are
 0.2 × 0.2 = 0.04,  0.2 × 0.8 = 0.16,  0.8 × 0.2 = 0.16,  0.2 × 0.2 = 0.04,  0.2 × 0.8 = 0.16]
P(P_{1,3} | known, b) = α′ ⟨0.2 (0.04 + 0.16 + 0.16), 0.8 (0.04 + 0.16)⟩
                      ≈ ⟨0.31, 0.69⟩

P(P_{2,2} | known, b) ≈ ⟨0.86, 0.14⟩
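A Python sketch of this fringe calculation (my own illustration; the square names and the consistent() helper are assumptions, not the lecture's code). It enumerates the fringe squares [2,2] and [3,1], keeps the configurations that explain the observed breezes, and normalizes, reproducing ⟨0.31, 0.69⟩.

    from itertools import product

    P_PIT = 0.2                              # prior probability of a pit in any square

    def consistent(p13, p22, p31):
        """True iff the pit configuration explains the observations:
        breezes in [1,2] and [2,1], with the known squares pit-free."""
        breeze_12 = p22 or p13               # neighbours of [1,2]: [1,1] (no pit), [2,2], [1,3]
        breeze_21 = p22 or p31               # neighbours of [2,1]: [1,1] (no pit), [2,2], [3,1]
        return breeze_12 and breeze_21

    def fringe_prior(assignment):
        """Prior probability of a truth assignment to the fringe squares."""
        prior = 1.0
        for pit in assignment:
            prior *= P_PIT if pit else 1.0 - P_PIT
        return prior

    scores = {}
    for p13 in (True, False):
        total = sum(fringe_prior((p22, p31))
                    for p22, p31 in product((True, False), repeat=2)
                    if consistent(p13, p22, p31))
        scores[p13] = (P_PIT if p13 else 1.0 - P_PIT) * total

    alpha = 1.0 / sum(scores.values())
    print({k: round(alpha * v, 2) for k, v in scores.items()})    # {True: 0.31, False: 0.69}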


Artificial Intelligence 13, 2004 – 30


Summary
Basic probability statements include prior probabilities
and conditional probabilities over simple and complex
propositions
The full joint probability distribution specifies the
probability of each complete assignment of values to
random variables
Queries answered by summing over atomic events
Absolute independence between subsets of random
variables might allow the full joint distribution to be
factored into smaller joint distributions
Bayes’ rule allows unknown probabilities to be
computed from known conditional probabilities, usually
in the causal direction
Conditional independence brought about by the direct
causal relationships in the domain might allow the full
joint distribution to be factored into smaller, conditional distributions

Artificial Intelligence 13, 2004 – 31
