Bayesian Networks¹

Machine Learning

¹ Some of these exercises are from Bratko (2007), "Prolog Programming for
Artificial Intelligence", and from Poole and Mackworth (2010), "Artificial
Intelligence: Foundations of Computational Agents".
Representation and Inference

1. A Bayesian network has the following graphical structure:

Assume you have all the conditional probability tables necessary to define
the network completely. Derive the formula for computing the conditional
probability P(C | a).


2. A Bayesian network has the following graphical structure:

The conditional probability tables are as follows:

P(a) = 0.1 P(b) = 0.1
P(c|a, b) = 0.9 P(c|¬a, b) = 0.6
P(c|a, ¬b) = 0.8 P(c|¬a, ¬b) = 0.3
P(d|b) = 0.9 P(d|¬b) = 0.1
P(e|d) = 0.1 P(e|¬d) = 0.9

In each of the cases below, estimate, without actually calculating
numerically, which of the two probabilities is greater.
(a) P(c) or P(c|d)
(b) P(a|c) or P(b|c)
(c) P(a|c) or P(a|c, e)
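For checking your estimates afterwards, here is a minimal brute-force sketch
that enumerates the joint distribution defined by the tables above (the
structure a → c ← b, b → d → e is read off the conditioning sets of the
tables); the function names are illustrative only:

    from itertools import product

    # CPTs from question 2
    P_a = {True: 0.1, False: 0.9}
    P_b = {True: 0.1, False: 0.9}
    P_c = {(True, True): 0.9, (False, True): 0.6,
           (True, False): 0.8, (False, False): 0.3}   # P(c=1 | a, b)
    P_d = {True: 0.9, False: 0.1}                      # P(d=1 | b)
    P_e = {True: 0.1, False: 0.9}                      # P(e=1 | d)

    def joint(a, b, c, d, e):
        """P(a, b, c, d, e) as the product of the conditional tables."""
        pc = P_c[(a, b)] if c else 1 - P_c[(a, b)]
        pd = P_d[b] if d else 1 - P_d[b]
        pe = P_e[d] if e else 1 - P_e[d]
        return P_a[a] * P_b[b] * pc * pd * pe

    def prob(query, given=lambda **v: True):
        """P(query | given) by enumerating all 32 assignments."""
        num = den = 0.0
        for a, b, c, d, e in product([True, False], repeat=5):
            w = joint(a, b, c, d, e)
            if given(a=a, b=b, c=c, d=d, e=e):
                den += w
                if query(a=a, b=b, c=c, d=d, e=e):
                    num += w
        return num / den

    print(prob(lambda **v: v['c']))                                  # P(c)
    print(prob(lambda **v: v['c'], lambda **v: v['d']))              # P(c|d)
    print(prob(lambda **v: v['a'], lambda **v: v['c']))              # P(a|c)
    print(prob(lambda **v: v['b'], lambda **v: v['c']))              # P(b|c)
    print(prob(lambda **v: v['a'], lambda **v: v['c'] and v['e']))   # P(a|c,e)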

3. Draw the expression tree that shows the sum-and-product calculations for
computing P(e) using the network in the previous question.
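For reference, one possible bracketing of the sums and products that such a
tree could depict (using the factorisation implied by the tables above; the
terms in a and c sum out to 1):

    P(e) = Σ_{a,b,c,d} P(a) P(b) P(c|a, b) P(d|b) P(e|d)
         = Σ_d P(e|d) Σ_b P(d|b) P(b)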

4. Let A, B and C be Boolean variables with possible values 0 and 1. The
variables are related such that C = XOR(A, B) (that is, C = (A + B) mod 2).
Draw a Bayesian network and define the corresponding conditional
probabilities so that the network represents this relation. You may assume
the prior probabilities of A and B are 0.5.
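As a hint at what the deterministic relation looks like as a conditional
probability table (one possible encoding, with A and B as the parents of C):

    P(A = 1) = 0.5,  P(B = 1) = 0.5
    P(C = 1 | A = 0, B = 0) = 0    P(C = 1 | A = 0, B = 1) = 1
    P(C = 1 | A = 1, B = 0) = 1    P(C = 1 | A = 1, B = 1) = 0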

5. Add to the XOR network two additional nodes, D and K, and the
corresponding links and probabilities, so that this new network represents
the following situation. We are testing in a written exam whether a student
knows the XOR operation. In the exam problem, the student is given the
values of A and B, and is asked to calculate XOR(A, B). The student's answer
is D. Ideally, D should be equal to C, but D may differ from C if the
student does not know about XOR. Even if the student knows about XOR, the
answer may still be incorrect due to a silly mistake. Let the variable K = 1
if the student knows the XOR operation, and K = 0 otherwise. If the student
knows the operation, then his or her answer D will be correct in 99% of the
cases. If the student does not know XOR, then the answer D is chosen
completely at random, with equal probabilities of 0 and 1. Draw the Bayesian
network to represent this situation. You may assume that the prior
probability of K is 0.5.
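One plausible way to fill in the probabilities for the new nodes (taking C
and K as the parents of D), sketched from the description above:

    P(K = 1) = 0.5
    P(D = C | C, K = 1) = 0.99   (student knows XOR: answer correct 99% of the time)
    P(D = 1 | C, K = 0) = 0.5    (student does not know XOR: answer chosen at random)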

6. In the previous question, let A = 0, B = 1 and D = 1. What is the
(approximate) probability that the student knows about XOR?
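A sketch of the Bayes-rule step involved: with A = 0 and B = 1 the relation
forces C = 1, so the observed answer D = 1 agrees with C, and

    P(K=1 | D=C) = P(D=C | K=1) P(K=1) / [P(D=C | K=1) P(K=1) + P(D=C | K=0) P(K=0)]
                 = (0.99 × 0.5) / (0.99 × 0.5 + 0.5 × 0.5)
                 ≈ 0.66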

Sampling

- The technique of rejection sampling from the distribution encoded by a
  Bayesian network essentially consists of generating values of the random
  variables in the order specified by an ancestral ordering of the network,
  and then rejecting (throwing away) any samples that do not agree with the
  values of the observed variables.
- A Bayesian network has the following graphical structure:

- Assume you have all the conditional probability tables necessary to define
  the network completely.
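Since the graph for questions 7–9 is only given as a figure, here is a
minimal rejection-sampling sketch that reuses the network and tables from
question 2 instead, estimating P(a | c); the names and numbers are taken
from that question:

    import random

    # CPTs from question 2; one ancestral ordering is a, b, c, d, e
    P_a, P_b = 0.1, 0.1
    P_c = {(True, True): 0.9, (False, True): 0.6,
           (True, False): 0.8, (False, False): 0.3}   # P(c=1 | a, b)
    P_d = {True: 0.9, False: 0.1}                      # P(d=1 | b)
    P_e = {True: 0.1, False: 0.9}                      # P(e=1 | d)

    def sample_once():
        """Draw one joint sample by following the ancestral ordering."""
        a = random.random() < P_a
        b = random.random() < P_b
        c = random.random() < P_c[(a, b)]
        d = random.random() < P_d[b]
        e = random.random() < P_e[d]
        return a, b, c, d, e

    def estimate_a_given_c(n=100_000):
        """Estimate P(a | c) by rejecting samples in which c is false."""
        kept = hits = 0
        for _ in range(n):
            a, b, c, d, e = sample_once()
            if not c:
                continue              # disagrees with the evidence c = true: reject
            kept += 1
            hits += a
        return hits / kept

    print(estimate_a_given_c())       # should be close to the exact P(a | c)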

7. How many ancestral orderings are there for this graph?

8. Suppose you want to estimate P(a|b, c) using rejection sampling, and draw
1000 samples using an appropriate ordering over the variables. Approximately
how many of these would be rejected?

9. Assume you are estimating P(d|a) using rejection sampling. 2000 samples
were not rejected. Approximately how many samples would have both a and b
true?


- The technique of Gibbs sampling from a Bayesian network essentially
  consists of 3 steps: (1) Initialisation; (2) (Re-)Sampling; and
  (3) Repetition.
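As a concrete illustration (again borrowing the network and tables from
question 2, with evidence c = 1 and e = 1, and estimating P(a | c, e)), a
minimal Gibbs-sampling sketch; note that each resampling step only consults
the node's Markov blanket, which is the point of question 10 below:

    import random

    # CPTs from question 2; the evidence c = True and e = True stays clamped
    P_a, P_b = 0.1, 0.1
    P_c = {(True, True): 0.9, (False, True): 0.6,
           (True, False): 0.8, (False, False): 0.3}   # P(c=1 | a, b)
    P_d = {True: 0.9, False: 0.1}                      # P(d=1 | b)
    P_e = {True: 0.1, False: 0.9}                      # P(e=1 | d)

    def bern(p):
        return random.random() < p

    def gibbs_a_given_c_e(iters=200_000, burn_in=1_000):
        a, b, d = bern(0.5), bern(0.5), bern(0.5)   # (1) initialise non-evidence nodes
        count = kept = 0
        for t in range(iters):                      # (3) repeat
            # (2) resample a from its Markov blanket {b, c}: P(a|b,c) ∝ P(a) P(c|a,b)
            w1 = P_a * P_c[(True, b)]
            w0 = (1 - P_a) * P_c[(False, b)]
            a = bern(w1 / (w1 + w0))
            # resample b from its blanket {a, c, d}: P(b|a,c,d) ∝ P(b) P(c|a,b) P(d|b)
            w1 = P_b * P_c[(a, True)] * (P_d[True] if d else 1 - P_d[True])
            w0 = (1 - P_b) * P_c[(a, False)] * (P_d[False] if d else 1 - P_d[False])
            b = bern(w1 / (w1 + w0))
            # resample d from its blanket {b, e}: P(d|b,e) ∝ P(d|b) P(e|d)
            w1 = P_d[b] * P_e[True]
            w0 = (1 - P_d[b]) * P_e[False]
            d = bern(w1 / (w1 + w0))
            if t >= burn_in:
                kept += 1
                count += a
        return count / kept

    print(gibbs_a_given_c_e())   # estimate of P(a | c=1, e=1)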

10. In the network below, show that when (re-)sampling values for node X, we
will only need information from the Markov Blanket of X. That is, we will
only need: (a) the parents of X; (b) the children of X; and (c) the parents
of the children of X. Further, what kind of information is needed from the
Markov Blanket?

Conditional Independence

(From: Neapolitan and Jiang (2012), Contemporary AI, CRC Press.)

- One way to think of a Bayesian network is that it is a representation of a
  probability distribution that satisfies the Markov condition.
- Suppose we have a joint probability distribution P over some random
  variables and a DAG G with vertices V and edges E. If the vertices denote
  random variables, then (G, P) satisfies the Markov condition if, for each
  X ∈ V, X is conditionally independent of its non-descendants in G, given
  its parents.
- If PA_X are the parents of X and ND_X are the non-descendants of X, then:

      P(X | ND_X, PA_X) = P(X | PA_X)

- So, for the following to be a Bayesian network:

  we would require that the underlying probability distribution satisfies the
  following conditional independences: (a) B is conditionally independent of
  C, given A; and (b) C is conditionally independent of B, given A.
- MORAL: Not every DAG G with r.v.'s at vertices is a Bayesian network.
- If (G, P) satisfies the Markov condition, then it can be shown that P can
  be factorised into a product of conditional distributions of nodes given
  their parents in G (written out below).
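In the usual notation, with PA_{X_i} denoting the parents of X_i in G, this
factorisation is:

      P(X_1, ..., X_n) = ∏_i P(X_i | PA_{X_i})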

11. If all the r.v’s in the graph shown below are binary, and their
joint distribution satisfies the Markov condition, how many
entries are needed: (a) in the full joint distribution; and (b) in
the factorised conditional distributions:


12. For the following Bayesian network, list out the parents, non-descendants
and conditional independences identified by the Markov condition for each
node.

Conditional Independence (contd.): d-Separation

- Suppose a joint distribution satisfies the Markov condition with the
  following network structure:
- The Markov condition tells us that B and C are conditionally independent
  of each other given A. But what about D and C given A?
- The Markov condition has nothing to say about D and C, given A.

- But:

      P(D | C=c, A=a) = Σ_B P(D | B, c, a) P(B | c, a)
                      = Σ_B P(D | B, a) P(B | a)      (Markov condition)
                      = P(D | a)

- So, the Markov condition actually results in more conditional independences
  than are apparent immediately.
- To obtain all the conditional independences entailed by the Markov
  condition requires you to know about the property called d-separation.
- A chain in a directed graph with vertices V and edges E is a sequence of
  vertices ⟨X_1, ..., X_k⟩ (X_i ∈ V and k ≥ 2) s.t. ⟨X_{i−1}, X_i⟩ ∈ E or
  ⟨X_i, X_{i−1}⟩ ∈ E.


- We need to distinguish between 3 kinds of chains in a Bayesian network:
  – A → B ← C (head-to-head meeting at B)
  – A → B → C (head-to-tail meeting at B)
  – A ← B → C (tail-to-tail meeting at B)
- A set of vertices W blocks a chain from vertex X to vertex Y if at least
  one of the following is true (a small code sketch of this test follows the
  list):
  – There is a vertex Z ∈ W s.t. there is a head-to-tail meeting at Z; or
  – There is a vertex Z ∈ W s.t. there is a tail-to-tail meeting at Z; or
  – There is a vertex Z ∉ W s.t. there is a head-to-head meeting at Z, and
    none of Z's descendants are in W.
- A set of vertices W d-separates vertices X and Y if W blocks every chain
  between X and Y.
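A minimal sketch of the blocking test just defined, assuming the graph is
given as a set of directed (parent, child) edges, a chain as a list of
vertices, and descendants as a precomputed dict from each vertex to the set
of its descendants (these names are illustrative, not from the slides):

    def blocks(W, chain, edges, descendants):
        """Return True if the vertex set W blocks the given chain."""
        for i in range(1, len(chain) - 1):
            prev, z, nxt = chain[i - 1], chain[i], chain[i + 1]
            head_in = (prev, z) in edges    # arrow arrives at z from the previous vertex
            head_out = (nxt, z) in edges    # arrow arrives at z from the next vertex
            if head_in and head_out:        # head-to-head meeting at z
                if z not in W and not (descendants[z] & W):
                    return True
            else:                           # head-to-tail or tail-to-tail meeting at z
                if z in W:
                    return True
        return False

    # Example: the chain B <- A -> C in the graph with edges A->B and A->C
    edges = {("A", "B"), ("A", "C")}
    descendants = {"A": {"B", "C"}, "B": set(), "C": set()}
    print(blocks({"A"}, ["B", "A", "C"], edges, descendants))   # True: tail-to-tail at A ∈ W
    print(blocks(set(), ["B", "A", "C"], edges, descendants))   # False: nothing blocks it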


- Given a Bayesian network with DAG G = (V, E), vertices X and Y are
  conditionally independent of each other given W = V − {X, Y}, if W
  d-separates X and Y.
- More generally, given subsets S_1, S_2, S_3: if for every X ∈ S_1 and
  Y ∈ S_2 the set S_3 d-separates vertices X and Y, then the vertices in S_1
  are conditionally independent of the vertices in S_2 given S_3.
- Which vertices are d-separated in:


13. Which conditional independences between pairs of vertices are entailed by
the Markov condition in the Bayesian network:

- Given a set of random variables V, a Markov Blanket of a random variable
  X ∈ V is some set of variables M from V − {X}, s.t. X is conditionally
  independent of V − (M ∪ {X}) given M.


- In a Bayesian network, the set M consisting of the parents of X, the
  children of X, and the parents of the children of X is a Markov Blanket
  of X.
- The proof of this uses the fact that M d-separates X from all the other
  nodes in the network.
