Soojung Ha*, Seyun Kim*, Minkook Suh*, Hyunwoo Seong*, Kwang Mo Jeong†, Sung-Ho Kim‡
* Korea Science Academy, † Pusan National University, ‡ Korea Advanced Institute of Science and Technology
sjhakorea@hanmail.net, whataud@naver.com, minkook789@hanmail.net, hwseong@hotmail.com, kmjung@pusan.ac.kr, sung-ho.kim@kaist.edu
Figure 2. a) BN b) fixing A c) correlation graph d) correlation graph when A is fixed

[Initialization]
1. The initial skeleton has all the nodes and no edges.
[Adding edges]
2. Obtain a correlation graph from the data and add all of its edges to the skeleton.
[Deleting edges]
3. If, when A is fixed, the correlation coefficient between B and C is close to zero, delete the edge between B and C.
4. If, when A and B are fixed, the correlation coefficient between C and D is close to zero, delete the edge between C and D.
Figure 3. Part 1 of the proposed algorithm

To obtain a correlation graph from data, we must take errors in the data into account. Even if two variables A and B are independent, P(A|B) = P(A) cannot hold exactly unless the sample size is infinite. Instead, we define a correlation function that measures dependency (or independence); when its value exceeds a certain threshold, we determine that the two variables are correlated.
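As an illustration of such a correlation function, here is a minimal sketch using the sample Pearson correlation coefficient. The choice of Pearson's r and the threshold value are our assumptions; the paper leaves both unspecified.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sample Pearson correlation coefficient between two variables.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    const size_t n = x.size();
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

// Declare two variables correlated when |r| exceeds a chosen threshold
// (the threshold 0.1 is an illustrative assumption).
bool correlated(const std::vector<double>& x, const std::vector<double>& y,
                double threshold = 0.1) {
    return std::fabs(correlation(x, y)) > threshold;
}
```

With finite samples the computed r of two independent variables is close to, but rarely exactly, zero, which is why the thresholded decision described above is needed.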
If two variables are thought to have a direct causal relationship, we connect them in the skeleton. In step 1, we start with an initially empty skeleton. In step 2, we add all the possible 'candidates' for edges that represent a direct causal relationship: if two variables have a direct causal relationship, they will be correlated with each other, so an edge between them is added to the skeleton in this step. However, two variables can also be correlated without a direct causal relationship, so some edges must be deleted afterwards.

In steps 3 and 4, we consider the cases just described. Two variables can be correlated without a direct causal relationship when one is an ancestor of the other or when they share a common ancestor. In either case, if we block every path connecting the two variables by fixing certain variables on those paths, their correlation disappears, and we delete the edge between them.

Since we fix at most two variables at a time in this paper, if the structure is so complex that we cannot block all the paths between some pair of variables, we may be unable to delete all the unnecessary edges. However, three problems arise from fixing many variables at a time: lengthy execution time, a decrease in sample size, and a higher risk of deleting edges that do represent a causal relationship. Suppose the number of nodes is n and it takes approximately t time to sort the data, obtain a correlation graph, and delete edges accordingly. If up to n-2 variables could be fixed, (2^n - n - 2)t time would be required, which becomes a problem when n is large. The decrease in sample size is equally serious: when computing correlation graphs with some variables fixed, the sample size shrinks dramatically during the data-sorting process. Finally, each additional deletion step increases the chance of deleting edges that should be kept; in some cases the correlation between a parent and a child is not very strong, and the edge can be deleted accidentally. Since omitting an edge that represents a direct relationship is a more serious error than adding one that does not, it is preferable to use only a few deletion steps.

3.2. Edge Orientation

(If orienting an edge according to the algorithm creates a cycle, leave the edge undirected.)
[Convergence of edges]
5. If edges A-C and B-C exist in the skeleton but no edge exists between A and B in the correlation graph, and the correlation between A and B is stronger when C is fixed, then A→C, B→C.
6. If edges A-C and B-C exist in the skeleton but no edge exists between A and B in the correlation graph when another variable D is fixed, and the correlation between A and B becomes stronger when C is fixed, then A→C, B→C.
7. If edges A-C and B-C exist in the skeleton but no edge exists between A and B in the correlation graph when D and E are fixed, and the correlation between A and B becomes stronger when C is fixed, then A→C, B→C.
[No convergence of edge direction]
8. If an edge A-B exists in the correlation graph (but not in the skeleton), and there is only one path connecting A and B in the skeleton, direct the edges so that no convergence of edge direction (i.e., X→Y and Z→Y with X, Y, Z adjacent in the path) occurs along the path.
[Preventing cycles]
9. If A is an ancestor of B and there is an edge between A and B in the skeleton, then A→B.
Figure 4. Part 2 of the proposed algorithm

Steps 5 to 7 find three variables A, B, and C such that A and B are independent but both are connected to C. In this case, A and B are both parents of C [6]: of the four possible configurations A→C→B, A←C←B, A←C→B, and A→C←B, only the last leaves A and B uncorrelated. For verification, we check the correlation between A and B when C is fixed; if the correlation gets weaker, we do not set the arrow directions.

Sometimes A and B, though not adjacent, may both be parents of C and yet not be independent. This happens when they share a common ancestor or when one is an ancestor of the other. Steps 6 and 7 are designed to handle this case: if we fix some variables along the paths connecting A and B (other than A-C-B), we may detect that the dependency between A and B is not attributable to C, and we can then set the arrow directions as in step 5.
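The verification rule of step 5 can be sketched as a small decision function. This is only an illustration: the function name, the caller-supplied correlation magnitudes, and the boolean encoding of the correlation-graph edge are our assumptions, not the paper's implementation.

```cpp
#include <cassert>

// Step 5 test for a candidate collider A -> C <- B.
// absCorrAB:            |r(A,B)| computed on the full data
// absCorrABGivenCFixed: |r(A,B)| computed on the subset where C is fixed
// edgeABInCorrGraph:    whether A-B is an edge of the correlation graph
bool orientAsCollider(double absCorrAB, double absCorrABGivenCFixed,
                      bool edgeABInCorrGraph) {
    // The rule applies only when A and B are uncorrelated to begin with.
    if (edgeABInCorrGraph) return false;
    // If fixing C makes A and B MORE correlated, C behaves like a common
    // child, so orient A -> C and B -> C. If the correlation gets weaker
    // instead, leave the edges undirected for now.
    return absCorrABGivenCFixed > absCorrAB;
}
```

Steps 6 and 7 apply the same comparison after first fixing one or two additional variables (D, or D and E) to rule out dependency through other paths.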
Step 8 works the other way around. For two variables to be dependent while not having a direct causal relationship, they must share a common ancestor or one must be an ancestor of the other. If there is only one path connecting the two variables, then one must be an ancestor of the other, or some variable along the path must be a common ancestor of both. Either way, there is no convergence of edge direction along the path.

Step 9 directs each such edge from ancestor to descendant; directed the other way, the edge would create a cycle.

At the end of the algorithm, some edges are left with no direction assigned. This is because, for a given set of correlation graphs, there can be many Bayesian networks that would produce the same set. A simple example is the pair of BNs A→B and A←B, which are indistinguishable from data alone. Another example is a tree-shaped network: whichever leaf node we choose as the root, we obtain a BN that is a faithful representation of the data.

Among the steps of the algorithm, step 7 requires the most computation. With two fixed nodes, we must consider three other nodes, so the worst-case time complexity of the algorithm is O(n^5).
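Step 9 presupposes an ancestor test on the partially directed graph. One way to sketch it is reachability along the edges directed so far; the adjacency-list representation and the function name are our assumptions.

```cpp
#include <cassert>
#include <vector>

// Returns true if a directed path from -> ... -> to exists, i.e. `from`
// is an ancestor of `to` in the partially directed graph. children[v]
// lists the heads of edges already directed out of v.
bool isAncestor(const std::vector<std::vector<int>>& children, int from, int to) {
    if (from == to) return false;              // a node is not its own ancestor
    std::vector<char> seen(children.size(), 0);
    std::vector<int> stack{from};
    while (!stack.empty()) {
        int v = stack.back();
        stack.pop_back();
        if (seen[v]) continue;
        seen[v] = 1;
        for (int w : children[v]) {
            if (w == to) return true;          // reached `to`: ancestor found
            stack.push_back(w);
        }
    }
    return false;
}
```

Running this test before orienting an undirected skeleton edge A-B implements step 9 directly: if isAncestor(g, A, B) holds, direct the edge A→B, which is also what prevents cycles.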
4. Experimental Results
We implemented the algorithm in C++. The program accepts data as input and produces a BN as output. To test the accuracy of the proposed algorithm, we created four Bayesian networks and generated four sets of test data from them; the sample size was 10000 in each case. We then determined the network structure for each data set with our program.

Table 1 below summarizes the results and shows that our algorithm has a high accuracy. In the experiments only one edge was missed, giving a recall rate of 98.85% (86/87 × 100%). Only a few edges (9) were added incorrectly, giving a precision rate of 90.53% (86/(86+9) × 100%). It should also be noted that no edge was given a wrong direction. Figure 5 illustrates one of the simulation results, BN3, where only two edges are added incorrectly and no edge in the original graph is missed.

Figure 5. Original and deduced graphs (BN3)
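The recall and precision figures above follow directly from the edge counts; a small sketch of the arithmetic (the function names are ours):

```cpp
#include <cassert>
#include <cmath>

// Recall: percentage of true edges that were recovered.
double recallPercent(int correctEdges, int totalTrueEdges) {
    return 100.0 * correctEdges / totalTrueEdges;
}

// Precision: percentage of recovered edges that are true edges.
double precisionPercent(int correctEdges, int wronglyAddedEdges) {
    return 100.0 * correctEdges / (correctEdges + wronglyAddedEdges);
}
```

With 86 correct edges out of 87 true edges and 9 wrongly added edges, these give the 98.85% recall and 90.53% precision reported above.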
Table 1. Simulation results
(Correct edges: total correct, with the number left undirected in parentheses.)

        Random     Total   Correct edges     Wrong edges
        variables  edges   (no direction)    Missed  Added  Wrong direction
BN1     10         16      15 (6)            1       0      0
BN2     13         21      21 (18)           0       3      0
BN3     16         15      15 (9)            0       2      0
BN4     20         35      35 (17)           0       4      0
Total   59         87      86 (50)           1       9      0
5. Conclusion

In this paper, we presented a simple and efficient BN learning algorithm based on a statistical approach. First, we obtain from the data the original correlation graph and the correlation graphs with some variables fixed. Then we construct a BN that would produce the most similar correlation graphs. The skeleton is built by first adding edges from the original correlation graph and then deleting edges based on the correlation graphs obtained when one or two variables are fixed. The orientations of the edges are determined in three successive steps, related respectively to convergence of edge direction, no convergence of edge direction, and prevention of cycles. The experimental results demonstrate the algorithm's ability to learn BN structure with high accuracy.

6. References

[1] R. E. Neopolitan, Learning Bayesian Networks, Prentice Hall, Chicago, Illinois, 2004.
[2] N. Friedman and Z. Yakhini, "On the sample complexity of learning Bayesian networks", Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1996.
[3] C. S. Wallace and K. Korb, "Learning linear causal models by MML sampling", in A. Gammerman (Ed.), Causal Models and Intelligent Data Mining, Springer-Verlag, New York, 1999.
[4] P. Larranaga, M. Poza, Y. Yurramendi, R. H. Murga, and C. M. H. Kuijpers, "Structure learning of Bayesian networks by genetic algorithms: a performance analysis of control parameters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 9, 1996, pp. 912-926.
[5] L. M. de Campos, J. M. Fernandez-Luna, J. A. Gamez, and J. M. Puerta, "Ant colony optimization for learning Bayesian networks", International Journal of Approximate Reasoning, Vol. 31, No. 3, 2002, pp. 291-311.
[6] G. Rebane and J. Pearl, "The recovery of causal poly-trees from statistical data", Proceedings of the Third Conference on Uncertainty in Artificial Intelligence, Seattle, Washington, July 1987, pp. 222-228.