
Journal of Computational Information Systems 11: 15 (2015) 5353-5362

Available at http://www.Jofcis.com

A Linear Algorithm for Optimal Probabilistic Planning


Lijun WU 1,2,*, Kai WANG 1, Jiajun LI 1

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
2 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia

Abstract
The maximum probability path problem has been applied in many real-world fields. However, the problem of computing a path of maximum probability is often transformed into a shortest path problem so that an existing shortest path algorithm, such as Dijkstra's algorithm, can be used. We propose a new algorithm for the maximum probability path problem, which is linear in the size of a system and does not require such a transformation. The algorithm mainly exploits a probability-first strategy, a probability ordered queue and some FIFO (First In First Out) queues. We prove our algorithm's soundness and completeness with respect to the optimal probability problem, and take project application as an example to show our algorithm's application.
Keywords: Maximum Probability Path; Shortest Path Problem; Probability-first Strategy

1 Introduction

Probability-based methods have found increasingly wide application [1]. In particular, the computation of maximum probability paths has received more and more attention [2-5]. However, these applications have the following shortcomings. Firstly, they did not consider how to select the best one among multiple maximum probability paths; secondly, they did not consider the inherent characteristics of the maximum probability path, but simply transformed the maximum probability path problem into the shortest path problem; thirdly, they did not associate the probabilistic transitions with actions of agents.
In this paper, we propose a new algorithm for the maximum probability path problem in multi-agent systems, which is linear in the size of a system. The algorithm exploits a probability-first strategy, a probability ordered queue and some FIFO queues. This strategy and these data structures give our algorithm good performance. We first consider the following example.
Example 1 (Project application).

Project supported by the National Natural Science Foundation of China (Nos. 61073033 and 61370072).
* Corresponding author. Email address: wljuestc@sina.com (Lijun WU).
1553-9105 / Copyright 2015 Binary Information Press
DOI: 10.12733/jcis13602
August 1, 2015


Assume that the process of project application consists of three steps: topic selection, writing, and oral response to reviewers. In the topic selection step, the applicant collects related information and decides the topic of the project to be applied for; in the second step, the applicant finishes writing the proposal according to the chosen topic and the collected information; finally, he gives an oral response to the reviewers of the project. For an applicant, after finishing topic selection there are three states: good topic, fair topic, and poor topic, denoted by s11, s12 and s13, respectively; after finishing writing there are also three states: good proposal, fair proposal and poor proposal, denoted by s21, s22 and s23, respectively; similarly, after finishing the last step there are three states: good response, fair response and poor response, denoted by s31, s32 and s33, respectively. The start state is denoted by s0.
Now person A and person B plan to apply for a project cooperatively. Assume that for each person, the probabilistic transitions of states are as listed in Tables 1(a) and 1(b).
Clearly, it is very important to determine how they cooperate according to the two tables, namely, to decide for every step who performs the work of that step. The key factor affecting the decision is how to maximize the probability from the start to the last step, namely, how to maximize the probability of reaching the state s31 from the start state s0, which is in fact a probabilistic planning problem. In the real world, there are many similar problems that need to be solved. Thus it is necessary to set up a general model and an algorithm for probabilistic planning on that model, which is the focus of this paper.
Table 1: Probabilistic transitions of states

(a) Transitions for person A

        s11     s12     s13
s0      0.60    0.25    0.15

        s21     s22     s23
s11     0.55    0.30    0.15
s12     0.45    0.40    0.15
s13     0.40    0.45    0.15

        s31     s32     s33
s21     0.25    0.40    0.35
s22     0.46    0.40    0.14
s23     0.35    0.45    0.20

(b) Transitions for person B

        s11     s12     s13
s0      0.56    0.34    0.10

        s21     s22     s23
s11     0.45    0.35    0.20
s12     0.41    0.30    0.28
s13     0.30    0.40    0.30

        s31     s32     s33
s21     0.35    0.30    0.35
s22     0.35    0.40    0.25
s23     0.48    0.40    0.12

2 Related Work

2.1 Maximum probability path

The maximum probability path problem has been applied in many real-world fields, and there are many related works [2-5]. A. Zalesky proposed a robust probabilistic fiber tracking algorithm [2] that overcomes the unreliability of locally greedy algorithms. His algorithm offered better tractability than probabilistic approaches, yielding a single well-defined trajectory, and being


guaranteed to yield the same trajectory between two points of interest irrespective of which point is used as a seed. In his fiber tracking algorithm, Zalesky transformed the problem of computing a path of maximum probability into a shortest path problem so that an existing shortest path algorithm, such as Dijkstra's algorithm [6, 7], could be used.
A. Robles-Kelly and E. R. Hancock developed a graph-spectral method for path estimation [3]. The basic idea of the method is to use the steady state random walk on the graph as an estimate of the maximum probability path across the graph, and to use this path for surface integration and height recovery.
The methods above transform the maximum probability path problem into the shortest path problem or another problem, and the resulting algorithms are non-linear in the size of the graph.
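The transformation mentioned above rests on a standard fact: maximizing a product of probabilities is equivalent to minimizing the sum of their negative logarithms. The following is a minimal sketch of that reduction, assuming the graph is given as a dictionary of edge probabilities; the function name `max_prob_path_via_dijkstra` and the toy graph are illustrative and are not taken from the cited works.

```python
import heapq
import math


def max_prob_path_via_dijkstra(graph, source, target):
    """Return (probability, path) of a maximum probability path.

    graph[u][v] is the probability of edge u -> v (0 < p <= 1).
    Returns (0.0, []) if target is unreachable.
    """
    # The weight -log(p) is non-negative for 0 < p <= 1, so Dijkstra applies.
    dist = {source: 0.0}
    pred = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, {}).items():
            nd = d - math.log(p)
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return 0.0, []
    # Walk the predecessor links back to the source.
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Toy graph: s -> a -> t has probability 0.45, s -> b -> t has 0.40.
toy = {"s": {"a": 0.9, "b": 0.5}, "a": {"t": 0.5}, "b": {"t": 0.8}}
prob, path = max_prob_path_via_dijkstra(toy, "s", "t")
```

With a binary heap this runs in O((N + M) log N) time, which is the kind of non-linear bound the paragraph above refers to.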

2.2 Probabilistic planning

In general, probabilistic planning problems can be divided into two types: fully observable (or with no observability) and partially observable.
Some research efforts adopt a planning model based on fully observable (or with no observability) Markov decision processes (MDPs) [13-16]. A typical work is the algorithm proposed by Nicholas Kushmerick et al. They defined the probabilistic planning problem in terms of a probability distribution over states, a goal represented by a Boolean combination of propositions, a probability threshold, and actions. The aim of the algorithm is to seek plans whose probability of success exceeds the threshold.
In recent years, partially observable MDPs (POMDPs) have received wide attention. A POMDP is a powerful probabilistic model with partially observable states and uncertain actions, and it has many important applications, including clinical decision making, dialog management, and control policies for robots. Much related work has been performed [17-19]. However, for POMDPs, tractability limits the size of the problems that can be handled.

3 Definitions and Notations

In this section, we define the optimal probabilistic planning problem. The goal is expressed by a propositional logic formula. The objective states are those states in which the given goal is true. Here, we consider the environment to be uncertain. In the following, we first give some basic definitions.

3.1 Basic definitions

Definition 1 (Probabilistic transition of a multi-agent system) The probabilistic transition of a multi-agent system is modeled as a probabilistic function $P : S \times \Sigma \times S \to [0, 1]$, where $S$ is the set of all states in a multi-agent system and $\Sigma$ is the set of all actions of all agents in a multi-agent system. $P(s, a, s')$ is the probability that state $s$ transits to state $s'$ after action $a$ of some agent is performed.

Definition 2 (Enriched path) An enriched path is defined as a transition sequence $s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n$, where $s_i \in S$ and $P(s_{i-1}, a_i, s_i) > 0$ for all $i \in \{1, 2, \ldots, n\}$. We say the action


sequence $(a_i)_{i=1}^{n}$ is an induced action sequence of the enriched path, and the enriched path is an inducing path of $(a_i)_{i=1}^{n}$.
For convenience, we denote an action sequence $a_1 a_2 \cdots a_n$ by $(a_i)_{i=1}^{n}$.
Let $r$ be an enriched path. We use $\alpha(r)$ to denote the induced action sequence of $r$.
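To make Definitions 1 and 2 concrete, the sketch below stores the transition function as a Python dictionary keyed by (s, a, s') and checks whether a sequence of states and actions forms an enriched path. Taking the probability of a path to be the product of its transition probabilities is an assumption here, chosen to agree with how arrival probabilities are updated in the state extension algorithm; the dictionary layout and function names are illustrative. The two probabilities are taken from Table 1(a).

```python
from math import prod

# P[(s, a, t)] is the probability that action a takes state s to state t
# (assumed storage layout; missing keys mean probability 0).
P = {
    ("s0", "a1", "s11"): 0.60,
    ("s11", "a2", "s21"): 0.55,
}


def is_enriched_path(P, states, actions):
    """states = [s_0, ..., s_n], actions = [a_1, ..., a_n].

    The sequence is an enriched path if every step has positive probability.
    """
    if len(states) != len(actions) + 1:
        return False
    return all(
        P.get((states[i - 1], actions[i - 1], states[i]), 0.0) > 0
        for i in range(1, len(states))
    )


def path_probability(P, states, actions):
    """Product of the transition probabilities along the path (assumed definition)."""
    return prod(
        P[(states[i - 1], actions[i - 1], states[i])]
        for i in range(1, len(states))
    )


states, actions = ["s0", "s11", "s21"], ["a1", "a2"]
ok = is_enriched_path(P, states, actions)
p = path_probability(P, states, actions)  # 0.60 * 0.55
```

Here the list `actions` is the induced action sequence of the path, and the Python index `actions[i - 1]` holds the action written $a_i$ in Definition 2.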

3.2 Probabilistic planning problem

Having given some basic definitions, we now define our key problem, namely, the probabilistic planning problem.
Definition 3 (Probabilistic planning problem) A probabilistic planning problem is a 5-tuple $\langle s_0, S, \Sigma, T, G \rangle$, where $s_0$ is the initial state, $S$ is the set of all states in a multi-agent system, $\Sigma$ is the set of all actions, $T$ is the probabilistic transition, and $G$ is the set of all objective states.

Definition 4 (Objective path) An objective path in a probabilistic planning problem $\langle s_0, S, \Sigma, T, G \rangle$ is an enriched path of the form $s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n$, where $s_0$ is the initial state and $s_n$ is an objective state.

Definition 5 (Maximum probability objective path) A maximum probability objective path is an objective path that has maximum probability among all objective paths.

Definition 6 (Optimal objective path) An optimal objective path is defined as an objective path that has the shortest length among all maximum probability objective paths.

Definition 7 (Solution) If $s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n$ is an objective path, then we say the action sequence $(a_i)_{i=1}^{n}$ is a solution of the probabilistic planning problem.

Definition 8 (Solution equivalence) We say two solutions are equivalent if their inducing paths are optimal objective paths.

Given a probabilistic planning problem, our aim is to design an algorithm that computes an action sequence induced by an optimal objective path. Thus we define an optimal solution for the probabilistic planning problem as follows.

Definition 9 (Optimal solution) Assume $\Pi = \langle s_0, S, \Sigma, T, G \rangle$ is a probabilistic planning problem, and $(a_i)_{i=1}^{n}$ is a sequence of actions. $(a_i)_{i=1}^{n}$ is said to be an optimal solution to $\Pi$ if and only if there is an optimal objective path that induces the action sequence $(a_i)_{i=1}^{n}$.

Clearly, any two optimal solutions are equivalent.
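Definitions 5 and 6 describe a two-stage selection: first keep the objective paths of maximum probability, then take a shortest one among them. The sketch below applies that selection to an explicit list of candidate paths. It is a brute-force illustration of the definitions, not the OPP algorithm; the function name and the sample data are hypothetical, and exact fractions are used so that equal probabilities compare as equal.

```python
from fractions import Fraction


def optimal_solution(objective_paths):
    """objective_paths: list of (probability, action_sequence) pairs.

    Returns the action sequence of a shortest path among those with
    maximum probability (Definitions 5 and 6).
    """
    best = max(p for p, _ in objective_paths)
    candidates = [acts for p, acts in objective_paths if p == best]
    return min(candidates, key=len)


# Hypothetical candidates: two paths tie at probability 1/4.
paths = [
    (Fraction(1, 4), ["a", "b", "c"]),
    (Fraction(1, 4), ["d", "e"]),
    (Fraction(1, 5), ["f"]),
]
solution = optimal_solution(paths)
```

The shorter of the two tied paths is returned, and the single-step path with lower probability is never considered, because probability takes priority over length.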

4 Optimal Probabilistic Planning Algorithm

In this section, we propose an optimal probabilistic planning algorithm with linear time complexity, called the OPP algorithm. It consists of the OPP top-level algorithm, the objective path computation algorithm, and the state extension algorithm.
We first describe the data structures used in the OPP algorithm.

4.1 Data structures

For every state, we add five fields: probability, distance, predecessor, objective, and visited. The initial values of the first four fields of every state are 0, with two exceptions: the initial arrival probability of the initial state is 1, and the objective field of every objective state is 1. The probability field of a state records its current arrival probability; the distance field records the distance from the initial state to the state along the current path, where every step has distance 1; the predecessor field points to the direct predecessor of the state in the current path; the objective field indicates that the state is an objective state if its value is 1, and that it is not otherwise; the visited field indicates whether the state has been visited or extended. If the value of the visited field is 0, the state has not been visited; if the value is 1, the state has been visited but not extended; if the value is 2, the state has been visited and extended.
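The five fields can be pictured as a small record type. The following sketch is one possible Python rendering under stated assumptions: the `name` field, the constant names, and the helper `make_states` are illustrative additions, and `None` stands in for the initial predecessor value 0.

```python
from dataclasses import dataclass
from typing import Optional

# Values of the visited field, as described in the text.
UNVISITED, VISITED, EXTENDED = 0, 1, 2


@dataclass
class State:
    name: str                               # illustrative label, not one of the five fields
    probability: float = 0.0                # current arrival probability
    distance: int = 0                       # steps from the initial state
    predecessor: Optional["State"] = None   # direct predecessor on the current path
    objective: int = 0                      # 1 if this is an objective state
    visited: int = UNVISITED


def make_states(names, initial, objectives):
    """Create all states with the initial field values given in the text."""
    states = {n: State(n) for n in names}
    states[initial].probability = 1.0
    for n in objectives:
        states[n].objective = 1
    return states


states = make_states(["s0", "s11", "s31"], initial="s0", objectives=["s31"])
```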
OPP needs a state array, a probabilistic transition matrix, a probability ordered queue and some FIFO queues. We use a probabilistic transition matrix to describe the probabilistic transition function. Every element of the probabilistic transition matrix is a 4-tuple of the form $\langle s, t, a, p \rangle$, where $p > 0$ is the probability that the state $s$ transits to $t$ under the action $a$, namely, $P(s, a, t) = p > 0$. Thus we also call $\langle s, t, a, p \rangle$ a probabilistic transition.
- State array S: stores all states in the planning problem. All states are numbered from 1 to N and the array is ordered by these numbers, where N is the number of states in the planning problem.
- The probabilistic transition matrix T: an adjacency matrix that stores all probabilistic transitions in the planning problem.
- FIFO queues: every FIFO queue stores the visited and unextended states that have the same arrival probability. We assume the basic operations on a FIFO queue, such as enqueue(), which inserts an element at the end of a queue, and dequeue(), which removes an element from the beginning of a queue.
- The probability ordered queue: it stores the arrival probabilities of the visited and unextended states in increasing order of probability. Every element in the probability ordered queue points to a FIFO queue.
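The combination of a probability ordered queue and per-probability FIFO queues can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the class name is hypothetical, probabilities are passed explicitly instead of being read from a state record, float probabilities are used directly as dictionary keys, and a sorted Python list stands in for whatever ordered structure gives the insertion cost assumed in the complexity analysis. The method names mirror the paper's Insertqq, End, delqueue and Destate.

```python
import bisect
from collections import deque


class ProbabilityOrderedQueue:
    def __init__(self):
        self.keys = []      # distinct arrival probabilities, increasing
        self.buckets = {}   # probability -> FIFO queue of states

    def insertqq(self, state, probability):
        """Append state to the FIFO queue for its probability, creating it if needed."""
        if probability not in self.buckets:
            bisect.insort(self.keys, probability)
            self.buckets[probability] = deque()
        self.buckets[probability].append(state)

    def end(self):
        """FIFO queue of the largest probability, i.e. End(U).queue."""
        return self.buckets[self.keys[-1]]

    def delqueue(self):
        """Delete the largest probability together with its FIFO queue."""
        del self.buckets[self.keys.pop()]

    def destate(self, state, probability):
        """Remove one state; drop its probability entry if the queue becomes empty."""
        bucket = self.buckets[probability]
        bucket.remove(state)
        if not bucket:
            del self.buckets[probability]
            self.keys.remove(probability)

    def __bool__(self):
        return bool(self.keys)


U = ProbabilityOrderedQueue()
for name, p in [("s11", 0.60), ("s12", 0.34), ("s13", 0.15), ("x", 0.60)]:
    U.insertqq(name, p)
```

States with equal arrival probability share one FIFO queue, so the order in which they were reached is preserved within a probability level.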

4.2 OPP algorithm: top-level

In this algorithm, to find an optimal objective path, we exploit the probability-first strategy. Thus, in every iteration of the while loop, OPP first selects the last element in the probability ordered queue U and the FIFO queue which that element points to (End(U).queue), where End(U) is the last element of U, and End(U).queue is the FIFO queue which End(U) points to. Clearly, among all states that have been visited but not extended at the current moment, the states in this FIFO queue share the maximum arrival probability.
If there are objective states in the FIFO queue End(U).queue, then OPP can find an optimal objective path from these objective states by invoking the function OPcompute(); otherwise, it continues by extending the FIFO queue V through the function Extend(). This process is described in Algorithm 1.


Insertqq(s, U) is a function that works as follows. If there is an element in U whose value is equal to s.probability, then s is added to the FIFO queue which that element points to. Otherwise, s.probability is inserted into the probability ordered queue U, a new FIFO queue is created, the new element is made to point to this FIFO queue, and s is added to it.
The function delqueue(U) deletes the last element from the probability ordered queue U, together with the FIFO queue which that element points to.
Algorithm 1 OPP algorithm: top-level
input: a probabilistic planning problem <s0, S, Σ, T, G>;
output: an optimal objective path;
Var U: probability ordered queue;
function OPP()
Var t: state; Q: state set; V: FIFO queue;
begin
    Insertqq(s0, U);
    s0.probability := 1;
    s0.visited := 1;
    s0.distance := 0;
    while U is not empty do
        V := End(U).queue;
        Q := {t | t.objective = 1 and t ∈ V};
        if Q is not empty then
            OPcompute(Q);
        else
            Extend(V);
            delqueue(U);
        end if
    end while
    return FAIL;
end
function Insertqq(s, U)
Var t: real number variable;
begin
    if there is t ∈ U such that t = s.probability then
        (t.queue).enqueue(s);
    else
        insert(s.probability, U);
        /* a new FIFO queue is created for the inserted element */
        (the queue of the inserted element).enqueue(s);
    end if
end

4.3 Optimal objective path computation

In OPP's search, when the set of states that have been visited but not extended contains objective states, the enriched paths from the initial state to these objective states are maximum probability objective paths, but the enriched paths from the initial state to other objective states


cannot be maximum probability objective paths, owing to the probability-first strategy. Thus, from the set Q consisting of these objective states, we can find an optimal objective path by Algorithm 2.
To find an optimal solution, we traverse the set Q. During the traversal, the variable shortd records the end state of the current shortest path among the maximum probability objective paths found so far (see the comparison of s.distance with shortd.distance in Algorithm 2). When all objective states in Q have been traversed, the recorded shortest path is an optimal objective path. The function path(s0, shortd) computes the enriched path from the initial state s0 to the state shortd, namely the recorded shortest path. Note that the predecessor field of every state has a unique value, so there is a unique path from the initial state s0 to the state. In fact, beginning from the state shortd and following the predecessor field of every state in turn, path(s0, shortd) obtains the recorded shortest path from s0 to shortd in reverse, and its induced action sequence is an optimal solution of the probabilistic planning problem.
Algorithm 2 Optimal path computation
function OPcompute(Q)
/* Q is a set of objective states */
Var s, shortd: state;
begin
    shortd.distance := 0;
    for s ∈ Q do
        if shortd.distance = 0 then
            /* the first maximum probability objective path */
            shortd := s;
        else
            if s.distance < shortd.distance then
                shortd := s;
            end if
        end if
    end for
    output(path(s0, shortd));
    terminate;
end
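The selection of shortd and the backward walk performed by path(s0, shortd) can be sketched in a few lines. This is an illustrative version, assuming the distance and predecessor fields are held in plain dictionaries keyed by state name; the function name `op_compute` and the sample data are hypothetical.

```python
def op_compute(Q, distance, predecessor, s0):
    """Return the state sequence from s0 to the nearest objective state in Q.

    Q           -- objective states that share the maximum arrival probability
    distance    -- dict: state -> number of steps from s0 on its current path
    predecessor -- dict: state -> direct predecessor on its current path
    """
    # min() keeps the first state of minimal distance, which matches the
    # strict "<" comparison in Algorithm 2.
    shortd = min(Q, key=lambda s: distance[s])
    # Follow predecessor links backwards, then reverse.
    path = [shortd]
    while path[-1] != s0:
        path.append(predecessor[path[-1]])
    path.reverse()
    return path


# Hypothetical search result: g1 is three steps away, g2 two steps away.
distance = {"g1": 3, "g2": 2}
predecessor = {"g1": "b", "b": "a", "a": "s0", "g2": "c", "c": "s0"}
best_path = op_compute(["g1", "g2"], distance, predecessor, "s0")
```

Because every state stores exactly one predecessor, the backward walk is unambiguous and takes time proportional to the length of the path.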

4.4 State extension

Given a FIFO queue V, state extension deals with all possible successors of every state in V; these successors have smaller arrival probability than the elements in V. The procedure is outlined in Algorithm 3. For every state s in V, there are three cases for every successor t of s: first, t has not been visited; second, t has been visited but not extended; third, t has been visited and extended.
The first case is simple to deal with. For the second case, t.visited = 1 shows that t is a visited and unextended state, so there is already a unique path R from the initial state to t, whose probability is t.probability. The condition t.probability < s.probability × p, where p = probability(s, t), means that there is now a new path from the initial state to s and then to t, and that this path has higher probability than R. It is therefore necessary to replace the old path R with the new path, and to update the ordered queue accordingly through the functions Destate(t, U) and Insertqq(t, U). For the third case, because t has been visited and extended, the arrival probability of t is at least as large as s.probability × probability(s, t). Thus we do not need to make any change to t.


The function Destate(t, U) deletes t from the FIFO queue pointed to by the element of U with value t.probability, and also deletes t.probability from U if t is the only state in that FIFO queue.
Algorithm 3 State extension algorithm
function Extend(V)
/* V is a FIFO queue */
Var s, t: state;
begin
    while V is not empty do
        s := V.dequeue();
        for all t ∈ successor(s) do
            if t.visited = 0 then
                t.distance := s.distance + 1;
                t.probability := s.probability × probability(s, t);
                t.visited := 1;
                t.predecessor := s;
                Insertqq(t, U);
            else
                if t.visited = 1 then
                    if t.probability < s.probability × probability(s, t) then
                        /* remove t from its old FIFO queue before its probability changes */
                        Destate(t, U);
                        t.distance := s.distance + 1;
                        t.probability := s.probability × probability(s, t);
                        t.predecessor := s;
                        Insertqq(t, U);
                    end if
                end if
            end if
        end for
        s.visited := 2;
        Destate(s, U);
    end while
end
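Algorithms 1 to 3 can be combined into one compact, runnable sketch. It follows the probability-first strategy but is not a line-by-line transcription: it extends one state per iteration instead of a whole FIFO queue, removes each state from the queue exactly once, keeps the fields in dictionaries, and uses a sorted list of float keys for the probability ordered queue. The function name `opp`, the transition layout and the toy problem are illustrative assumptions.

```python
import bisect
from collections import deque


def opp(s0, transitions, goals):
    """transitions[s] = list of (t, action, p) with p > 0.

    Returns (probability, action_sequence) for an optimal objective path,
    or None if no objective state is reachable.
    """
    prob, dist, pred = {s0: 1.0}, {s0: 0}, {}
    extended = set()
    keys, buckets = [1.0], {1.0: deque([s0])}  # ordered queue and FIFO queues

    def insert(t):  # plays the role of Insertqq
        p = prob[t]
        if p not in buckets:
            bisect.insort(keys, p)
            buckets[p] = deque()
        buckets[p].append(t)

    def remove(t):  # plays the role of Destate
        p = prob[t]
        buckets[p].remove(t)
        if not buckets[p]:
            del buckets[p]
            keys.remove(p)

    while keys:
        V = buckets[keys[-1]]  # states with maximum arrival probability
        Q = [t for t in V if t in goals]
        if Q:  # OPcompute: nearest objective state, then walk back
            best_p = keys[-1]
            cur = min(Q, key=lambda t: dist[t])
            actions = []
            while cur != s0:
                cur, a = pred[cur]
                actions.append(a)
            return best_p, actions[::-1]
        s = V[0]
        remove(s)  # s is now extended
        extended.add(s)
        for t, a, p in transitions.get(s, []):
            new_p = prob[s] * p
            if t in extended:
                continue  # third case: nothing to change
            if t not in prob:  # first case: t not visited yet
                prob[t], dist[t], pred[t] = new_p, dist[s] + 1, (s, a)
                insert(t)
            elif prob[t] < new_p:  # second case: better path found
                remove(t)
                prob[t], dist[t], pred[t] = new_p, dist[s] + 1, (s, a)
                insert(t)
    return None


# Toy problem: g1 and g2 are both reachable with probability 0.25,
# g1 in one step and g2 in two steps (after an update from 0.125).
toy = {
    "s0": [("u", "a", 0.5), ("g1", "b", 0.25), ("g2", "c", 0.125)],
    "u": [("g2", "d", 0.5)],
}
result = opp("s0", toy, goals={"g1", "g2"})
```

In the toy problem the arrival probability of g2 is first 0.125 and is then raised to 0.25 through u, which exercises the second case; both objective states end up in the same FIFO queue, and the one-step path to g1 is chosen.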

5 Complexity Analysis

The soundness and completeness of the OPP algorithm can be proved along lines similar to those of reference [15]. Thus, in this section, we only discuss the complexity of our algorithm.
Lemma 1 The complexity of the optimal objective path computation algorithm is linear in the size of its input parameter.
Theorem 1 Suppose the number of states and the number of edges (transitions) of an optimal probabilistic planning problem are $N$ and $M$, respectively. Then the time complexity of the OPP algorithm is $O(N + M)$.


From Algorithm 1, we know that the main factors affecting the complexity of the OPP algorithm are the functions OPcompute(), Extend(V) and delqueue(U). Because OPcompute() is linear in the size of its input parameter, the accumulated number of operations of OPcompute() over all iterations is $O(N)$. We assume that a probability is expressed as a pure decimal number with an accuracy of $c$ decimal digits. Then U has at most $10^c$ elements. Because U is a probability ordered queue, Insertqq(t, U) needs $O(\log(10^c)) = O(c)$ operations. Thus the total number of operations of Extend(V) over all iterations is $O(c \cdot M + N) = O(M + N)$. Clearly, the function delqueue(U) needs $O(N)$ operations in total over all iterations. Thus the complexity of Algorithm 1 is $O(M + N)$, which is linear with respect to the size of the system.
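The counting argument above can be summarised as a single sum over the three functions. The derivation below treats the precision $c$ as a constant, as the argument does; the symbol $T_{\mathrm{OPP}}$ for the total running time is introduced here only for illustration.

```latex
\begin{align*}
O(\log 10^{c}) &= O(c \log 10) = O(c), \\
T_{\mathrm{OPP}}
  &= \underbrace{O(N)}_{\text{OPcompute}}
   + \underbrace{O(c \cdot M + N)}_{\text{Extend}}
   + \underbrace{O(N)}_{\text{delqueue}} \\
  &= O(c \cdot M + N) = O(M + N) \quad \text{for constant } c.
\end{align*}
```

The linear bound therefore depends on the fixed-precision assumption: if $c$ were allowed to grow with the input, the factor $c$ would remain in the bound.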

6 A Case Study

We again take Example 1 as a case study. The actions for topic selection, writing and response performed by person A are denoted by a1, a2 and a3, respectively. Similarly, the actions for topic selection, writing and response performed by person B are denoted by b1, b2 and b3, respectively.
Now we describe the process of extension. Beginning from the initial state s0, the arrival probabilities of states s11, s12 and s13 after the first extension are 0.60 (a1), 0.34 (b1), and 0.15 (a1), respectively. Selecting state s11, which has the current maximum probability, to carry out the second extension, the arrival probability of state s21 becomes 0.33 (a1 a2). Selecting state s12, with current maximum probability 0.34 (b1), to carry out the third extension, the arrival probability of state s21 remains 0.33 (a1 a2) because 0.34 × 0.45 < 0.33. Selecting state s21, with current maximum probability 0.33 (a1 a2), to carry out the fourth extension, the arrival probability of state s31 is 0.15 (a1 a2 b3).
Thus the enriched objective path with maximum arrival probability is $s_0 \xrightarrow{a_1} s_{11} \xrightarrow{a_2} s_{21} \xrightarrow{b_3} s_{31}$.
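As an independent cross-check of this case study, the sketch below maximises the arrival probability layer by layer over the three steps. This is a simple dynamic program that works because Example 1 is layered; it is not the OPP algorithm. The probabilities are transcribed from Table 1, under the assumption that the second block of transitions from s21, s22 and s23 to s31, s32 and s33 belongs to person B; the names `A`, `B`, `LAYERS` and `best_plan` are illustrative.

```python
# Rows give probabilities to the three states of the next layer, in order.
A = {
    "s0": (0.60, 0.25, 0.15),
    "s11": (0.55, 0.30, 0.15), "s12": (0.45, 0.40, 0.15), "s13": (0.40, 0.45, 0.15),
    "s21": (0.25, 0.40, 0.35), "s22": (0.46, 0.40, 0.14), "s23": (0.35, 0.45, 0.20),
}
B = {
    "s0": (0.56, 0.34, 0.10),
    "s11": (0.45, 0.35, 0.20), "s12": (0.41, 0.30, 0.28), "s13": (0.30, 0.40, 0.30),
    # Assumed to be person B's rows (see the note above).
    "s21": (0.35, 0.30, 0.35), "s22": (0.35, 0.40, 0.25), "s23": (0.48, 0.40, 0.12),
}
LAYERS = [("s0",), ("s11", "s12", "s13"), ("s21", "s22", "s23"), ("s31", "s32", "s33")]


def best_plan(goal="s31"):
    """Return (probability, action_sequence) of a maximum probability plan."""
    best = {"s0": (1.0, [])}
    for step in range(1, len(LAYERS)):
        nxt = {}
        for s in LAYERS[step - 1]:
            p_s, acts = best[s]
            for person, table in (("a", A), ("b", B)):
                for t, p in zip(LAYERS[step], table[s]):
                    cand = p_s * p
                    # Keep the first candidate on ties (strict comparison).
                    if t not in nxt or cand > nxt[t][0]:
                        nxt[t] = (cand, acts + [f"{person}{step}"])
        best = nxt
    return best[goal]


prob, plan = best_plan()
```

With the table values as transcribed here, the sketch returns the same action sequence a1 a2 b3 as the case study. The computed probability is 0.60 × 0.55 × 0.35 = 0.1155, which differs from the value 0.15 quoted in the text for the last step.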

7 Conclusion

This paper's main contributions are as follows.

1. We proposed a linear algorithm for optimal probabilistic planning, namely the OPP algorithm. The algorithm is mainly based on a probability-first strategy, a probability ordered queue and some FIFO queues.
2. We defined and proved the OPP algorithm's soundness and completeness with respect to the optimal probabilistic planning problem.
3. We gave a proof of the linear complexity of the OPP algorithm.
4. We took project application as a case study to show the OPP algorithm's application.

References
[1] X. Qiu, X. Huang, L. Wu, Probabilistic Text Categorization using Sparse Topical Encoding, Journal of Computational Information Systems, 2009, 5(3): 1317-1329.
[2] A. Zalesky, DT-MRI fiber tracking: A shortest paths approach, IEEE Transactions on Medical Imaging 27 (2008) 1458-1471.
[3] A. Robles-Kelly and E. R. Hancock, Steady state random walks for path estimation, Proc. of the international conference of Structural, Syntactic, and Statistical Pattern Recognition, (2004), pp. 143-152.
[4] J. G. David Forney, The Viterbi algorithm, Proceedings of the IEEE 61 (1973) 268-278.
[5] S. Lim, H. Balakrishnan, D. Gifford, S. Madden and D. Rus, Stochastic motion planning and applications to traffic, The International Journal of Robotics Research 30 (2011) 699-712.
[6] D. Bertsekas and R. Gallagher, Data Networks, 2nd ed (New York: Prentice Hall, 1992).
[7] R. K. Ahuja, T. L. Magnanti and J. B. Orlin, Network Flows: Theory, Algorithms and Applications (Englewood Cliffs, NJ: Prentice Hall, 1993).
[8] E. Dijkstra, A note on two problems in connexion with graphs, Numerical Mathematics 1 (1959) 269-271.
[9] D. Johnson, Efficient algorithms for shortest paths in sparse networks, Journal of the Association for Computing Machinery 24 (1977) 1-13.
[10] R. Raman, Recent results on single-source shortest paths problem, SIGACT News 28 (1997) 81-87.
[11] R. Ahuja, K. Mehlhorn, J. Orlin and R. Tarjan, Faster algorithms for the shortest path problem, Journal of the Association for Computing Machinery 37 (1990) 213-223.
[12] B. Cherkasky, A. Goldberg and T. Radzik, Shortest-paths algorithms: Theory and experimental evaluation, Mathematical Programming 73 (1996) 129-174.
[13] T. Dean, L. Kaelbling, J. Kirman and A. Nicholson, Planning with deadlines in stochastic domains, Proc. 11th Nat. Conf. on A.I., (1993).
[14] S. Koenig, Optimal probabilistic and decision-theoretic planning using Markovian decision theory, Proc. of UCB/CSD 92/685, Berkeley, (1992).
[15] N. Kushmerick, S. Hanks and D. S. Weld, An algorithm for probabilistic planning, Artificial Intelligence 76 (1995) 239-286.
[16] C. Domshlak and J. Hoffmann, Probabilistic planning via heuristic forward search and weighted model counting, Journal of Artificial Intelligence Research 30 (2007) 565-620.
[17] S. Sanner and K. Kersting, Symbolic dynamic programming for first-order POMDPs, Proc. of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), (2010), pp. 1140-1146.
[18] C. Wang and R. Khardon, Relational partially observable MDPs, Proc. of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), (2010), pp. 1153-1158.
[19] T. Smith and R. Simmons, Heuristic search value iteration for POMDPs, Proc. of the 20th conference on Uncertainty in artificial intelligence, (2004), pp. 520-527.
[20] R. Bellman, A Markovian decision process, Journal of Mathematics and Mechanics 6 (1957) 679-684.
