
Real-Time Heuristic Search
Richard E. Korf, UCLA

Siddhartha Dutta
Shivam Garg
Aditya Nambiar
January 29, 2015


Introduction
Heuristic search is a fundamental problem-solving method in AI
The sequence of steps to reach the goal isn't known a priori
Systematic trial-and-error exploration of alternatives
The performance of search algorithms is greatly improved by heuristic evaluation functions
A heuristic evaluator is an inexpensive-to-compute function that estimates the merit of different states relative to the goal


Heuristic Search
Implemented in two types of domains:
1. Single-agent problems
2. Two-player games


Single-agent heuristic search

The (n²-1)-puzzle problems


Heuristic: Manhattan Distance
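
A minimal sketch of this heuristic in Python; the state encoding (a row-major tuple with 0 for the blank) is an assumption for illustration, not fixed by the paper:

    def manhattan_distance(state, goal, n):
        # state, goal: tuples of length n*n in row-major order, 0 = blank.
        # Sum, over all tiles, of the horizontal plus vertical distance
        # between each tile's current position and its goal position.
        goal_pos = {tile: i for i, tile in enumerate(goal)}
        total = 0
        for i, tile in enumerate(state):
            if tile == 0:                      # the blank is not counted
                continue
            j = goal_pos[tile]
            total += abs(i // n - j // n) + abs(i % n - j % n)
        return total

For the Eight Puzzle n = 3; for the Fifteen Puzzle n = 4.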

Single-agent heuristic search: Real-life example

Autonomous navigation in a network of roads or arbitrary terrain
Heuristic: Euclidean distance (airline distance)


A*: the best-known single-agent heuristic search algorithm


f(n) = g(n) + h(n)
where,
f(n) = Merit of a node
g(n) = Actual cost of reaching that node from initial state
h(n) = Estimated cost of reaching goal state from that node

Pro: Always finds an optimal solution if the heuristic function doesn't overestimate the actual cost
Con: Requires memory exponential in the search depth in practice
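
For concreteness, a minimal A* sketch in Python; the interface (neighbors(state) yielding (successor, edge_cost) pairs, and a heuristic h(state)) is assumed for illustration:

    import heapq
    import itertools

    def a_star(start, goal, neighbors, h):
        # Best-first search on f(n) = g(n) + h(n); returns the cost of an
        # optimal path (for admissible h), or None if no path exists.
        tie = itertools.count()                # tie-breaker: states need not be orderable
        open_list = [(h(start), next(tie), 0, start)]
        best_g = {start: 0}
        while open_list:
            f, _, g, state = heapq.heappop(open_list)
            if state == goal:
                return g
            if g > best_g.get(state, float('inf')):
                continue                       # stale OPEN entry
            for succ, cost in neighbors(state):
                g2 = g + cost
                if g2 < best_g.get(succ, float('inf')):
                    best_g[succ] = g2
                    heapq.heappush(open_list, (g2 + h(succ), next(tie), g2, succ))
        return None

The best_g dictionary retains every state ever reached, which is exactly the exponential memory requirement noted above.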

Iterative deepening A* (IDA*)


Modification of A*
Reduces space complexity from exponential to linear
Performs a series of depth-first searches
A branch is cut off when the f value of a frontier node exceeds a threshold
The threshold starts at the h value of the initial state
The solution is optimal, as with A*
Asymptotically, expands the same number of nodes as A* on an exponential tree, but uses only linear space
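
A minimal IDA* sketch under the same assumed interface; the cycle check along the current path is an implementation convenience, not part of the original formulation:

    def ida_star(start, goal, neighbors, h):
        # A series of depth-first searches; a branch is cut off when
        # f = g + h exceeds the current threshold.
        def dfs(state, g, threshold, path):
            f = g + h(state)
            if f > threshold:
                return f, None                 # report the smallest f above threshold
            if state == goal:
                return f, g
            nxt = float('inf')
            for succ, cost in neighbors(state):
                if succ in path:               # avoid cycles on the current path
                    continue
                t, found = dfs(succ, g + cost, threshold, path | {succ})
                if found is not None:
                    return t, found
                nxt = min(nxt, t)
            return nxt, None

        threshold = h(start)                   # threshold starts at h(initial state)
        while threshold < float('inf'):
            threshold, found = dfs(start, 0, threshold, {start})
            if found is not None:
                return found                   # optimal, as with A*
        return None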

Drawbacks of A* and IDA*


1. Exponential running time (in the search depth) in practice
An unavoidable cost of obtaining optimal solutions
Restricted to relatively small problems in practice
A* with Manhattan distance can solve the Eight Puzzle
IDA* can solve the Fifteen Puzzle
Optimal solutions are rarely required
2. Must search all the way to a solution before committing to even the first move
A* and IDA* run to completion in a planning/simulation phase before committing to the first move

Two-player games

Canonical example: Chess


Two-player games
Classical algorithm: full-width, fixed-depth minimax search with alpha-beta pruning
Search tree of chess: about 10⁴³ positions
Can't search all the way to checkmate positions
Limited search horizon, so suboptimal decisions are made
(e.g., in chess tournaments, moves must be made within certain time limits and can't be revoked)


Two-player games
Research in single-agent problems:
Focused on increasing the size of problems for which optimal solutions can be found
Research in two-player games:
Focused on making the best decisions possible given the limited amount of computation available between moves


Goal of this paper


Apply the assumptions of two-player games, i.e.,
  limited search horizon
  commitment to moves in constant time
to single-agent heuristic search
Cause of the limited search horizon:
  a computational or informational limitation
Consequence of committing to moves in constant time:
  backtracking moves are counted in the solution length

Minimin lookahead search


Specializes the minimax algorithm of two-player games to a single problem-solving agent
Assumption: all edges have the same cost
Search forward from the current state to a fixed depth
Apply the heuristic function to the nodes on the frontier
In a two-player game, values are minimaxed up the tree; here, the minimum frontier value is backed up
A move is made in the direction of the child with the best frontier value
Strategy of least commitment
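
A minimal sketch of minimin lookahead under the same assumed interface (uniform edge costs, so only frontier h values matter); the goal and dead-end handling anticipates the termination rules two slides ahead:

    def minimin(state, depth, neighbors, h, is_goal):
        # Back up the minimum frontier value through the tree.
        if is_goal(state):
            return 0                           # goal before the horizon: value zero
        if depth == 0:
            return h(state)                    # frontier node: heuristic value
        values = [minimin(succ, depth - 1, neighbors, h, is_goal)
                  for succ, _ in neighbors(state)]
        return min(values) if values else float('inf')   # dead end: infinity

    def best_move(state, depth, neighbors, h, is_goal):
        # Move toward the child whose subtree contains the best frontier value.
        return min(neighbors(state),
                   key=lambda sc: minimin(sc[0], depth - 1, neighbors, h, is_goal))[0]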

Interleaving planning and execution phase


Minimin lookahead search occurs in a planning mode in which moves are not actually made but simulated
After one complete lookahead search, the best move found is actually executed
This is followed by another lookahead simulation, another move, and so on


Minimin lookahead search: Non-uniform cost


Take into account the cost of the path from the current state to the frontier, in addition to the heuristic estimate of the remaining cost
In other words, back up f(n) = g(n) + h(n), with g(n) measured from the current state


Minimin lookahead search: Termination


If the goal is reached before the search horizon:
  the path is terminated
  a heuristic value of zero is returned for the goal node
If the path ends in a non-goal dead end before the horizon is reached:
  a heuristic value of infinity is returned for the dead-end node
  this guarantees that the path will not be chosen


Alpha Pruning
An obvious question is whether every frontier node must be examined to find the one with minimum cost
If we allow heuristic evaluations of interior nodes, then substantial pruning is possible, provided the cost function f = g + h is monotonic (non-decreasing along every path)
Since f is monotonic, the f values of the frontier nodes below an interior node can only be greater than or equal to the f value of that node, and hence cannot be lower than the value of the frontier node responsible for the current value of α


Algorithm
Maintain in a variable α the lowest f value of any node encountered on the search horizon so far
For each interior node, compute its f value and terminate search of the corresponding branch when its f value equals or exceeds α
As each frontier node is generated, compute its f value as well, and if it is less than α, replace α with this value
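
A sketch of the same lookahead with alpha pruning on a monotonic f = g + h, under the same assumed interface; α is threaded through the recursion and returned updated:

    def minimin_alpha(state, g, depth, alpha, neighbors, h, is_goal):
        # Returns the updated alpha: the lowest frontier f seen so far.
        if is_goal(state):
            return min(alpha, g)               # h(goal) = 0, so f = g
        f = g + h(state)
        if f >= alpha:
            return alpha                       # prune: f is monotonic below this node
        if depth == 0:
            return f                           # frontier node with f < alpha
        for succ, cost in neighbors(state):
            alpha = minimin_alpha(succ, g + cost, depth - 1, alpha,
                                  neighbors, h, is_goal)
        return alpha                           # a dead end leaves alpha unchanged

    def best_move_alpha(state, depth, neighbors, h, is_goal):
        # Carry alpha across children so later siblings are pruned against it.
        alpha, best = float('inf'), None
        for succ, cost in neighbors(state):
            new_alpha = minimin_alpha(succ, cost, depth - 1, alpha,
                                      neighbors, h, is_goal)
            if new_alpha < alpha:              # this child holds the best frontier so far
                alpha, best = new_alpha, succ
        return best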


Node Ordering
As in alpha-beta pruning, the efficiency of alpha pruning ought to be improved by node ordering
The idea is that we may reach a lower α value earlier and hence prune more branches
This can be done by visiting the children of a node in increasing order of their f values
However, it was experimentally observed that this led to only about a 5% reduction in nodes generated, too little to justify the per-node overhead of the ordering


Other Real-Time Algorithms


The key property of minimin search is that it takes constant
time to run
Note that even though its complexity is an exponential
function of the search horizon, the search horizon is bounded
by a constant and hence so is the running time of the
algorithm.
This property, however, is not unique to this algorithm, and is
shared by at least two other more obvious algorithms.


Time-Limited A*
The idea is to run A* until time runs out, and then to make a
single move in the direction of the best node on the OPEN list,
breaking ties in favor of smaller h values.
The main drawback of this algorithm is the same as that of A*:
its memory requirement is exponential in the search depth and
thus in practice it will exhaust the available memory, unless
the amount of time allotted per move is quite small.


Threshold-Limited IDA*


The idea is to run a single iteration of IDA* with a threshold that is always a constant amount greater than the heuristic estimate of the current state
At the end of the iteration, all the frontier nodes will have the same f value, and hence a move is made in the direction of the minimum h value
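
A sketch of this variant under the same assumed interface; c is the constant added to the threshold, and tracking the first move on the path to each frontier node is an implementation choice so that a move can be returned (positive edge costs guarantee termination):

    def threshold_limited_move(start, c, neighbors, h):
        # One IDA* iteration with threshold h(start) + c; among the frontier
        # nodes reached, move toward the one with minimum h.
        threshold = h(start) + c
        best = {'h': float('inf'), 'move': None}

        def dfs(state, g, first):
            if g + h(state) > threshold:       # frontier of this iteration
                if h(state) < best['h']:
                    best['h'], best['move'] = h(state), first
                return
            for succ, cost in neighbors(state):
                dfs(succ, g + cost, succ if first is None else first)

        dfs(start, 0, None)
        return best['move']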


Real-Time A* (RTA*)



Real-Time A*
So far we have only addressed how to make individual move decisions, not how to make a sequence of move decisions to arrive at a solution
Repeating the minimin algorithm for each decision won't work, since we may revisit a previously visited state and loop forever
Simply excluding previously visited states won't work either, since all successors of the current state may have already been visited
Decisions must be made on limited information


Real-Time A*
In RTA*, the merit of a node is f(n) = g(n) + h(n), as in A*
However, g(n) in RTA* is the actual distance of node n from the current state of the problem solver, rather than from the initial state
RTA*, in other words, is best-first search given this slightly different cost function
In principle, it could be implemented by storing on an OPEN list the h values of all previously visited states and, every time a move is made, updating the g values of all states on OPEN to reflect their actual distance from the new current state


Real-Time A*
In practice, the algorithm maintains in a hash table a list of those nodes that have been visited by an actual move of the problem solver, together with an h value for each
At each cycle of the algorithm, the current state is expanded, generating its neighbors, and the heuristic function, possibly augmented by lookahead search, is applied to each neighbor that is not in the hash table; for those in the hash table, the stored h value is used instead
Adding the cost of the edge to each neighboring state to that state's h value gives the f value of each neighbor of the current state



Real-Time A*
The neighbor with the minimum f value is chosen as the new current state, and a move to that state is executed
At the same time, the previous current state is stored in the hash table, and associated with it is the second-best f value
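
Putting the last three slides together, a minimal RTA* sketch; the interface is assumed as before, h may itself be computed by minimin lookahead, and every state is assumed to have at least one neighbor:

    def rta_star(start, is_goal, neighbors, h):
        table = {}                             # h values of states visited by an actual move
        current, path = start, [start]
        while not is_goal(current):
            # f of each neighbor: edge cost + stored h (if visited) or h(n).
            scored = sorted(((cost + table.get(succ, h(succ)), succ)
                             for succ, cost in neighbors(current)),
                            key=lambda pair: pair[0])
            best_f, best = scored[0]
            second_f = scored[1][0] if len(scored) > 1 else float('inf')
            table[current] = second_f          # second-best f stored with the old state
            current = best                     # execute the move
            path.append(current)
        return path

Storing the second-best f is what makes g relative to the current state: looking back from the new state, the old state's estimated cost through its best remaining alternative is exactly that value.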


Theorems of RTA*
Theorem 1. In a finite problem space with positive edge costs
and finite heuristic values, in which a goal state is reachable
from every state, RTA* will find a solution.
Theorem 2. Each move made by RTA* on a tree is along a path
whose estimated cost of reaching a goal is a minimum, based
on the cumulative search frontier at the time.


Learning RTA*
Successively solve multiple problem instances in the same problem space, with the same goal but different start states
Goal of learning: after the learning phase, the heuristic values should converge to their exact values h*
Information saved from one trial to the next: the hash table of h values recorded for previously visited states


Information stored in the hash table

The RTA* algorithm records the second-best estimate with the previous state, which represents an accurate estimate of that state looking back from the perspective of the next state
But what if the best estimate turns out to be correct?
In this case, storing the second-best value can result in inflated values for some states
Modification: store the best value with the previous state instead of the second best
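
Relative to the RTA* sketch above, the modification is a single line:

    # RTA*:  table[current] = second_f        (second-best f)
    # LRTA*: table[current] = best_f          (best f, preserving admissibility)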


Consequences of the modification
LRTA* is no longer guaranteed to make locally optimal decisions
Consider the example (figure not reproduced here):

If LRTA* starts in one of the central 1-valued states in the figure, it will bounce back and forth between the 1 states until the original 1 values equal or exceed 5


Consequences of the modification (contd.)
LRTA* retains the completeness property of RTA* (a similar proof works)
What do we gain from this modification?
Convergence!
Theorem: In a finite problem space with positive edge costs,
and admissible initial heuristic values, in which a goal state is
reachable from every state, over repeated trials of LRTA*, the
heuristic values will eventually converge to their exact values
along every optimal path.


Efficiency of LRTA*
How long does it take to converge?
The answer depends on the structure of the problem space,
the distribution of initial and goal states, and the initial
heuristic values
Let us take an example
Problem space: Square grid with N units on a side, with all
edges having unit cost
Start state: Top left corner
Goal state: Bottom right corner
Initial heuristic value: 0 for all states


Efficiency of LRTA* (contd.)


On average, each move of each trial increases the value of one state by one unit
The average final value of a state is N, with the range being between zero and 2N - 2
Since there are N² states, roughly N² × N = N³ unit increments are needed, so the average learning time is O(N³) moves
This result is borne out by experiments
Empirically, solution times very close to the optimal value are actually achieved very early in the learning episode


THANKS!

