When minimizing makespan during off-line planning, the fastest action sequence to reach a particular state is, by definition, preferred. When trying to reach a goal quickly in on-line planning, previous work has inherited that assumption: the faster of two paths that both reach the same state is usually considered to dominate the slower one. In this short paper, we point out that, when planning happens concurrently with execution, selecting a slower action can allow additional time for planning, leading to better plans. We present Slo'RTS, a metareasoning planning algorithm that estimates whether the expected improvement in future decision-making from this increased planning time is enough to make up for the increased duration of the selected action. Using simple benchmarks, we show that Slo'RTS can yield shorter time-to-goal than a conventional planner. This generalizes previous work on metareasoning in on-line planning and highlights the inherent uncertainty present in an on-line setting.
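The trade-off described above can be illustrated with a minimal sketch. This is not the Slo'RTS algorithm itself: the function name, its parameters, and the assumption that the expected benefit of extra planning is already available as a single estimate in time units are all hypothetical, introduced only to make the comparison concrete.

```python
def choose_action(fast_duration, slow_duration, expected_saving):
    """Pick between a fast and a slow action that reach the same state.

    All arguments are hypothetical quantities in the same time units:
      fast_duration   -- execution time of the faster action
      slow_duration   -- execution time of the slower action
      expected_saving -- estimated reduction in remaining time-to-goal
                         gained by planning during the extra execution time
    """
    # Extra execution time paid for choosing the slower action.
    extra_cost = slow_duration - fast_duration
    # The slower action is worthwhile only if the expected improvement
    # in later decisions outweighs that extra cost.
    if expected_saving > extra_cost:
        return "slow"
    return "fast"
```

In this sketch a conventional planner corresponds to always returning "fast"; the metareasoning step is the comparison against the estimated saving, which in practice would itself have to be estimated under uncertainty.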
Real-time heuristic search addresses the setting in which planning and acting can proceed concurrently. We explore the use of metareasoning at two decision points within a real-time heuristic search. First, if the domain has an 'identity action' that allows the agent to remain in the same state and deliberate further, when should this action be taken? Second, given a partial plan that extends to the lookahead frontier, to how many actions should the agent commit? We show that considering these decisions carefully can reduce the agent's total time taken to arrive at a goal in several benchmark domains, relative to the current state-of-the-art. The resulting algorithm can dynamically adjust the way it interleaves planning and acting, between greedy hill-climbing and A*, depending on the problem instance.
Many systems, such as mobile robots, need to be controlled in real time. Real-time heuristic search is a popular on-line planning paradigm that supports concurrent planning and execution. However, existing methods do not incorporate a notion of safety and we show that they can perform poorly in domains that contain dead-end states from which a goal cannot be reached. We introduce new real-time heuristic search methods that can guarantee safety if the domain obeys certain properties. We test these new methods on two different simulated domains that contain dead ends, one that obeys the properties and one that does not. Empirically, we find that the new methods provide good performance. We hope this work encourages further efforts to widen the applicability of real-time planning.
In real-time domains such as video games, planning happens concurrently with execution and the planning algorithm has a strictly bounded amount of time before it must return the next action for the agent to execute. We explore the use of real-time heuristic search in two benchmark domains inspired by video games. Unlike classic benchmarks such as grid pathfinding and the sliding tile puzzle, these new domains feature exogenous change and directed state space graphs. We consider the setting in which planning and acting are concurrent and we use the natural objective of minimizing goal achievement time. Using both the classic benchmarks and the new domains, we investigate several enhancements to a leading real-time search algorithm, LSS-LRTA*. We show experimentally that 1) it is better to plan after each action or to use a dynamically sized lookahead, 2) A*-based lookahead can cause undesirable actions to be selected, and 3) on-line de-biasing of the heuristic can lead to improved performance. We hope this work encourages future research on applying real-time search in dynamic domains.
A fundamental concern in real-time planning is the presence of dead-ends in the state space, from which no goal is reachable. Recently, the SafeRTS algorithm was proposed for searching in such spaces. SafeRTS exploits a user-provided predicate to identify safe states, from which a goal is likely reachable, and attempts to maintain a backup plan for reaching a safe state at all times. In this paper, we study the SafeRTS approach, identify certain properties of its behavior, and design an improved framework for safe real-time search. We prove that the new approach performs at least as well as SafeRTS and present experimental results showing that its promise is fulfilled in practice.
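The idea of keeping a backup plan to a safe state can be sketched as follows. This is an illustration under stated assumptions, not the SafeRTS algorithm or the improved framework: the function `find_backup_plan`, the `successors` callback, the `max_expansions` bound, and the use of plain breadth-first search are all hypothetical choices. Only the user-provided safety predicate (`is_safe` here) comes from the description above.

```python
from collections import deque

def find_backup_plan(start, successors, is_safe, max_expansions=1000):
    """Search for a shortest action sequence from `start` to a safe state.

    successors(state) -> iterable of (action, next_state) pairs (assumed API)
    is_safe(state)    -> bool, the user-provided safety predicate
    Returns a list of actions, or None if no safe state was found
    within the expansion budget.
    """
    if is_safe(start):
        return []  # already safe, nothing to do
    frontier = deque([(start, [])])
    visited = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        state, plan = frontier.popleft()
        expansions += 1
        for action, nxt in successors(state):
            if nxt in visited:
                continue
            if is_safe(nxt):
                return plan + [action]
            visited.add(nxt)
            frontier.append((nxt, plan + [action]))
    return None  # budget exhausted or no safe state reachable
```

A real-time agent built on this sketch would commit to an action only if a backup plan is still available from the resulting state; the bounded expansion count stands in for the strict per-action time limit of the real-time setting.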