In real-time domains such as video games, planning happens concurrently with execution and the planning algorithm has a strictly bounded amount of time before it must return the next action for the agent to execute. We explore the use of real-time heuristic search in two benchmark domains inspired by video games. Unlike classic benchmarks such as grid pathfinding and the sliding tile puzzle, these new domains feature exogenous change and directed state space graphs. We consider the setting in which planning and acting are concurrent and we use the natural objective of minimizing goal achievement time. Using both the classic benchmarks and the new domains, we investigate several enhancements to a leading real-time search algorithm, LSS-LRTA*. We show experimentally that 1) it is better to plan after each action or to use a dynamically sized lookahead, 2) A*-based lookahead can cause undesirable actions to be selected, and 3) on-line de-biasing of the heuristic can lead to improved performance. We hope this work encourages future research on applying real-time search in dynamic domains.
Time-Bounded A* is a real-time, single-agent, deterministic search algorithm that expands states of a graph in the same order as A* does, but that unlike A* interleaves search and action execution. Known to outperform state-of-the-art real-time search algorithms based on Korf's Learning Real-Time A* (LRTA*) in some benchmarks, it has not been studied in detail and is sometimes not considered as a ``true'' real-time search algorithm since it fails in non-reversible problems even it the goal is still reachable from the current state. In this paper we propose and study Time-Bounded Best-First Search (TB(BFS)) a straightforward generalization of the time-bounded approach to any best-first search algorithm. Furthermore, we propose Restarting Time-Bounded Weighted A* (TB_R(WA*)), an algorithm that deals more adequately with non-reversible search graphs, eliminating ``backtracking moves'' and incorporating search restarts and heuristic learning. In non-reversible problems we prove that TB(BFS) terminates and we deduce cost bounds for the solutions returned by Time-Bounded Weighted A* (TB(WA*)), an instance of TB(BFS). Furthermore, we prove TB_R(WA*), under reasonable conditions, terminates. We evaluate TB(WA) in both grid pathfinding and the 15-puzzle. In addition, we evaluate TB_R(WA*) on the racetrack problem. We compare our algorithms to LSS-LRTWA*, a variant of LRTA* that can exploit lookahead search and a weighted heuristic. A general observation is that the performance of both TB(WA*) and TB_R(WA*) improves as the weight parameter is increased. In addition, our time-bounded algorithms almost always outperform LSS-LRTWA* by a significant margin.
The research area of real-time heuristics search has produced quite many algorithms. In the landscape of real-time heuristics search research, it is not rare to find that an algorithm X that appears to perform better than algorithm Y on a group of problems, performed worse than Y for another group of problems. If these published algorithms are combined to generate a more powerful space of algorithms, then that novel space of algorithms may solve a distribution of problems more efficiently. Based on this intuition, a recent work Bulitko 2016 has defined the task of finding a combination of heuristics search algorithms as a survival task. In this evolutionary approach, a space of algorithms is defined over a set of building blocks published algorithms and a simulated evolution is used to recombine these building blocks to find out the best algorithm from that space of algorithms. In this paper, we extend the set of building blocks by adding one published algorithm, namely lookahead based A-star shaped local search space generation method from LSSLRTA-star, plus an unpublished novel strategy to generate local search space with Greedy Best First Search. Then we perform experiments in the new space of algorithms, which show that the best algorithms selected by the evolutionary process have the following property: the deeper is the lookahead depth of an algorithm, the lower is its suboptimality and scrubbing complexity.
When minimizing makespan during off-line planning, the fastest action sequence to reach a particular state is, by definition, preferred. When trying to reach a goal quickly in on-line planning, previous work has inherited that assumption: the faster of two paths that both reach the same state is usually considered to dominate the slower one. In this short paper, we point out that, when planning happens concurrently with execution, selecting a slower action can allow additional time for planning, leading to better plans. We present Slo'RTS, a metareasoning planning algorithm that estimates whether the expected improvement in future decision-making from this increased planning time is enough to make up for the increased duration of the selected action. Using simple benchmarks, we show that Slo'RTS can yield shorter time-to-goal than a conventional planner. This generalizes previous work on metareasoning in on-line planning and highlights the inherent uncertainty present in an on-line setting.