Goto

Collaborating Authors

Ruml, Wheeler


Revisiting Suboptimal Search

AAAI Conferences

Suboptimal search algorithms can often solve much larger problems than optimal search algorithms, and thus have broad practical use. This paper returns to early algorithms like WA*, A*_e and Optimistic search. It studies the commonalities between these approaches in order to build a new bounded-suboptimal algorithm. Combined with recent research on avoiding node re-expansions in bounded-optimal search, a new solution quality bound is developed, which often provides proof of the solution bound much earlier during the search. Put together, these ideas provide a new state-of-the-art in bounded-optimal search.


Improved Safe Real-Time Heuristic Search

AAAI Conferences

Empirically, this optimization lead to 0.5 - 2.5% savings on expansions in our experiments A fundamental concern in real-time planning is the presence (Cserna, Gall, and Ruml 2019). of dead-ends in the state space, from which no goal is reachable. SafeRTS interleaves exploration and safety proofs during Providing real-time heuristic search algorithms that are its planning phase. As a direct consequence, it attempts complete in domains with dead-end states is a challenging safety proofs on nodes that become internal to the LSS by problem. Recently, the SafeRTS algorithm was proposed for the end of the search iteration. As shown in Cserna, Gall, and searching in such spaces (Cserna et al. 2018). SafeRTS exploits Ruml (2019), it would be equally or less difficult to achieve a user-provided predicate to identify safe states, from the same or better safety coverage by doing safety proofs after which a goal is likely reachable, and attempts to maintain a all the LSS expansions. SafeRTS has an anytime behavior backup plan for reaching a safe state at all times.


Real-Time Heuristic Search in Dynamic Environments

AAAI Conferences

PLRTA* conflates all states that differ only in time into a single abstract state. Abstract states inherit the union of all In dynamic environments such as cities, agents often do not the predecessors of their preimage states, so that backups have time to find a complete plan to reach a goal state. Planning can be performed properly. PLRTA* learns a single static in such environment requires an agent to update its plan heuristic value for each abstract state. For dynamic learning, frequently to respond to the changes around it. The setting PLRTA* performs the standard Dijkstra-style backup across of real-time heuristic search models online planning by requiring the LSS, considering only costs arising from the dynamic elements the agent to commit to its next action within a strict of the environment. As presented by Cannon, Rose, time limit. The time bound for planning is set to the time and Ruml (2014), the algorithm commits to only one step at which the actions to which the agent has already committed along the selected path, and then replans using updated information.


Improved Safe Real-time Heuristic Search

arXiv.org Artificial Intelligence

A fundamental concern in real-time planning is the presence of dead-ends in the state space, from which no goal is reachable. Recently, the SafeRTS algorithm was proposed for searching in such spaces. SafeRTS exploits a user-provided predicate to identify safe states, from which a goal is likely reachable, and attempts to maintain a backup plan for reaching a safe state at all times. In this paper, we study the SafeRTS approach, identify certain properties of its behavior, and design an improved framework for safe real-time search. We prove that the new approach performs at least as well as SafeRTS and present experimental results showing that its promise is fulfilled in practice.


Solving Large Problems with Heuristic Search: General-Purpose Parallel External-Memory Search

Journal of Artificial Intelligence Research

Classic best-first heuristic search algorithms, like A*, record every unique state they encounter in RAM, making them infeasible for solving large problems. In this paper, we demonstrate how best-first search can be scaled to solve much larger problems by exploiting disk storage and parallel processing and, in some cases, slightly relaxing the strict best-first node expansion order. Some previous disk-based search algorithms abandon best-first search order in an attempt to increase efficiency. We present two case studies showing that A*, when augmented with Delayed Duplicate Detection, can actually be more efficient than these non-best-first search orders. First, we present a straightforward external variant of A*, called PEDAL, that slightly relaxes best-first order in order to be I/O efficient in both theory and practice, even on problems featuring real-valued node costs. Because it is easy to parallelize, PEDAL can be faster than in-memory IDA* even on domains with few duplicate states, such as the sliding-tile puzzle. Second, we present a variant of PEDAL, called PE2A*, that uses partial expansion to handle problems that have large branching factors. When tested on the problem of Multiple Sequence Alignment, PE2A* is the first algorithm capable of solving the entire Reference Set 1 of the standard BAliBASE benchmark using a biologically accurate cost function. This work shows that classic best-first algorithms like A* can be applied to large real-world problems. We also provide a detailed implementation guide with source code both for generic parallel disk-based best-first search and for Multiple Sequence Alignment with a biologically accurate cost function. Given its effectiveness as a general-purpose problem-solving method, we hope that this makes parallel and disk-based search accessible to a wider audience.


Solving Large Problems with Heuristic Search: General-Purpose Parallel External-Memory Search

Journal of Artificial Intelligence Research

Classic best-first heuristic search algorithms, like A*, record every unique state they encounter in RAM, making them infeasible for solving large problems. In this paper, we demonstrate how best-first search can be scaled to solve much larger problems by exploiting disk storage and parallel processing and, in some cases, slightly relaxing the strict best-first node expansion order. Some previous disk-based search algorithms abandon best-first search order in an attempt to increase efficiency. We present two case studies showing that A*, when augmented with Delayed Duplicate Detection, can actually be more efficient than these non-best-first search orders. First, we present a straightforward external variant of A*, called PEDAL, that slightly relaxes best-first order in order to be I/O efficient in both theory and practice, even on problems featuring real-valued node costs.


Avoiding Dead Ends in Real-Time Heuristic Search

AAAI Conferences

Many systems, such as mobile robots, need to be controlled in real time. Real-time heuristic search is a popular on-line planning paradigm that supports concurrent planning and execution. However,existing methods do not incorporate a notion of safety and we show that they can perform poorly in domains that contain dead-end states from which a goal cannot be reached. We introduce new real-time heuristic search methods that can guarantee safety if the domain obeys certain properties. We test these new methods on two different simulated domains that contain dead ends, one that obeys the properties and one that does not. We find that empirically the new methods provide good performance. We hope this work encourages further efforts to widen the applicability of real-time planning.


Planning Time to Think: Metareasoning for On-Line Planning with Durative Actions

AAAI Conferences

When minimizing makespan during off-line planning, the fastest action sequence to reach a particular state is, by definition, preferred. When trying to reach a goal quickly in on-line planning, previous work has inherited that assumption: the faster of two paths that both reach the same state is usually considered to dominate the slower one. In this short paper, we point out that, when planning happens concurrently with execution, selecting a slower action can allow additional time for planning, leading to better plans. We present Slo'RTS, a metareasoning planning algorithm that estimates whether the expected improvement in future decision-making from this increased planning time is enough to make up for the increased duration of the selected action. Using simple benchmarks, we show that Slo'RTS can yield shorter time-to-goal than a conventional planner. This generalizes previous work on metareasoning in on-line planning and highlights the inherent uncertainty present in an on-line setting.


Value Directed Exploration in Multi-Armed Bandits with Structured Priors

arXiv.org Machine Learning

Multi-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation. While there has been progress in developing algorithms with strong theoretical guarantees, there has been less focus on practical near-optimal finite-time performance. In this paper, we propose an algorithm for Bayesian multi-armed bandits that utilizes value-function-driven online planning techniques. Building on previous work on UCB and Gittins index, we introduce linearly-separable value functions that take both the expected return and the benefit of exploration into consideration to perform n-step lookahead. The algorithm enjoys a sub-linear performance guarantee and we present simulation results that confirm its strength in problems with structured priors. The simplicity and generality of our approach makes it a strong candidate for analyzing more complex multi-armed bandit problems.


Effective Heuristics for Suboptimal Best-First Search

Journal of Artificial Intelligence Research

Suboptimal heuristic search algorithms such as weighted A* and greedy best-first search are widely used to solve problems for which guaranteed optimal solutions are too expensive to obtain. These algorithms crucially rely on a heuristic function to guide their search. However, most research on building heuristics addresses optimal solving. In this paper, we illustrate how established wisdom for constructing heuristics for optimal search can fail when considering suboptimal search. We consider the behavior of greedy best-first search in detail and we test several hypotheses for predicting when a heuristic will be effective for it. Our results suggest that a predictive characteristic is a heuristic's goal distance rank correlation (GDRC), a robust measure of whether it orders nodes according to distance to a goal. We demonstrate that GDRC can be used to automatically construct abstraction-based heuristics for greedy best-first search that are more effective than those built by methods oriented toward optimal search. These results reinforce the point that suboptimal search deserves sustained attention and specialized methods of its own.