Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal

Neural Information Processing Systems

Figure 1: Problem instance where a perfect heuristic is not strictly optimally efficient with GBFS.
Figure 2: Problem instance where an optimally efficient heuristic does not exist for GBFS.










Parallel Greedy Best-First Search with a Bound on the Number of Expansions Relative to Sequential Search

Shimoda, Takumi, Fukunaga, Alex

arXiv.org Artificial Intelligence

Parallelization of non-admissible search algorithms such as GBFS poses a challenge because straightforward parallelization can result in search behavior which significantly deviates from sequential search. Previous work proposed PUHF, a parallel search algorithm which is constrained to only expand states that can be expanded by some tie-breaking strategy for GBFS. We show that despite this constraint, the number of states expanded by PUHF is not bounded by a constant multiple of the number of states expanded by sequential GBFS with the worst-case tie-breaking strategy. We propose and experimentally evaluate One Bench At a Time (OBAT), a parallel greedy search which guarantees that the number of states expanded is within a constant factor of the number of states expanded by sequential GBFS with some tie-breaking policy.
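As a rough illustration (not the paper's code), sequential GBFS with an explicit tie-breaking policy can be sketched as follows. The `counter` field breaks ties among states with equal heuristic value by insertion order; as the abstract notes, which states GBFS expands, and hence the bound OBAT is stated relative to, depends on this tie-breaking choice. All names here are illustrative.

```python
import heapq

def gbfs(start, h, successors, is_goal):
    """Sequential greedy best-first search: repeatedly expand the open
    state with the lowest heuristic value h(s). Ties on h are broken by
    insertion order via `counter`; a different tie-breaking policy can
    change the set of states expanded."""
    counter = 0  # insertion order, used as the explicit tie-breaker
    open_list = [(h(start), counter, start)]
    closed = set()
    parent = {start: None}
    while open_list:
        _, _, s = heapq.heappop(open_list)
        if s in closed:
            continue
        closed.add(s)
        if is_goal(s):
            path = []  # reconstruct the path by following parents
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in closed:
                counter += 1
                parent.setdefault(t, s)
                heapq.heappush(open_list, (h(t), counter, t))
    return None
```

Parallel variants such as PUHF and OBAT must constrain which of these tie-consistent expansions the workers are allowed to perform in order to bound the total number of expansions relative to this sequential baseline.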


Extreme Value Monte Carlo Tree Search

Asai, Masataro, Wissow, Stephen

arXiv.org Artificial Intelligence

Despite being successful in board games and reinforcement learning (RL), UCT, a Monte-Carlo Tree Search (MCTS) combined with the UCB1 Multi-Armed Bandit (MAB), had limited success in domain-independent planning until recently. Previous work showed that UCB1, designed for $[0,1]$-bounded rewards, is not appropriate for estimating distances-to-go, which are potentially unbounded in $\mathbb{R}$, such as the heuristic functions used in classical planning, and proposed combining MCTS with MABs designed for Gaussian reward distributions, successfully improving performance. In this paper, we further sharpen our understanding of the ideal bandits for planning tasks. Existing work has two issues: First, while Gaussian MABs no longer over-specify the distances as $h\in [0,1]$, they under-specify them as $h\in (-\infty,\infty)$, even though distances are non-negative and can be further bounded in some cases. Second, there is no theoretical justification for the Full-Bellman backup (Schulte & Keller, 2014), which backpropagates the minimum/maximum of samples. We identify \emph{extreme value} statistics as a theoretical framework that resolves both issues at once, propose two bandits, UCB1-Uniform/Power, and apply them to MCTS for classical planning. We formally prove their regret bounds and empirically demonstrate their performance in classical planning.
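For reference, the standard UCB1 selection rule the abstract criticizes can be sketched as follows; this is a generic textbook illustration, not the paper's UCB1-Uniform/Power bandits, and the function names are illustrative.

```python
import math

def ucb1_select(counts, means, t, c=math.sqrt(2)):
    """UCB1 arm selection: pick the arm maximizing
    mean + c * sqrt(ln t / n). The confidence radius is calibrated for
    rewards bounded in [0,1]; for unbounded distance-to-go estimates this
    radius is mis-scaled, which motivates the Gaussian and extreme-value
    bandits discussed above."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play each arm once before applying the bound
    return max(range(len(counts)),
               key=lambda i: means[i] + c * math.sqrt(math.log(t) / counts[i]))
```

With enough exploration, the rule shifts toward the arm with the better empirical mean, while rarely pulled arms retain a large exploration bonus.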