LevinTS
Subgoal-Guided Policy Heuristic Search with Learned Subgoals
Tuero, Jake, Buro, Michael, Lelis, Levi H. S.
Policy tree search is a family of tree search algorithms that use a policy to guide the search. These algorithms provide guarantees on the number of expansions required to solve a given problem that are based on the quality of the policy. While these algorithms have shown promising results, training the policy requires complete solution trajectories, which are obtained during a trial-and-error search process. When the training problem instances are hard, learning can be prohibitively costly, especially when starting from a randomly initialized policy, and search samples are wasted in failed attempts to solve these hard instances. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms. The subgoals and the policies conditioned on subgoals are learned from the trees that the search expands while attempting to solve problems, including the search trees of failed attempts. We empirically show that our policy formulation and training method improve the sample efficiency of learning a policy and a heuristic function in this online setting.
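The abstract stops short of spelling out how training examples are extracted from expanded trees. The sketch below is a loose illustration only, not the paper's algorithm: it samples intermediate nodes of a search tree (including a failed one) as subgoals and records the path segments leading to them as subgoal-conditioned policy examples. The `tree.nodes`, `.parent`, `.action`, and `.state` names are hypothetical.

```python
import random

def harvest_subgoal_examples(tree, num_subgoals=8, segment_length=5):
    """Loose sketch (not the paper's algorithm): turn an expanded search tree,
    successful or not, into (state, subgoal, action) training triples for a
    subgoal-conditioned policy. `tree` is assumed to expose `nodes`, each with
    `.state`, `.parent`, and `.action` attributes; these names are hypothetical."""
    examples = []
    candidates = [n for n in tree.nodes if n.parent is not None]
    if not candidates:
        return examples
    for subgoal_node in random.sample(candidates, min(num_subgoals, len(candidates))):
        # Walk back up to `segment_length` steps from the sampled subgoal.
        path = []
        node = subgoal_node
        while node.parent is not None and len(path) < segment_length:
            path.append((node.parent.state, node.action))
            node = node.parent
        # Every state on the segment is trained to reach the sampled subgoal.
        for state, action in reversed(path):
            examples.append((state, subgoal_node.state, action))
    return examples
```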
Reviews: Single-Agent Policy Tree Search With Guarantees
The paper presents two planning algorithms based on tree search. The novelty of these algorithms is the use of a policy-based criterion, instead of a standard heuristic, to guide the search. The first algorithm (LevinTS) uses a cost function d(n)/π(n) that ensures nodes are expanded in best-first order. This allows the authors to derive an upper bound on the number of expansions performed before reaching a target node. The second algorithm (LubyTS) is a randomized algorithm based on sampling trajectories.
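To make the cost function concrete, here is a minimal Python sketch of that best-first enumeration, ordered by d(n)/π(n), where d(n) is the number of actions from the root and π(n) is the product of the policy probabilities along the path. The `policy`, `is_goal`, and `successors` callables are placeholders rather than the paper's actual interfaces; the bound mentioned in the review follows from this ordering, with any target node reached after roughly at most d(n*)/π(n*) expansions.

```python
import heapq

def levin_tree_search(root, policy, is_goal, successors):
    """Best-first enumeration ordered by d(n) / pi(n).

    `policy(state)` is assumed to return a dict mapping actions to probabilities,
    and `successors(state, action)` the resulting state; both are hypothetical
    stand-ins, not the paper's actual interfaces."""
    counter = 0  # tie-breaker so heap entries never compare states directly
    frontier = [(0.0, counter, 0, 1.0, root)]  # (cost, tie, depth, path_prob, state)
    expansions = 0
    while frontier:
        _, _, depth, prob, state = heapq.heappop(frontier)
        expansions += 1
        if is_goal(state):
            return state, expansions
        for action, p in policy(state).items():
            child = successors(state, action)
            child_prob = prob * p      # pi(n): product of action probabilities on the path
            child_depth = depth + 1    # d(n): number of actions from the root
            cost = child_depth / child_prob if child_prob > 0 else float("inf")
            counter += 1
            heapq.heappush(frontier, (cost, counter, child_depth, child_prob, child))
    return None, expansions
```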
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
Kang, Jikun, Li, Xin Zhe, Chen, Xi, Kazemi, Amirreza, Sun, Qianyi, Chen, Boxing, Li, Dong, He, Xu, He, Quan, Wen, Feng, Hao, Jianye, Yao, Jun
Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datasets that are difficult to prepare, or they require substantial computational resources for fine-tuning. Inspired by findings that LLMs know how to produce the right answer but struggle to select the correct reasoning path, we propose a purely inference-based search method -- MindStar (M*). This method formulates reasoning tasks as search problems and proposes two search ideas to identify optimal reasoning paths. We evaluate the M* framework on both the GSM8K and MATH datasets, comparing its performance with existing open-source and closed-source LLMs. Our results demonstrate that M* significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves performance comparable to GPT-3.5 and Grok-1, but with substantially reduced model size and computational cost.
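The two search ideas are not detailed in the abstract. As a generic illustration of inference-time search over reasoning paths, the sketch below runs a beam search over candidate reasoning steps; `generate_steps` and `score` are hypothetical stand-ins (e.g., an LLM proposing next steps and a reward model scoring partial paths), not M*'s actual components.

```python
def reason_with_beam_search(question, generate_steps, score, beam_width=4, max_depth=8):
    """Generic illustration of inference-time search over reasoning paths (not M*'s
    exact procedure). `generate_steps(question, path)` is assumed to return candidate
    next reasoning steps, and `score(question, path)` a scalar judging a partial path;
    both are hypothetical stand-ins."""
    beam = [[]]  # each element is a list of reasoning steps
    for _ in range(max_depth):
        candidates = []
        for path in beam:
            for step in generate_steps(question, path):
                candidates.append(path + [step])
        if not candidates:
            break
        # Keep the highest-scoring partial reasoning paths.
        candidates.sort(key=lambda p: score(question, p), reverse=True)
        beam = candidates[:beam_width]
    return beam[0] if beam else []
```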
Policy-Guided Heuristic Search with Guarantees
Orseau, Laurent, Lelis, Levi H. S.
The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and it can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relate to both the quality of the heuristic and that of the policy. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game 'The Witness' that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of the number of problems solved and search time in all three domains tested.
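As a rough illustration of mixing a heuristic with a policy in the node ordering, the sketch below extends a LevinTS-style enumeration with a priority of the form (g(n) + h(n)) / π(n). This particular formula is a simplification chosen for illustration; it is not necessarily the PHS or PHS* evaluation function defined in the paper.

```python
import heapq

def policy_heuristic_search(root, policy, heuristic, is_goal, successors):
    """Illustrative best-first search whose priority mixes path cost, a heuristic,
    and the policy probability of the path: (g(n) + h(n)) / pi(n). This exact
    formula is an assumption for illustration, not necessarily the paper's PHS/PHS*
    evaluation function. `policy`, `heuristic`, `is_goal`, and `successors` are
    hypothetical stand-ins."""
    counter = 0
    frontier = [((0.0 + heuristic(root)) / 1.0, counter, 0.0, 1.0, root)]
    while frontier:
        _, _, g, prob, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        for action, p in policy(state).items():
            child = successors(state, action)
            child_g = g + 1.0              # unit edge costs assumed
            child_prob = prob * p
            if child_prob <= 0.0:
                continue
            priority = (child_g + heuristic(child)) / child_prob
            counter += 1
            heapq.heappush(frontier, (priority, counter, child_g, child_prob, child))
    return None
```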
Single-Agent Policy Tree Search With Guarantees
Orseau, Laurent, Lelis, Levi H. S., Lattimore, Tor, Weber, Théophane
We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to provide an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for "needle-in-a-haystack" problems. The second algorithm, which is based on sampling, provides an upper bound on the expected number of nodes to be expanded before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.
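The sampling-based algorithm is only described at a high level in the abstract. The sketch below conveys the basic idea of policy-guided rollouts with restarts; the paper's actual restart/budget schedule and the accompanying bound on the expected number of expansions are not reproduced here, and all callables are hypothetical stand-ins.

```python
import random

def sample_trajectories(root, policy, is_goal, successors, max_depth=50, num_rollouts=200):
    """Bare-bones illustration of the sampling idea: repeatedly roll out the policy
    from the root and restart on failure. `policy(state)` is assumed to return a dict
    of action probabilities and `successors(state, action)` the next state."""
    expansions = 0
    for _ in range(num_rollouts):
        state = root
        for _ in range(max_depth):
            if is_goal(state):
                return state, expansions
            probs = policy(state)
            if not probs:
                break  # dead end; restart from the root
            actions, weights = zip(*probs.items())
            action = random.choices(actions, weights=weights, k=1)[0]
            state = successors(state, action)
            expansions += 1
    return None, expansions
```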