AITopics | planning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Machine LearningJul-15-2025

Continual Reinforcement Learning by Planning with Online World Models

Liu, Zichen, Fu, Guoji, Du, Chao, Lee, Wee Sun, Lin, Min

Continual reinforcement learning (CRL) refers to a naturalistic setting where an agent needs to endlessly evolve, by trial and error, to solve multiple tasks that are presented sequentially. One of the largest obstacles to CRL is that the agent may forget how to solve previous tasks when learning a new task, known as catastrophic forgetting. In this paper, we propose to address this challenge by planning with online world models. Specifically, we learn a Follow-The-Leader shallow model online to capture the world dynamics, in which we plan using model predictive control to solve a set of tasks specified by any reward functions. The online world model is immune to forgetting by construction with a proven regret bound of $\mathcal{O}(\sqrt{K^2D\log(T)})$ under mild assumptions. The planner searches actions solely based on the latest online model, thus forming a FTL Online Agent (OA) that updates incrementally. To assess OA, we further design Continual Bench, a dedicated environment for CRL, and compare with several strong baselines under the same model-planning algorithmic framework. The empirical results show that OA learns continuously to solve new tasks while not forgetting old skills, outperforming agents built on deep world models with various continual learning techniques.

artificial intelligence, continual reinforcement learning, machine learning, (13 more...)

arXiv.org Machine Learning

2507.09177

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Neural Information Processing SystemsMay-27-2025, 08:12:25 GMT

Planning with General Objective Functions: Going Beyond Total Rewards

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment., i.e., maximize \sum_{h 1} H r_h where H is the planning horizon. However, this paradigm fails to model important practical applications, e.g., safe control that aims to maximize the lowest reward, i.e., maximize \min_{h 1} H r_h . In this paper, based on techniques in sketching algorithms, we propose a novel planning algorithm in deterministic systems which deals with a large class of objective functions of the form f(r_1, r_2, ... r_H) that are of interest to practical applications. We show that efficient planning is possible if f is symmetric under permutation of coordinates and satisfies certain technical conditions. Complementing our algorithm, we further prove that removing any of the conditions will make the problem intractable in the worst case and thus demonstrate the necessity of our conditions.

artificial intelligence, general objective function, machine learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Neural Information Processing SystemsMay-26-2025, 20:52:26 GMT

Chain of Thoughtlessness? An Analysis of CoT in Planning

Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought prompting--a method of demonstrating solution procedures--with the intuition that it is possible to in-context teach an LLM an algorithm for solving the problem.This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of examples given in prompt, and complexity of problems queried with each prompt. While our problems are very simple, we only find meaningful performance improvements from chain of thought prompts when those prompts are exceedingly specific to their problem class, and that those improvements quickly deteriorate as the size n of the query-specified stack grows past the size of stacks shown in the examples.We also create scalable variants of three domains commonly studied in previous CoT papers and demonstrate the existence of similar failure modes.Our results hint that, contrary to previous claims in the literature, CoT's performance improvements do not stem from the model learning general algorithmic procedures via demonstrations but depend on carefully engineering highly problem specific prompts. This spotlights drawbacks of chain of thought, especially the sharp tradeoff between possible performance gains and the amount of human labor necessary to generate examples with correct reasoning traces.

artificial intelligence, large language model, natural language, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing SystemsMay-26-2025, 15:45:58 GMT

Can Graph Learning Improve Planning in LLM-based Agents?

Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests in natural language into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, task planning is a decision-making problem that involves selecting a connected path or subgraph within the corresponding graph and invoking it. In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs, which is adeptly addressed by graph neural networks (GNNs).

large language model, machine learning, natural language, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceMar-21-2025

LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language

Chu, Kun, Zhao, Xufeng, Weber, Cornelius, Wermter, Stefan

Bimanual robotic manipulation provides significant versatility, but also presents an inherent challenge due to the complexity involved in the spatial and temporal coordination between two hands. Existing works predominantly focus on attaining human-level manipulation skills for robotic hands, yet little attention has been paid to task planning on long-horizon timescales. With their outstanding in-context learning and zero-shot generation abilities, Large Language Models (LLMs) have been applied and grounded in diverse robotic embodiments to facilitate task planning. However, LLMs still suffer from errors in long-horizon reasoning and from hallucinations in complex robotic tasks, lacking a guarantee of logical correctness when generating the plan. Previous works, such as LLM+P, extended LLMs with symbolic planners. However, none have been successfully applied to bimanual robots. New challenges inevitably arise in bimanual manipulation, necessitating not only effective task decomposition but also efficient task allocation. To address these challenges, this paper introduces LLM+MAP, a bimanual planning framework that integrates LLM reasoning and multi-agent planning, automating effective and efficient bimanual task planning. We conduct simulated experiments on various long-horizon manipulation tasks of differing complexity. Our method is built using GPT-4o as the backend, and we compare its performance against plans generated directly by LLMs, including GPT-4o, V3 and also recent strong reasoning models o1 and R1. By analyzing metrics such as planning time, success rate, group debits, and planning-step reduction rate, we demonstrate the superior performance of LLM+MAP, while also providing insights into robotic reasoning. Code is available at https://github.com/Kchu/LLM-MAP.

large language model, machine learning, natural language, (22 more...)

2503.17309

Country:

North America > United States (0.04)
Europe > Germany > Hamburg (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMar-12-2025

Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Erdogan, Lutfi Eren, Lee, Nicholas, Kim, Sehoon, Moon, Suhong, Furuta, Hiroki, Anumanchipalli, Gopala, Keutzer, Kurt, Gholami, Amir

Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work have found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model which generates structured, high-level plans to achieve user goals, and an Executor model that translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of the-art 54% success rate on the WebArena-Lite benchmark.

global plan, planning, trajectory, (11 more...)

2503.09572

Country: Asia (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Chatrola, Jeel, Ajith, Abhiroop, Leahy, Kevin, Chamzas, Constantinos

Multi-layer Motion Planning with Kinodynamic and Spatio-Temporal Constraints

arXiv.org Artificial IntelligenceMar-10-2025

We propose a novel, multi-layered planning approach for computing paths that satisfy both kinodynamic and spatiotemporal constraints. Our three-part framework first establishes potential sequences to meet spatial constraints, using them to calculate a geometric lead path. This path then guides an asymptotically optimal sampling-based kinodynamic planner, which minimizes an STL-robustness cost to jointly satisfy spatiotemporal and kinodynamic constraints. In our experiments, we test our method with a velocity-controlled Ackerman-car model and demonstrate significant efficiency gains compared to prior art. Additionally, our method is able to generate complex path maneuvers, such as crossovers, something that previous methods had not demonstrated.

constraint, planning, specification, (14 more...)

2503.07762

Country:

North America > United States (0.15)
Asia > Middle East > Republic of Türkiye (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.71)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.46)

Franke, Julius, Moldagalieva, Akmaral, Hanfeld, Pia, Hönig, Wolfgang

Accelerating db-A* for Kinodynamic Motion Planning Using Diffusion

arXiv.org Artificial IntelligenceMar-10-2025

Abstract-- We present a novel approach for generating motion primitives for kinodynamic motion planning using diffusion models. The motions generated by our approach are adapted to each problem instance by utilizing problem-specific parameters, allowing for finding solutions faster and of better quality. The diffusion models used in our approach are trained on randomly cut solution trajectories. These trajectories are created by solving randomly generated problem instances with a kinodynamic motion planner. Experimental results show significant improvements up to 30 percent in both computation time and solution quality across varying robot dynamics such as secondorder unicycle or car with trailer.

diffusion model, motion primitive, trajectory, (15 more...)

2503.05539

Country:

North America > United States (0.46)
Europe (0.28)
Asia > Japan (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.72)

Rabiei, Behrad, R., Mahesh Kumar A., Dai, Zhirui, Pilla, Surya L. S. R., Dong, Qiyue, Atanasov, Nikolay

LTLCodeGen: Code Generation of Syntactically Correct Temporal Logic for Robot Task Planning

arXiv.org Artificial IntelligenceMar-10-2025

This paper focuses on planning robot navigation tasks from natural language specifications. We develop a modular approach, where a large language model (LLM) translates the natural language instructions into a linear temporal logic (LTL) formula with propositions defined by object classes in a semantic occupancy map. The LTL formula and the semantic occupancy map are provided to a motion planning algorithm to generate a collision-free robot path that satisfies the natural language instructions. Our main contribution is LTLCodeGen, a method to translate natural language to syntactically correct LTL using code generation. We demonstrate the complete task planning method in real-world experiments involving human speech to provide navigation instructions to a mobile robot. We also thoroughly evaluate our approach in simulated and real-world experiments in comparison to end-to-end LLM task planning and state-of-the-art LLM-to-LTL translation methods.

instruction, ltl formula, natural language instruction, (11 more...)

2503.07902

Country: North America > United States > California > San Diego County (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)