Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Neural Information Processing Systems

This property can be observed directly from Eq. 2 when integration is replaced by summation. Closed-loop: first, we construct the standard, closed-loop α-vectors, which represent the value function under closed-loop dynamics [1, 5]. Each point in the scatter plot represents a paired experiment with identical target dynamics.
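As context for the α-vector construction mentioned in the excerpt above: a POMDP value function is represented as the maximum over a set of α-vectors, V(b) = max_α α·b. A minimal sketch with invented vectors and a two-state belief (not the paper's construction):

```python
# Minimal sketch: evaluating a POMDP value function represented by a set
# of alpha-vectors. V(b) = max over alpha of dot(alpha, b).
# The vectors and belief below are illustrative, not from the paper.

def value(belief, alpha_vectors):
    """Return the value of a belief under a set of alpha-vectors."""
    return max(sum(a * b for a, b in zip(alpha, belief))
               for alpha in alpha_vectors)

alphas = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]  # one vector per conditional plan
b = (0.5, 0.5)                                  # belief over two states
print(value(b, alphas))                         # best plan value at this belief
```

The piecewise-linear maximum over α-vectors is what makes the value function convex in the belief.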



Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling

Kwon, Minseo, Kim, Young J.

arXiv.org Artificial Intelligence

Abstract-- Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration of a TAMP solution and backtracks the search based on visual renderings of the states. I. INTRODUCTION Robotic manipulation tasks, such as tabletop manipulation, require reasoning over both symbolic task decisions and continuous geometric feasibility. A robot must decide which action to perform--such as picking, placing, or stacking--and which object to grasp, which constitutes a discrete search process. Simultaneously, it must determine grasp poses, feasible end-effector configurations, and collision-free motion trajectories governed by continuous constraints. This class of problems is studied under the framework of Task and Motion Planning (TAMP), which combines high-level task planning with continuous action parameter binding and low-level motion planning [1], [2].
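A hybrid state tree that stores symbolic and numeric state together in each node, as the abstract describes, can be sketched as follows; the class name, fields, and example facts are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class HybridNode:
    """One node of a hybrid state tree: symbolic facts plus numeric state."""
    symbolic: frozenset                     # e.g. {("on", "block_a", "table")}
    numeric: dict                           # e.g. {"block_a": (x, y, z)}
    parent: Optional["HybridNode"] = None
    children: list = field(default_factory=list)

    def expand(self, new_symbolic, new_numeric):
        """Add a child node reached by one joint task/motion decision."""
        child = HybridNode(frozenset(new_symbolic), dict(new_numeric), parent=self)
        self.children.append(child)
        return child

# Toy expansion: picking up a block changes both the facts and the pose.
root = HybridNode(frozenset({("on", "block_a", "table")}),
                  {"block_a": (0.0, 0.0, 0.0)})
child = root.expand({("holding", "block_a")}, {"block_a": (0.0, 0.0, 0.2)})
```

Keeping both representations in one node is what lets a single search jointly commit to a symbolic action and its continuous parameters.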


Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills

Wan, Weikang, Ramos, Fabio, Yang, Xuning, Garrett, Caelan

arXiv.org Artificial Intelligence

Long-horizon contact-rich bimanual manipulation presents a significant challenge, requiring complex coordination involving a mixture of parallel execution and sequential collaboration between arms. In this paper, we introduce a hierarchical framework that frames this challenge as an integrated skill planning & scheduling problem, going beyond purely sequential decision-making to support simultaneous skill invocation. Our approach is built upon a library of single-arm and bimanual primitive skills, each trained using Reinforcement Learning (RL) in GPU-accelerated simulation. We then train a Transformer-based planner on a dataset of skill compositions to act as a high-level scheduler, simultaneously predicting the discrete schedule of skills as well as their continuous parameters. We demonstrate that our method achieves higher success rates on complex, contact-rich tasks than end-to-end RL approaches and produces more efficient, coordinated behaviors than traditional sequential-only planners.
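The scheduler output described above (a discrete skill schedule with continuous parameters, supporting simultaneous invocation) can be represented as time slots in which each arm optionally runs one skill. A toy sketch with invented skill names and durations, not the paper's Transformer output format:

```python
# Toy schedule: each slot maps an arm to (skill, parameters). Skills in the
# same slot run in parallel; slots run in sequence. Names are illustrative.
schedule = [
    {"left": ("reach", {"x": 0.3}), "right": ("reach", {"x": -0.3})},  # parallel
    {"left": ("grasp", {"width": 0.05})},                              # right idle
    {"left": ("handover", {}), "right": ("handover", {})},             # bimanual
]

durations = {"reach": 2.0, "grasp": 1.0, "handover": 3.0}

def makespan(schedule, durations):
    """Total time: slots are sequential, skills within a slot are parallel."""
    return sum(max(durations[skill] for skill, _ in slot.values())
               for slot in schedule)

print(makespan(schedule, durations))  # 2.0 + 1.0 + 3.0
```

The makespan computation shows why parallel slots matter: a sequential-only planner would pay for every skill individually.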


Transitive RL: Value Learning via Divide and Conquer

Park, Seohong, Oberai, Aditya, Atreya, Pranav, Levine, Sergey

arXiv.org Artificial Intelligence

In this work, we present Transitive Reinforcement Learning (TRL), a new value learning algorithm based on a divide-and-conquer paradigm. TRL is designed for offline goal-conditioned reinforcement learning (GCRL) problems, where the aim is to find a policy that can reach any state from any other state in the smallest number of steps. TRL converts a triangle inequality structure present in GCRL into a practical divide-and-conquer value update rule. This has several advantages compared to alternative value learning paradigms. Compared to temporal difference (TD) methods, TRL suffers less from bias accumulation, as in principle it only requires $O(\log T)$ recursions (as opposed to $O(T)$ in TD learning) to handle a length-$T$ trajectory. Unlike Monte Carlo methods, TRL suffers less from high variance as it performs dynamic programming. Experimentally, we show that TRL achieves the best performance in highly challenging, long-horizon benchmark tasks compared to previous offline GCRL algorithms.
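The divide-and-conquer recursion the abstract describes can be illustrated on a single trajectory: the cost of reaching state s_j from s_i is composed from the two halves of the segment, so a length-T trajectory is processed with O(log T) recursion depth. A tabular toy sketch of this structure, not the authors' neural update rule:

```python
def dc_values(traj):
    """Fill a tabular steps-to-goal estimate V[(s, g)] by recursively
    splitting each trajectory segment at its midpoint (O(log T) depth)."""
    V = {}

    def solve(i, j):
        if j - i <= 1:             # base case: adjacent states
            d = j - i
        else:                      # conquer: combine the two halves
            m = (i + j) // 2
            d = solve(i, m) + solve(m, j)
        key = (traj[i], traj[j])
        V[key] = min(V.get(key, d), d)   # keep the shortest estimate seen
        return d

    solve(0, len(traj) - 1)
    return V

V = dc_values(list(range(9)))   # a 9-state toy trajectory
print(V[(0, 8)])                # steps from state 0 to state 8
```

Each value is built from two shorter, already-solved subproblems rather than one-step bootstrapping, which is the source of the O(log T) versus O(T) bias-accumulation argument.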


Optimistic Reinforcement Learning-Based Skill Insertions for Task and Motion Planning

Liu, Gaoyuan, de Winter, Joris, Durodie, Yuri, Steckelmacher, Denis, Nowe, Ann, Vanderborght, Bram

arXiv.org Artificial Intelligence

Abstract-- Task and motion planning (TAMP) for robotic manipulation necessitates long-horizon reasoning involving versatile actions and skills. While deterministic actions can be crafted by sampling or optimizing with certain constraints, planning actions with uncertainty, i.e., probabilistic actions, remains a challenge for TAMP. In contrast, Reinforcement Learning (RL) excels in acquiring versatile, yet short-horizon, manipulation skills that are robust to uncertainty. Besides the policy, an RL skill is defined with data-driven logical components that enable the skill to be deployed by symbolic planning. A plan refinement sub-routine is designed to further tackle the inevitable effect uncertainties. In the experiments, we compare our method with baseline hierarchical planning from both the TAMP and RL fields and illustrate the strength of the method. The results show that by embedding RL skills, we extend the capability of TAMP to domains with probabilistic skills, and improve planning efficiency compared to previous methods. Reinforcement Learning (RL) empowers robots to acquire manipulation skills without human programming. However, prior works mostly tackle single-skill or short-term manipulation tasks, such as grasping [1], peg insertion [2], or synergies between two actions [3]. Long-horizon manipulation planning remains a challenge in the RL field because of expanding state/action spaces, sparse rewards, etc. [4].
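An RL skill wrapped with data-driven logical components (preconditions and probabilistic effects) so that a symbolic planner can invoke it, as the abstract describes, might look like the following sketch; the predicates and probabilities are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class SymbolicSkill:
    """An RL policy wrapped with logical components for symbolic planning."""
    name: str
    preconditions: frozenset       # facts that must hold before invocation
    effects: dict                  # possible outcome fact-set -> probability

    def applicable(self, state):
        """The skill may be invoked iff its preconditions hold in the state."""
        return self.preconditions <= state

# Invented example: a learned grasp skill that succeeds 90% of the time.
grasp = SymbolicSkill(
    name="grasp_block",
    preconditions=frozenset({("clear", "block"), ("hand_empty",)}),
    effects={frozenset({("holding", "block")}): 0.9,
             frozenset({("hand_empty",)}): 0.1},   # failure keeps the hand empty
)

state = frozenset({("clear", "block"), ("hand_empty",)})
print(grasp.applicable(state))  # True
```

Exposing effects as an outcome distribution rather than a single postcondition is what lets the planner reason about probabilistic skills and trigger refinement when a low-probability outcome occurs.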


Few-Shot Neuro-Symbolic Imitation Learning for Long-Horizon Planning and Acting

Lorang, Pierrick, Lu, Hong, Huemer, Johannes, Zips, Patrik, Scheutz, Matthias

arXiv.org Artificial Intelligence

Imitation learning enables intelligent systems to acquire complex behaviors with minimal supervision. However, existing methods often focus on short-horizon skills, require large datasets, and struggle to solve long-horizon tasks or generalize across task variations and distribution shifts. We propose a novel neuro-symbolic framework that jointly learns continuous control policies and symbolic domain abstractions from a few skill demonstrations. Our method abstracts high-level task structures into a graph, discovers symbolic rules via an Answer Set Programming solver, and trains low-level controllers using diffusion policy imitation learning. A high-level oracle filters task-relevant information to focus each controller on a minimal observation and action space. Our graph-based neuro-symbolic framework enables capturing complex state transitions, including non-spatial and temporal relations, that data-driven learning or clustering techniques often fail to discover in limited demonstration datasets. We validate our approach in six domains: four robotic-arm environments (Stacking, Kitchen, Assembly, and Towers of Hanoi) and a distinct Automated Forklift domain with two environments. The results demonstrate high data efficiency with as few as five skill demonstrations, strong zero- and few-shot generalization, and interpretable decision making.
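The graph abstraction of high-level task structure from a few demonstrations can be sketched as collecting observed symbolic state transitions into adjacency sets; the abstract states below are invented, and the ASP rule-discovery step is omitted:

```python
from collections import defaultdict

def build_transition_graph(demonstrations):
    """Collect observed abstract-state transitions from demonstrations."""
    graph = defaultdict(set)
    for demo in demonstrations:                 # demo: sequence of abstract states
        for src, dst in zip(demo, demo[1:]):
            graph[src].add(dst)
    return graph

# Two toy demonstrations of a stacking task (invented abstract states).
demos = [
    ["all_on_table", "holding_a", "a_on_b"],
    ["all_on_table", "holding_a", "a_on_b", "done"],
]
g = build_transition_graph(demos)
print(sorted(g["all_on_table"]))  # successor states seen after 'all_on_table'
```

Even from two demonstrations the graph already captures the shared task skeleton, which is the input a rule-discovery step would generalize over.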


Fast Task Planning with Neuro-Symbolic Relaxation

Du, Qiwei, Li, Bowen, Du, Yi, Su, Shaoshu, Fu, Taimeng, Zhan, Zitong, Zhao, Zhipeng, Wang, Chen

arXiv.org Artificial Intelligence

Real-world task planning requires long-horizon reasoning over large sets of entities with complex relationships and attributes, leading to a combinatorial explosion for classical symbolic planners. To prune the search space, recent methods prioritize searching on a simplified task containing only a few "important" entities predicted by a neural network. However, such a simple neuro-symbolic (NeSy) integration risks omitting critical entities and wasting resources on unsolvable simplified tasks. To enable fast and reliable planning, we introduce a NeSy relaxation strategy (Flax), combining neural importance prediction with symbolic expansion. Specifically, we first learn a graph neural network to predict entity importance to create a simplified task and solve it with a symbolic planner. Then, we solve a rule-relaxed task to obtain a quick rough plan, and reintegrate all referenced entities into the simplified task to recover any overlooked but essential elements. Finally, we apply complementary rules to refine the updated task, keeping it both reliable and compact. Extensive experiments are conducted on both synthetic and real-world maze navigation benchmarks where a robot must traverse a maze and interact with movable objects. The results show that Flax boosts the average success rate by 20.82% and cuts mean wall-clock planning time by 17.65% compared with the state-of-the-art NeSy baseline. We expect that Flax offers a practical path toward fast, scalable, long-horizon task planning in complex environments.
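The reintegration step described above (take the top-k entities by predicted importance, then merge back every entity the rule-relaxed rough plan references) can be sketched as a set update; the function name and the toy importance scores are assumptions standing in for the GNN:

```python
def build_task_entities(importance, k, relaxed_plan_refs):
    """Top-k entities by predicted importance, then re-add any entity the
    rule-relaxed rough plan references, so nothing critical is omitted."""
    simplified = set(sorted(importance, key=importance.get, reverse=True)[:k])
    return simplified | set(relaxed_plan_refs)

# Toy scores from a stand-in for the importance-prediction network.
scores = {"robot": 0.9, "door": 0.8, "key": 0.2, "box": 0.1}
refs = {"key"}                      # the rough plan opens the door with the key
print(sorted(build_task_entities(scores, k=2, relaxed_plan_refs=refs)))
# 'key' is kept even though its predicted importance was low
```

The union is what guards against the failure mode the abstract names: a confident but wrong importance prediction dropping an entity the plan actually needs.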