Planning & Scheduling
Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning
Humanoid robots must master numerous tasks with sparse rewards, posing a challenge for reinforcement learning (RL). We propose a method combining RL and automated planning to address this. Our approach uses short goal-conditioned policies (GCPs) organized hierarchically, with Monte Carlo Tree Search (MCTS) planning using high-level actions (HLAs). Instead of primitive actions, the planning process generates HLAs. A single plan-tree, maintained during the agent's lifetime, holds knowledge about goal achievement. This hierarchy enhances sample efficiency and speeds up reasoning by reusing HLAs and anticipating future actions. Our Hierarchical Goal-Conditioned Policy Planning (HGCPP) framework uniquely integrates GCPs, MCTS, and hierarchical RL, potentially improving exploration and planning in complex tasks.
K-ARC: Adaptive Robot Coordination for Multi-Robot Kinodynamic Planning
Qin, Mike, Solis, Irving, Motes, James, Morales, Marco, Amato, Nancy M.
This work presents Kinodynamic Adaptive Robot Coordination (K-ARC), a novel algorithm for multi-robot kinodynamic planning. Our experimental results show the capability of K-ARC to plan for up to 32 planar mobile robots, while achieving up to an order of magnitude of speed-up compared to previous methods in various scenarios. K-ARC is able to achieve this due to its two main properties. First, K-ARC constructs its solution iteratively by planning in segments, where initial kinodynamic paths are found through optimization-based approaches and the inter-robot conflicts are resolved through sampling-based approaches. The interleaving use of sampling-based and optimization-based approaches allows K-ARC to leverage the strengths of both approaches in different sections of the planning process where one is more suited than the other, while previous methods tend to emphasize on one over the other. Second, K-ARC builds on a previously proposed multi-robot motion planning framework, Adaptive Robot Coordination (ARC), and inherits its strength of focusing on coordination between robots only when needed, saving computation efforts. We show how the combination of these two properties allows K-ARC to achieve overall better performance in our simulated experiments with increasing numbers of robots, increasing degrees of problem difficulties, and increasing complexities of robot dynamics.
Predicate Invention from Pixels via Pretrained Vision-Language Models
Athalye, Ashay, Kumar, Nishanth, Silver, Tom, Liang, Yichao, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack
Our aim is to learn to solve long-horizon decision-making problems in highly-variable, combinatorially-complex robotics domains given raw sensor input in the form of images. Previous work has shown that one way to achieve this aim is to learn a structured abstract transition model in the form of symbolic predicates and operators, and then plan within this model to solve novel tasks at test time. However, these learned models do not ground directly into pixels from just a handful of demonstrations. In this work, we propose to invent predicates that operate directly over input images by leveraging the capabilities of pretrained vision-language models (VLMs). Our key idea is that, given a set of demonstrations, a VLM can be used to propose a set of predicates that are potentially relevant for decision-making and then to determine the truth values of these predicates in both the given demonstrations and new image inputs. We build upon an existing framework for predicate invention, which generates feature-based predicates operating on object-centric states, to also generate visual predicates that operate on images. Experimentally, we show that our approach -- pix2pred -- is able to invent semantically meaningful predicates that enable generalization to novel, complex, and long-horizon tasks across two simulated robotic environments.
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Yao, Huanjin, Huang, Jiaxing, Wu, Wenhao, Zhang, Jingyi, Wang, Yibo, Liu, Shunyu, Wang, Yingjie, Song, Yuxin, Feng, Haocheng, Shen, Li, Tao, Dacheng
In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose Collective Monte Carlo Tree Search (CoMCTS), a new learning-to-reason method for MLLMs, which introduces the concept of collective learning into ``tree search'' for effective and efficient reasoning-path searching and learning. The core idea of CoMCTS is to leverage collective knowledge from multiple models to collaboratively conjecture, search and identify effective reasoning paths toward correct answers via four iterative operations including Expansion, Simulation and Error Positioning, Backpropagation, and Selection. Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question. With Mulberry-260k, we perform collective SFT to train our model, Mulberry, a series of MLLMs with o1-like step-by-step Reasoning and Reflection capabilities. Extensive experiments demonstrate the superiority of our proposed methods on various benchmarks. Code will be available at https://github.com/HJYao00/Mulberry
Goal Recognition using Actor-Critic Optimization
Nageris, Ben, Meneguzzi, Felipe, Mirsky, Reuth
Goal Recognition aims to infer an agent's goal from a sequence of observations. Existing approaches often rely on manually engineered domains and discrete representations. Deep Recognition using Actor-Critic Optimization (DRACO) is a novel approach based on deep reinforcement learning that overcomes these limitations by providing two key contributions. First, it is the first goal recognition algorithm that learns a set of policy networks from unstructured data and uses them for inference. Second, DRACO introduces new metrics for assessing goal hypotheses through continuous policy representations. DRACO achieves state-of-the-art performance for goal recognition in discrete settings while not using the structured inputs used by existing approaches. Moreover, it outperforms these approaches in more challenging, continuous settings at substantially reduced costs in both computing and memory. Together, these results showcase the robustness of the new algorithm, bridging traditional goal recognition and deep reinforcement learning.
Real-Time Sampling-Based Safe Motion Planning for Robotic Manipulators in Dynamic Environments
Covic, Nermin, Lacevic, Bakir, Osmankovic, Dinko, Uzunovic, Tarik
In this paper, we present the main features of Dynamic Rapidly-exploring Generalized Bur Tree (DRGBT) algorithm, a sampling-based planner for dynamic environments. We provide a detailed time analysis and appropriate scheduling to facilitate a real-time operation. To this end, an extensive analysis is conducted to identify the time-critical routines and their dependence on the number of obstacles. Furthermore, information about the distance to obstacles is used to compute a structure called dynamic expanded bubble of free configuration space, which is then utilized to establish sufficient conditions for a guaranteed safe motion of the robot while satisfying all kinematic constraints. An extensive randomized simulation trial is conducted to compare the proposed algorithm to a competing state-of-the-art method. Finally, an experimental study on a real robot is carried out covering a variety of scenarios including those with human presence. The results show the effectiveness and feasibility of real-time execution of the proposed motion planning algorithm within a typical sensor-based arrangement, using cheap hardware and sequential architecture, without the necessity for GPUs or heavy parallelization.
STITCHER: Real-Time Trajectory Planning with Motion Primitive Search
Levy, Helene J., Lopez, Brett T.
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy various constraints can be systematically computed. However, meeting computation time constraints and the potential for numerical instabilities can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework that stitches short trajectory segments together with graph search to compute long range, expressive, and near-optimal trajectories in real-time. Our STITCHER algorithm is shown to outperform modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is conducted to analyze the algorithmic components that make up STITCHER, and a thorough comparison with two state-of-the-art optimization planners is performed. It is shown STITCHER can generate trajectories through complex environments over long distances (tens of meters) with low computation times (milliseconds).
The Importance of Adaptive Decision-Making for Autonomous Long-Range Planetary Surface Mobility
Lamarre, Olivier, Kelly, Jonathan
Long-distance driving is an important component of planetary surface exploration. Unforeseen events often require human operators to adjust mobility plans, but this approach does not scale and will be insufficient for future missions. Interest in self-reliant rovers is increasing, however the research community has not yet given significant attention to autonomous, adaptive decision-making. In this paper, we look back at specific planetary mobility operations where human-guided adaptive planning played an important role in mission safety and productivity. Inspired by the abilities of human experts, we identify shortcomings of existing autonomous mobility algorithms for robots operating in off-road environments like planetary surfaces. We advocate for adaptive decision-making capabilities such as unassisted learning from past experiences and more reliance on stochastic world models. The aim of this work is to highlight promising research avenues to enhance ground planning tools and, ultimately, long-range autonomy algorithms on board planetary rovers.
A survey on pioneering metaheuristic algorithms between 2019 and 2024
Dokeroglu, Tansel, Canturk, Deniz, Kucukyilmaz, Tayfun
With innovation accelerating, selecting the most effective algorithms has become increasingly demanding for researchers and practitioners alike. Recognizing this, we conducted an in-depth review of metaheuristics introduced in the past six years, focusing on their influence and effectiveness. We evaluated these algorithms across essential criteria: citation frequency, diversity in tackled problem types, code availability, ease of parameter tuning, introduction of novel mechanisms, and resilience to issues like stagnation and early convergence. Out of 158 algorithms, we identified 23 that set themselves apart, each contributing unique solutions to long-standing optimization challenges. These algorithms stand out for their versatility and innovation, positioning them as valuable assets for advancing research and addressing complex real-world problems. Our review offers a detailed analysis of these algorithms, comparing their strengths, limitations, similarities, and applications, while highlighting promising trends and future pathways in metaheuristic research.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
Cheniouni, Ishak, Berkane, Soulaimane, Tayebi, Abdelhamid
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe navigation and global asymptotic stability (GAS) of the target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.