Planning & Scheduling
Concurrent-Allocation Task Execution for Multi-Robot Path-Crossing-Minimal Navigation in Obstacle Environments
Hu, Bin-Bin, Yao, Weijia, Zhou, Yanxin, Wei, Henglai, Lv, Chen
Reducing undesirable path crossings among trajectories of different robots is vital in multi-robot navigation missions, which not only reduces detours and conflict scenarios, but also enhances navigation efficiency and boosts productivity. Despite recent progress in multi-robot path-crossing-minimal (MPCM) navigation, the majority of approaches depend on the minimal squared-distance reassignment of suitable desired points to robots directly. However, if obstacles occupy the passing space, calculating the actual robot-point distances becomes complex or intractable, which may render the MPCM navigation in obstacle environments inefficient or even infeasible. In this paper, the concurrent-allocation task execution (CATE) algorithm is presented to address this problem (i.e., MPCM navigation in obstacle environments). First, the path-crossing-related elements in terms of (i) robot allocation, (ii) desired-point convergence, and (iii) collision and obstacle avoidance are encoded into integer and control barrier function (CBF) constraints. Then, the proposed constraints are used in an online constrained optimization framework, which implicitly yet effectively minimizes the possible path crossings and trajectory length in obstacle environments by minimizing the desired point allocation cost and slack variables in CBF constraints simultaneously. In this way, the MPCM navigation in obstacle environments can be achieved with flexible spatial orderings. Note that the feasibility of solutions and the asymptotic convergence property of the proposed CATE algorithm in obstacle environments are both guaranteed, and the calculation burden is also reduced by concurrently calculating the optimal allocation and the control input directly without the path planning process.
A Framework for the Systematic Evaluation of Obstacle Avoidance and Object-Aware Controllers
Escobedo, Caleb, Nechyporenko, Nataliya, Kadekodi, Shreyas, Roncone, Alessandro
Personal use of this material is permitted. Abstract-- Real-time control is an essential aspect of safe robot operation in the real world with dynamic objects. We present a framework for the analysis of object-aware controllers, methods for altering a robot's motion to anticipate and avoid possible collisions. This framework is focused on three design considerations: kinematics, motion profiles, and virtual constraints. Additionally, the analysis in this work relies on verification of robot behaviors using fundamental robot-obstacle experimental scenarios. T o showcase the effectiveness of our method we compare three representative object-aware controllers. The comparison uses metrics originating from the design considerations. From the analysis, we find that the design of object-aware controllers often lacks kinematic considerations, continuity of control points, and stability in movement profiles. We conclude that this framework can be used in the future to design, compare, and benchmark obstacle avoidance methods.
Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms
Schmรถcker, Robin, Dockhorn, Alexander, Rosenhahn, Bodo
One weakness of Monte Carlo Tree Search (MCTS) is its sample efficiency which can be addressed by building and using state and/or action abstractions in parallel to the tree search such that information can be shared among nodes of the same layer. The primary usage of abstractions for MCTS is to enhance the Upper Confidence Bound (UCB) value during the tree policy by aggregating visits and returns of an abstract node. However, this direct usage of abstractions does not take the case into account where multiple actions with the same parent might be in the same abstract node, as these would then all have the same UCB value, thus requiring a tiebreak rule. In state-of-the-art abstraction algorithms such as pruned On the Go Abstractions (pruned OGA), this case has not been noticed, and a random tiebreak rule was implicitly chosen. In this paper, we propose and empirically evaluate several alternative intra-abstraction policies, several of which outperform the random policy across a majority of environments and parameter settings.
LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
Zhou, Haotian, Wang, Xiaole, Li, He, Sun, Fusheng, Guo, Shengyu, Qi, Guolei, Xu, Jianghuan, Zhao, Huijing
Abstract-- Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed set goal settings. T o address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOA T -Core, a high-quality core split distilled from GOA T - Bench tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. I. INTRODUCTION In real-world applications such as home assistants and service robots, mobile agents are expected to understand user instructions, perceive environments, and navigate to target objects [1][2]. With the advancement of vision-language models [3][4], and inspired by the fact that humans primarily rely on vision to navigate, visual navigation has emerged as a prominent research area [1][5].
Balanced Collaborative Exploration via Distributed Topological Graph Voronoi Partition
Ding, Tianyi, Zheng, Ronghao, Zhang, Senlin, Liu, Meiqin
Abstract--This work addresses the collaborative multi-robot autonomous online exploration problem, particularly focusing on distributed exploration planning for dynamically balanced exploration area partition and task allocation among a team of mobile robots operating in obstacle-dense non-convex environments. We present a novel topological map structure that simultaneously characterizes both spatial connectivity and global exploration completeness of the environment. The topological map is updated incrementally to utilize known spatial information for updating reachable spaces, while exploration targets are planned in a receding horizon fashion under global coverage guidance. A distributed weighted topological graph V oronoi algorithm is introduced implementing balanced graph space partitions of the fused topological maps. Theoretical guarantees are provided for distributed consensus convergence and equitable graph space partitions with constant bounds. A local planner optimizes the visitation sequence of exploration targets within the balanced partitioned graph space to minimize travel distance, while generating safe, smooth, and dynamically feasible motion trajectories. Comprehensive benchmarking against state-of-the-art methods demonstrates significant improvements in exploration efficiency, completeness, and workload balance across the robot team. Autonomous exploration via multi-robot systems, which leverages robotic systems to map unknown environments cooperatively, is a critical capability for applications such as inspection, search-and-rescue, and disaster response [1], [2], [3]. Multi-robot systems offer substantial advantages, including accelerated exploration and enhanced fault tolerance. Despite their potential, developing robust and efficient multi-robot exploration systems remains challenging due to suboptimal task allocation, and inefficient coordination strategies. Previous collaborative exploration approaches often rely on centralized controllers [4], [5], which are impractical in real-world scenarios with unreliable or range-limited connectivity. Decentralized coordination methods have been proposed to mitigate these issues [6], [7], [8] yet many multi-robot exploration approaches still suffer from critical inefficiencies.
Motivating Students' Self-study with Goal Reminder and Emotional Support
Cho, Hyung Chan, Cha, Go-Eum, Liu, Yanfu, Jeong, Sooyeon
Abstract-- While the efficacy of social robots in supporting people in learning tasks has been extensively investigated, their potential impact in assisting students in self-studying contexts has not been investigated much. This study explores how a social robot can act as a peer study companion for college students during self-study tasks by delivering task-oriented goal reminder and positive emotional support. We conducted an exploratory Wizard-of-Oz study to explore how these robotic support behaviors impacted students' perceived focus, productivity, and engagement in comparison to a robot that only provided physical presence (control). Our study results suggest that participants in the goal reminder and the emotional support conditions reported greater ease of use, with the goal reminder condition additionally showing a higher willingness to use the robot in future study sessions. Participants' satisfaction with the robot was correlated with their perception of the robot as a social other, and this perception was found to be a predictor for their level of goal achievement in the self-study task. These findings highlight the potential of socially assistive robots to support self-study through both functional and emotional engagement. Peer relationships in educational settings play a crucial role in generating relatedness and support that are influential in fostering academic success [1]-[4]. Peer support is shown to positively impact students' learning by fostering a sense of connectedness, which enhances productivity, academic performance, and study well-being [1], [3], [5], [6].
Planning Ahead with RSA: Efficient Signalling in Dynamic Environments by Projecting User Awareness across Future Timesteps
Das, Anwesha, Duff, John, Hoffmann, Jรถrg, Demberg, Vera
Adaptive agent design offers a way to improve human-AI collaboration on time-sensitive tasks in rapidly changing environments. In such cases, to ensure the human maintains an accurate understanding of critical task elements, an assistive agent must not only identify the highest priority information but also estimate how and when this information can be communicated most effectively, given that human attention represents a zero-sum cognitive resource where focus on one message diminishes awareness of other or upcoming information. We introduce a theoretical framework for adaptive signalling which meets these challenges by using principles of rational communication, formalised as Bayesian reference resolution using the Rational Speech Act (RSA) modelling framework, to plan a sequence of messages which optimise timely alignment between user belief and a dynamic environment. The agent adapts message specificity and timing to the particulars of a user and scenario based on projections of how prior-guided interpretation of messages will influence attention to the interface and subsequent belief update, across several timesteps out to a fixed horizon. In a comparison to baseline methods, we show that this effectiveness depends crucially on combining multi-step planning with a realistic model of user awareness. As the first application of RSA for communication in a dynamic environment, and for human-AI interaction in general, we establish theoretical foundations for pragmatic communication in human-agent teams, highlighting how insights from cognitive science can be capitalised to inform the design of assistive agents.
AUPO -- Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm
Schmรถcker, Robin, Dockhorn, Alexander, Rosenhahn, Bodo
We introduce a novel, drop-in modification to Monte Carlo Tree Search's (MCTS) decision policy that we call AUPO. Comparisons based on a range of IPPC benchmark problems show that AUPO clearly outperforms MCTS. AUPO is an automatic action abstraction algorithm that solely relies on reward distribution statistics acquired during the MCTS. Thus, unlike other automatic abstraction algorithms, AUPO requires neither access to transition probabilities nor does AUPO require a directed acyclic search graph to build its abstraction, allowing AUPO to detect symmetric actions that state-of-the-art frameworks like ASAP struggle with when the resulting symmetric states are far apart in state space. Furthermore, as AUPO only affects the decision policy, it is not mutually exclusive with other abstraction techniques that only affect the tree search.
PIP-LLM: Integrating PDDL-Integer Programming with LLMs for Coordinating Multi-Robot Teams Using Natural Language
Shi, Guangyao, Wu, Yuwei, Kumar, Vijay, Sukhatme, Gaurav S.
Abstract-- Enabling robot teams to execute natural language commands requires translating high-level instructions into feasible, efficient multi-robot plans. While Large Language Models (LLMs) combined with Planning Domain Description Language (PDDL) offer promise for single-robot scenarios, existing approaches struggle with multi-robot coordination due to brittle task decomposition, poor scalability, and low coordination efficiency. We introduce PIP-LLM, a language-based coordination framework that consists of PDDL-based team-level planning and Integer Programming (IP) based robot-level planning. PIP-LLMs first decomposes the command by translating the command into a team-level PDDL problem and solves it to obtain a team-level plan, abstracting away robot assignment. Each team-level action represents a subtask to be finished by the team. Next, this plan is translated into a dependency graph representing the subtasks' dependency structure. Such a dependency graph is then used to guide the robot-level planning, in which each subtask node will be formulated as an IP-based task allocation problem, explicitly optimizing travel costs and workload while respecting robot capabilities and user-defined constraints. This separation of planning from assignment allows PIP-LLM to avoid the pitfalls of syntax-based decomposition and scale to larger teams. Experiments across diverse tasks show that PIP-LLM improves plan success rate, reduces maximum and average travel costs, and achieves better load balancing compared to state-of-the-art baselines. A long-standing goal in multi-robot research is to create systems that take concise human language commands and automatically generate executable coordination plans for robot teams.
Top-Down Semantic Refinement for Image Captioning
Zhang, Jusheng, Cai, Kaitong, Yang, Jing, Wang, Jian, Tang, Chengpei, Wang, Keze
Large Vision-Language Models (VLMs) face an inherent contradiction in image captioning: their powerful single-step generation capabilities often lead to a myopic decision-making process. This makes it difficult to maintain global narrative coherence while capturing rich details, a limitation that is particularly pronounced in tasks that require multi-step and complex scene description. To overcome this fundamental challenge, we redefine image captioning as a goal-oriented hierarchical refinement planning problem, and further propose a novel framework, named Top-Down Semantic Refinement (TDSR), which models the generation process as a Markov Decision Process (MDP). However, planning within the vast state space of a VLM presents a significant computational hurdle. Our core contribution, therefore, is the design of a highly efficient Monte Carlo Tree Search (MCTS) algorithm tailored for VLMs. By incorporating a visual-guided parallel expansion and a lightweight value network, our TDSR reduces the call frequency to the expensive VLM by an order of magnitude without sacrificing planning quality. Furthermore, an adaptive early stopping mechanism dynamically matches computational overhead to the image's complexity. Extensive experiments on multiple benchmarks, including DetailCaps, COMPOSITION-CAP, and POPE, demonstrate that our TDSR, as a plug-and-play module, can significantly enhance the performance of existing VLMs (e.g., LLaV A-1.5, Qwen2.5-VL) by achieving state-of-the-art or highly competitive results in fine-grained description, compositional generalization, and hallucination suppression.