Goto

Collaborating Authors

 environmental change


BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands

Cho, Seongwon, Ahn, Daechul, Shin, Donghyun, Choi, Hyeonbeom, Kim, San, Choi, Jonghyun

arXiv.org Artificial Intelligence

Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play complementary roles: the DRM performs strategic planning with structured 3D scene updates and guides what the IRM attends to, while the IRM analyzes video streams to update memory, correct ongoing actions, and trigger replanning when necessary. Through this bidirectional coordination, the modules address the trade off between maintaining awareness and avoiding costly updates, enabling robust adaptation under dynamic conditions. Evaluated in three real world environments with dynamic object placement, BINDER achieves substantially higher success and efficiency than SoTA baselines, demonstrating its effectiveness for real world deployment.


Empirical Comparison of Forgetting Mechanisms for UCB-based Algorithms on a Data-Driven Simulation Platform

Chen, Minxin

arXiv.org Artificial Intelligence

Many real-world bandit problems involve non-stationary reward distributions, where the optimal decision may shift due to evolving environments. However, the performance of some typical Multi-Armed Bandit (MAB) models such as Upper Confidence Bound (UCB) algorithms degrades significantly in non-stationary environments where reward distributions change over time. To address this limitation, this paper introduces and evaluates FDSW-UCB, a novel dual-view algorithm that integrates a discount-based long-term perspective with a sliding-window-based short-term view. A data-driven semi-synthetic simulation platform, built upon the MovieLens-1M and Open Bandit datasets, is developed to test algorithm adaptability under abrupt and gradual drift scenarios. Experimental results demonstrate that a well-configured sliding-window mechanism (SW-UCB) is robust, while the widely used discounting method (D-UCB) suffers from a fundamental learning failure, leading to linear regret. Crucially, the proposed FDSW-UCB, when employing an optimistic aggregation strategy, achieves superior performance in dynamic settings, highlighting that the ensemble strategy itself is a decisive factor for success.


LT-Exosense: A Vision-centric Multi-session Mapping System for Lifelong Safe Navigation of Exoskeletons

Wang, Jianeng, Mattamala, Matias, Kassab, Christina, Chebrolu, Nived, Burger, Guillaume, Elnecave, Fabio, Petriaux, Marine, Fallon, Maurice

arXiv.org Artificial Intelligence

Figure 1: L T -Exosense is capable of merging multiple sessions generated by a previous work, Exosense, a vision-centric scene understanding system with its sensing unit (T op-Right) integrated into a self-balancing exoskeleton (b). The merged map (a) contains five sessions with colored contours indicating the coverage area of each session. Such a merged map can be further converted into a navigation map, enabling obstacle-free planning spanning multiple sessions. Abstract-- Self-balancing exoskeletons offer a promising mobility solution for individuals with lower-limb disabilities. For reliable long-term operation, these exoskeletons require a perception system that is effective in changing environments. In this work, we introduce L T -Exosense, a vision-centric, multi-session mapping system designed to support long-term (semi)- autonomous navigation for exoskeleton users. L T -Exosense extends single-session mapping capabilities by incrementally fusing spatial knowledge across multiple sessions, detecting environmental changes, and updating a persistent global map. This representation enables intelligent path planning, which can adapt to newly observed obstacles and can recover previous routes when obstructions are removed. We validate L T -Exosense through several real-world experiments, demonstrating a scalable multi-session map that achieves an average point-to-point error below 5 cm when compared to ground-truth laser scans.


Sample Efficient Experience Replay in Non-stationary Environments

Duan, Tianyang, Zhang, Zongyuan, Guo, Songxiao, Zhao, Yuanye, Lin, Zheng, Fang, Zihan, Liu, Yi, Luan, Dianxin, Huang, Dong, Cui, Heming, Cui, Yong

arXiv.org Artificial Intelligence

Reinforcement learning (RL) in non-stationary environments is challenging, as changing dynamics and rewards quickly make past experiences outdated. Traditional experience replay (ER) methods, especially those using TD-error prioritization, struggle to distinguish between changes caused by the agent's policy and those from the environment, resulting in inefficient learning under dynamic conditions. To address this challenge, we propose the Discrepancy of Environment Dynamics (DoE), a metric that isolates the effects of environment shifts on value functions. Building on this, we introduce Discrepancy of Environment Prioritized Experience Replay (DEER), an adaptive ER framework that prioritizes transitions based on both policy updates and environmental changes. DEER uses a binary classifier to detect environment changes and applies distinct prioritization strategies before and after each shift, enabling more sample-efficient learning. Experiments on four non-stationary benchmarks demonstrate that DEER further improves the performance of off-policy algorithms by 11.54 percent compared to the best-performing state-of-the-art ER methods.


An Adaptive Coverage Control Approach for Multiple Autonomous Off-road Vehicles in Dynamic Agricultural Fields

Ahmadi, Sajad, Davoodi, Mohammadreza, Velni, Javad Mohammadpour

arXiv.org Artificial Intelligence

This paper presents an adaptive coverage control method for a fleet of off-road and Unmanned Ground Vehicles (UGVs) operating in dynamic (time-varying) agricultural environments. Traditional coverage control approaches often assume static conditions, making them unsuitable for real-world farming scenarios where obstacles, such as moving machinery and uneven terrains, create continuous challenges. To address this, we propose a real-time path planning framework that integrates Unmanned Aerial Vehicles (UAVs) for obstacle detection and terrain assessment, allowing UGVs to dynamically adjust their coverage paths. The environment is modeled as a weighted directed graph, where the edge weights are continuously updated based on the UAV observations to reflect obstacle motion and terrain variations. The proposed approach incorporates Voronoi-based partitioning, adaptive edge weight assignment, and cost-based path optimization to enhance navigation efficiency. Simulation results demonstrate the effectiveness of the proposed method in improving path planning, reducing traversal costs, and maintaining robust coverage in the presence of dynamic obstacles and muddy terrains.


When and Where Localization Fails: An Analysis of the Iterative Closest Point in Evolving Environment

Dannaoui, Abdel-Raouf, Laconte, Johann, Debain, Christophe, Pomerleau, Francois, Checchin, Paul

arXiv.org Artificial Intelligence

Robust relocalization in dynamic outdoor environments remains a key challenge for autonomous systems relying on 3D lidar. While long-term localization has been widely studied, short-term environmental changes, occurring over days or weeks, remain underexplored despite their practical significance. To address this gap, we present a highresolution, short-term multi-temporal dataset collected weekly from February to April 2025 across natural and semi-urban settings. Each session includes high-density point cloud maps, 360 deg panoramic images, and trajectory data. Projected lidar scans, derived from the point cloud maps and modeled with sensor-accurate occlusions, are used to evaluate alignment accuracy against the ground truth using two Iterative Closest Point (ICP) variants: Point-to-Point and Point-to-Plane. Results show that Point-to-Plane offers significantly more stable and accurate registration, particularly in areas with sparse features or dense vegetation. This study provides a structured dataset for evaluating short-term localization robustness, a reproducible framework for analyzing scan-to-map alignment under noise, and a comparative evaluation of ICP performance in evolving outdoor environments. Our analysis underscores how local geometry and environmental variability affect localization success, offering insights for designing more resilient robotic systems.


Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies

Liu, Chenruo, Tang, Kenan, Qin, Yao, Lei, Qi

arXiv.org Artificial Intelligence

This paper bridges distribution shift and AI safety through a comprehensive analysis of their conceptual and methodological synergies. While prior discussions often focus on narrow cases or informal analogies, we establish two types connections between specific causes of distribution shift and fine-grained AI safety issues: (1) methods addressing a specific shift type can help achieve corresponding safety goals, or (2) certain shifts and safety issues can be formally reduced to each other, enabling mutual adaptation of their methods. Our findings provide a unified perspective that encourages fundamental integration between distribution shift and AI safety research.


DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving

Hou, Xinmeng, Wang, Wuqi, Yang, Long, Lin, Hao, Feng, Jinglun, Min, Haigen, Zhao, Xiangmo

arXiv.org Artificial Intelligence

DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving Xinmeng Hou 2,, Wuqi Wang 1,, Long Y ang 1, Hao Lin 3, Jinglun Feng 4,, Haigen Min 1,, Xiangmo Zhao 1 Abstract -- We introduce DriveAgent, a novel multi-agent autonomous driving framework that leverages large language model (LLM) reasoning combined with multimodal sensor fusion to enhance situational understanding and decision-making. DriveAgent uniquely integrates diverse sensor modalities--including camera, LiDAR, GPS, and IMU--with LLMdriven analytical processes structured across specialized agents. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent identifying critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents explaining contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent prioritizing insights and proposing timely maneuvers. This modular design empowers the LLM to effectively coordinate specialized perception and reasoning agents, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments on challenging autonomous driving datasets demonstrate that DriveAgent is achieving superior performance on multiple metrics against baseline methods. These results validate the efficacy of the proposed LLM-driven multi-agent sensor fusion framework, underscoring its potential to substantially enhance the robustness and reliability of autonomous driving systems. 1 I. I NTRODUCTION Promising progress has been made in autonomous driving (AD) in recent years; however, some challenging problems in AD have yet to be solved, especially under dynamic, mul-timodal environments, such as contextual understanding and interpretability [1]. Commonly adopted AD architectures, whether modular or end-to-end, often struggle to integrate insights across heterogeneous sensor modalities--such as cameras, LiDAR, IMU and GPS--especially in edge cases where visual information is ambiguous or missing [2]. 1 Wuqi Wang, Long Y ang, Haigen Min, and Xiangmo Zhao are with Chang'an University, Xi'an, Shaanxi, China. 2 Xinmeng Hou is with Chang'an University, Xi'an, Shaanxi, China and Agency for Science, Technology and Research (A*ST AR), Singapore.


A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments

Rasch, Julian, Müller, Florian, Chiossi, Francesco

arXiv.org Artificial Intelligence

Augmented Reality (AR) is transforming the way we interact with virtual information in the physical world. By overlaying digital content in real-world environments, AR enables new forms of immersive and engaging experiences. However, existing AR systems often struggle to effectively manage the many interactive possibilities that AR presents. This vision paper speculates on AI-driven approaches for adaptive AR content placement, dynamically adjusting to user movement and environmental changes. By leveraging machine learning methods, such a system would intelligently manage content distribution between AR projections integrated into the external environment and fixed static content, enabling seamless UI layout and potentially reducing users' cognitive load. By exploring the possibilities of AI-driven dynamic AR content placement, we aim to envision new opportunities for innovation and improvement in various industries, from urban navigation and workplace productivity to immersive learning and beyond. This paper outlines a vision for the development of more intuitive, engaging, and effective AI-powered AR experiences.


Dynamic Reinforcement Learning for Actors

Shibata, Katsunari

arXiv.org Artificial Intelligence

Dynamic Reinforcement Learning (Dynamic RL), proposed in this paper, directly controls system dynamics, instead of the actor (action-generating neural network) outputs at each moment, bringing about a major qualitative shift in reinforcement learning (RL) from static to dynamic. The actor is initially designed to generate chaotic dynamics through the loop with its environment, enabling the agent to perform flexible and deterministic exploration. Dynamic RL controls global system dynamics using a local index called "sensitivity," which indicates how much the input neighborhood contracts or expands into the corresponding output neighborhood through each neuron's processing. While sensitivity adjustment learning (SAL) prevents excessive convergence of the dynamics, sensitivity-controlled reinforcement learning (SRL) adjusts them -- to converge more to improve reproducibility around better state transitions with positive TD error and to diverge more to enhance exploration around worse transitions with negative TD error. Dynamic RL was applied only to the actor in an Actor-Critic RL architecture while applying it to the critic remains a challenge. It was tested on two dynamic tasks and functioned effectively without external exploration noise or backward computation through time. Moreover, it exhibited excellent adaptability to new environments, although some problems remain. Drawing parallels between 'exploration' and 'thinking,' the author hypothesizes that "exploration grows into thinking through learning" and believes this RL could be a key technique for the emergence of thinking, including inspiration that cannot be reconstructed from massive existing text data. Finally, despite being presumptuous, the author presents the argument that this research should not proceed due to its potentially fatal risks, aiming to encourage discussion.