costmap


Real-time Recognition of Human Interactions from a Single RGB-D Camera for Socially-Aware Robot Navigation

Nguyen, Thanh Long, Nguyen, Duc Phu, Ton Nu, Thanh Thao, Le, Quan, Tran, Thuan Hoang, Phung, Manh Duong

arXiv.org Artificial Intelligence

Social robots play a key role in many applications such as elderly care, home assistance, customer service, and education, where they assist, interact, and communicate with humans in a socially intelligent manner. These robots must ensure not only physical safety but also psychological comfort for humans by following social norms. For instance, a robot should avoid disrupting a group conversation when navigating a crowded space, as this could be seen as impolite or intrusive. To accomplish this, the robot must not only detect humans but also recognize and interpret their interactions, such as conversations, discussions, gatherings, and collaborative activities, to adapt its movements accordingly. According to [1, 2], human group interactions are structured into three distinct spaces: (i) o-space, the central region where active participants focus their attention, (ii) p-space, the surrounding area occupied by engaged individuals, and (iii) r-space, the outer region where bystanders or non-participants are positioned. To enable socially aware navigation, recognition algorithms must estimate these spatial regions.
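The o-space described above can be estimated geometrically once participant poses are known. As a minimal sketch (not the paper's method; `estimate_o_space` and the fixed `stride` are illustrative assumptions), each participant's attention is assumed to focus a fixed distance ahead of them, and the o-space centre is the mean of those focal points:

```python
import math

def estimate_o_space(people, stride=0.8):
    """Estimate the o-space centre of an F-formation.

    people: list of (x, y, theta) -- planar position and facing angle
    of each participant. Each person's attention is assumed to focus
    a fixed 'stride' ahead of them; the o-space centre is the mean of
    those focal points. p-space then corresponds to a ring through
    the participants themselves, and r-space to the region beyond it.
    """
    fx = [x + stride * math.cos(t) for x, y, t in people]
    fy = [y + stride * math.sin(t) for x, y, t in people]
    n = len(people)
    return sum(fx) / n, sum(fy) / n

# Two people facing each other 1.6 m apart: the estimated o-space
# centre falls midway between them.
pair = [(0.0, 0.0, 0.0), (1.6, 0.0, math.pi)]
```

A navigation stack could then inflate cost around the estimated o-space so the planner routes around, rather than through, the conversation.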


Mars Traversability Prediction: A Multi-modal Self-supervised Approach for Costmap Generation

Xie, Zongwu, Yun, Kaijie, Liu, Yang, Ji, Yiming, Li, Han

arXiv.org Artificial Intelligence

We present a robust multi-modal framework for predicting traversability costmaps for planetary rovers. Our model fuses camera and LiDAR data to produce a bird's-eye-view (BEV) terrain costmap, trained in a self-supervised manner using IMU-derived labels. Key updates include a DINOv3-based image encoder, FiLM-based sensor fusion, and an optimization loss combining Huber and smoothness terms. Experimental ablations (removing image color, occluding inputs, adding noise) show only minor changes in MAE/MSE (e.g. MAE increases from ~0.0775 to 0.0915 when LiDAR is sparsified), indicating that geometry dominates the learned cost and that the model is highly robust. We attribute the small performance differences to the IMU labeling primarily reflecting terrain geometry rather than semantics, and to limited data diversity. Unlike prior work claiming large gains, we emphasize our contributions: (1) a high-fidelity, reproducible simulation environment; (2) a self-supervised IMU-based labeling pipeline; and (3) a strong multi-modal BEV costmap prediction model. We discuss limitations and future work such as domain generalization and dataset expansion.
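The combined loss mentioned above can be written down concretely. The following is a minimal sketch under stated assumptions (the weighting `lam` and delta threshold are illustrative, and the paper's exact formulation may differ): a Huber data term against IMU-derived labels plus a smoothness term over neighbouring costmap cells.

```python
def huber(err, delta=1.0):
    """Huber penalty: quadratic near zero, linear in the tails,
    making the data term robust to outlier IMU labels."""
    a = abs(err)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def costmap_loss(pred, target, lam=0.1, delta=1.0):
    """Huber data term against the labels plus a smoothness term
    penalising squared differences between 4-connected neighbour
    cells, weighted by lam.

    pred, target: 2D lists (H x W) of per-cell traversability costs.
    """
    h, w = len(pred), len(pred[0])
    data = sum(huber(pred[i][j] - target[i][j], delta)
               for i in range(h) for j in range(w)) / (h * w)
    smooth = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                smooth += (pred[i][j] - pred[i + 1][j]) ** 2
            if j + 1 < w:
                smooth += (pred[i][j] - pred[i][j + 1]) ** 2
    return data + lam * smooth
```

The smoothness term discourages isolated high-cost cells that would otherwise fragment the planner's search space.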


Trailblazer: Learning offroad costmaps for long range planning

Viswanath, Kasi, Sanchez, Felix, Overbye, Timothy, Gregory, Jason M., Saripalli, Srikanth

arXiv.org Artificial Intelligence

Autonomous navigation in off-road environments remains a significant challenge in field robotics, particularly for Unmanned Ground Vehicles (UGVs) tasked with search and rescue, exploration, and surveillance. Effective long-range planning relies on the integration of onboard perception systems with prior environmental knowledge, such as satellite imagery and LiDAR data. This work introduces Trailblazer, a novel framework that automates the conversion of multi-modal sensor data into costmaps, enabling efficient path planning without manual tuning. Unlike traditional approaches, Trailblazer leverages imitation learning and a differentiable A* planner to learn costmaps directly from expert demonstrations, enhancing adaptability across diverse terrains. The proposed methodology was validated through extensive real-world testing, achieving robust performance in dynamic and complex environments, demonstrating Trailblazer's potential for scalable, efficient autonomous navigation.
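The differentiable-A* training loop is beyond a short snippet, but the core idea of learning a costmap from demonstrations can be sketched with a classic perceptron-style update in the spirit of max-margin planning (an illustrative stand-in, not Trailblazer's actual algorithm; `imitation_update` is a hypothetical name): raise the cost of cells the current planner traverses and lower the cost of cells the expert traverses, nudging the planner toward the demonstrated route.

```python
def imitation_update(costmap, expert_path, planner_path, lr=0.1):
    """One imitation-learning update on a 2D costmap.

    costmap: 2D list of non-negative cell costs.
    expert_path, planner_path: lists of (row, col) cells traversed by
    the expert demonstration and by the current planner, respectively.
    Cells unique to the planner become more expensive; cells the
    expert used become cheaper, so replanning moves toward the
    demonstration.
    """
    for (i, j) in planner_path:
        costmap[i][j] += lr
    for (i, j) in expert_path:
        costmap[i][j] = max(0.0, costmap[i][j] - lr)
    return costmap
```

In Trailblazer the planner itself is differentiable, so this discrete update is replaced by gradient descent through the A* relaxation, but the direction of the correction is the same.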


Online Adaptive Traversability Estimation through Interaction for Unstructured, Densely Vegetated Environments

Ruetz, Fabio A., Lawrance, Nicholas, Hernández, Emili, Borges, Paulo V. K., Peynot, Thierry

arXiv.org Artificial Intelligence

Navigating densely vegetated environments poses significant challenges for autonomous ground vehicles. Learning-based systems typically use prior and in-situ data to predict terrain traversability but often degrade in performance when encountering out-of-distribution elements caused by rapid environmental changes or novel conditions. This paper presents a novel, lidar-only, online adaptive traversability estimation (TE) method that trains a model directly on the robot using self-supervised data collected through robot-environment interaction. The proposed approach utilises a probabilistic 3D voxel representation to integrate lidar measurements and robot experience, creating a salient environmental model. To ensure computational efficiency, a sparse graph-based representation is employed to update temporally evolving voxel distributions. Extensive experiments with an unmanned ground vehicle in natural terrain demonstrate that the system adapts to complex environments with as little as 8 minutes of operational data, achieving a Matthews Correlation Coefficient (MCC) score of 0.63 and enabling safe navigation in densely vegetated environments. This work examines different training strategies for voxel-based TE methods and offers recommendations for training strategies to improve adaptability. The proposed method is validated on a robotic platform with limited computational resources (25W GPU), achieving accuracy comparable to offline-trained models while maintaining reliable performance across varied environments.


Robot localization aided by quantum algorithms

Antero, Unai, Sierra, Basilio, Oñativia, Jon, Ruiz, Alejandra, Osaba, Eneko

arXiv.org Artificial Intelligence

Localization is a vital aspect of mobile robotics, enabling robots to navigate their environment efficiently and avoid obstacles. Without localization, mobile robots would be unable to determine their position and orientation, making it challenging to plan a path or make informed decisions about their movement (Olson [2000]). Localization allows mobile robots to create an internal map of their environment, which is essential for tasks such as surveying, manipulation, inspection, and delivery (Huang and Lin [2023]). In fact, localization is what enables mobile robots to perform tasks autonomously, making informed decisions about their actions and movements without human intervention. The quality of localization is heavily dependent on the generation of accurate maps, which is a computationally intensive task. Probabilistic localization methods, such as the Adaptive Monte Carlo Localization (AMCL) algorithm, have been widely used in mobile robotics due to their accuracy and robustness (Kristensen and Jensfelt [2003]). However, these methods can be computationally demanding, especially when dealing with large maps or high-resolution sensor data. AMCL, in particular, uses a combination of sensor data and prior map knowledge to determine the probable location of a robot on a given map, but its computational complexity is proportional to the area of the map grid (Alshikh Khalil and Hatem [2022]). Recently, the integration of light detection and ranging (LiDAR) sensors has improved the accuracy of localization methods, but the computational requirements remain a challenge (Huang and Lin [2023]).
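The measurement step of Monte Carlo localization, the family AMCL belongs to, can be sketched in a few lines (a generic textbook sketch, not the paper's quantum variant; `weight_fn` is an illustrative placeholder for the sensor model): weight each pose hypothesis by how well it explains the observation, then resample in proportion to weight.

```python
import random

def mcl_update(particles, weight_fn):
    """One measurement update of Monte Carlo localization.

    particles: list of pose hypotheses (e.g. (x, y) or (x, y, theta)).
    weight_fn: maps a pose to a non-negative likelihood of the current
    sensor reading given that pose. Evaluating it typically involves
    ray-casting against the map, which is where the cost grows with
    map size, as noted above.
    """
    weights = [weight_fn(p) for p in particles]
    total = sum(weights)
    if total == 0:            # no hypothesis explains the data at all
        return particles      # keep the prior particle set
    return random.choices(particles, weights=weights, k=len(particles))
```

Poses with zero likelihood are never resampled, so the particle cloud contracts onto map regions consistent with the sensor data.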


SALON: Self-supervised Adaptive Learning for Off-road Navigation

Sivaprakasam, Matthew, Triest, Samuel, Ho, Cherie, Aich, Shubhra, Lew, Jeric, Adu, Isaiah, Wang, Wenshan, Scherer, Sebastian

arXiv.org Artificial Intelligence

Autonomous robot navigation in off-road environments presents a number of challenges due to its lack of structure, making it difficult to handcraft robust heuristics for diverse scenarios. While learned methods using hand labels or self-supervised data improve generalizability, they often require a tremendous amount of data and can be vulnerable to domain shifts. To improve generalization in novel environments, recent works have incorporated adaptation and self-supervision to develop autonomous systems that can learn from their own experiences online. However, current works often rely on significant prior data, for example minutes of human teleoperation data for each terrain type, which is difficult to scale with more environments and robots. To address these limitations, we propose SALON, a perception-action framework for fast adaptation of traversability estimates with minimal human input. SALON rapidly learns online from experience while avoiding out-of-distribution terrains to produce adaptive and risk-aware cost and speed maps. Within seconds of collected experience, our results demonstrate navigation performance over kilometer-scale courses in diverse off-road terrain comparable to that of methods trained on 100-1000x more data. We additionally show promising results on significantly different robots in different environments. Our code is available at https://theairlab.org/SALON.


IRisPath: Enhancing Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability

Sharma, Saksham, Raizada, Akshit, Sundaram, Suresh

arXiv.org Artificial Intelligence

Autonomous off-road navigation is required for applications in agriculture, construction, search and rescue, and defence. Traditional on-road autonomous methods struggle with dynamic terrains, leading to poor vehicle control off-road. Recent deep-learning models have used perception sensors along with kinesthetic feedback for navigation on such terrains. However, this approach has out-of-domain uncertainty: factors like change in weather and time of day impact the performance of the model. We propose a multi-modal fusion network, FuseIsPath, capable of using LWIR and RGB images to provide robustness against dynamic weather and light conditions. To aid further works in this domain, we also open-source a day-night dataset with LWIR and RGB images along with pseudo-labels for traversability. In order to co-register the two images, we developed a novel method for targetless extrinsic calibration of LWIR, LiDAR and RGB cameras with a translation accuracy of 1.7 cm and a rotation accuracy of 0.827 degrees.


AUTO-IceNav: A Local Navigation Strategy for Autonomous Surface Ships in Broken Ice Fields

de Schaetzen, Rodrigue, Botros, Alexander, Zhong, Ninghan, Murrant, Kevin, Gash, Robert, Smith, Stephen L.

arXiv.org Artificial Intelligence

Ice conditions often require ships to reduce speed and deviate from their main course to avoid damage to the ship. In addition, broken ice fields are becoming the dominant ice conditions encountered in the Arctic, where the effects of collisions with ice are highly dependent on where contact occurs and on the particular features of the ice floes. In this paper, we present AUTO-IceNav, a framework for the autonomous navigation of ships operating in ice floe fields. Trajectories are computed in a receding-horizon manner, where we frequently replan given updated ice field data. During a planning step, we assume a nominal speed that is safe with respect to the current ice conditions, and compute a reference path. We formulate a novel cost function that minimizes the kinetic energy loss of the ship from ship-ice collisions and incorporate this cost as part of our lattice-based path planner. The solution computed by the lattice planning stage is then used as an initial guess in our proposed optimization-based improvement step, producing a locally optimal path. Extensive experiments were conducted both in simulation and in a physical testbed to validate our approach.
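The kinetic-energy-loss objective above has a simple physical core. As an illustrative sketch (the function names and the per-collision bookkeeping are assumptions; the paper's cost additionally depends on contact location and floe geometry), the energy lost in one ship-ice collision and its accumulation along a candidate path look like:

```python
def kinetic_energy_loss(mass, v_before, v_after):
    """Kinetic energy (J) the ship loses in a single ship-ice
    collision that slows it from v_before to v_after (m/s)."""
    return 0.5 * mass * (v_before ** 2 - v_after ** 2)

def path_cost(mass, speed_pairs):
    """Total predicted kinetic-energy loss over the collisions a
    candidate path is expected to incur.

    speed_pairs: list of (v_before, v_after) for each predicted
    collision. The lattice planner prefers the path minimising this
    total, trading a longer route against fewer or gentler impacts.
    """
    return sum(kinetic_energy_loss(mass, vb, va) for vb, va in speed_pairs)
```

Minimising accumulated energy loss is what drives the planner to deviate around dense floes rather than push straight through them.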


Multilayer occupancy grid for obstacle avoidance in an autonomous ground vehicle using RGB-D camera

Gallego, Jhair S., Ramirez, Ricardo E.

arXiv.org Artificial Intelligence

This work describes the process of integrating a depth camera into the navigation system of a self-driving ground vehicle (SDV) and the implementation of a multilayer costmap that enhances the vehicle's obstacle identification by expanding its two-dimensional, 2D-LIDAR-based field of view into a three-dimensional perception system using an RGB-D camera. This approach lays the foundation for a robust vision-based navigation and obstacle detection system. A theoretical review is presented and implementation results are discussed for future work.
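In a layered costmap of this kind, each sensor (2D LIDAR, RGB-D camera, inflation, etc.) contributes its own layer, and the layers are merged into a master costmap. A minimal sketch of the per-cell-maximum combination convention used by layered costmap implementations such as ROS `costmap_2d` (the function name and the 0-254 cost scale follow that convention; this is not the paper's code):

```python
def combine_layers(layers, lethal=254):
    """Merge per-sensor costmap layers into one master costmap by
    taking the per-cell maximum: any layer that marks a cell as an
    obstacle dominates the combined result.

    layers: list of 2D lists of equal shape, costs in [0, lethal].
    """
    h, w = len(layers[0]), len(layers[0][0])
    return [[min(lethal, max(layer[i][j] for layer in layers))
             for j in range(w)] for i in range(h)]
```

Under this rule an obstacle visible only to the RGB-D layer (e.g. an overhanging object above the LIDAR scan plane) still blocks the cell in the master costmap.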


PACER: Preference-conditioned All-terrain Costmap Generation

Mao, Luisa, Warnell, Garrett, Stone, Peter, Biswas, Joydeep

arXiv.org Artificial Intelligence

In autonomous robot navigation, terrain cost assignment is typically performed using a semantics-based paradigm in which terrain is first labeled using a pre-trained semantic classifier and costs are then assigned according to a user-defined mapping between label and cost. While this approach is rapidly adaptable to changing user preferences, only preferences over the types of terrain already known to the semantic classifier can be expressed. In this paper, we hypothesize that a machine-learning-based alternative to the semantics-based paradigm above will allow for rapid cost assignment adaptation to preferences expressed over new terrains at deployment time without the need for additional training. To investigate this hypothesis, we introduce and study PACER, a novel approach to costmap generation that accepts as input a single bird's-eye-view (BEV) image of the surrounding area along with a user-specified preference context and generates a corresponding BEV costmap that aligns with the preference context. Using both real and synthetic data along with a combination of proposed training tasks, we find that PACER is able to adapt quickly to new user preferences while also exhibiting better generalization to novel terrains compared to both semantics-based and representation-learning approaches.