 Planning & Scheduling


Towards Safe and Efficient Through-the-Canopy Autonomous Fruit Counting with UAVs

arXiv.org Artificial Intelligence

We present an autonomous aerial system for safe and efficient through-the-canopy fruit counting. Aerial robot applications in large-scale orchards face significant challenges due to the complexity of fine-tuning flight paths based on orchard layouts, canopy density, and plant variability. Through-the-canopy navigation is crucial for minimizing occlusion by leaves and branches but is more challenging due to the complex and dense environment compared to traditional over-the-canopy flights. Our system addresses these challenges by integrating: i) a high-fidelity simulation framework for optimizing flight trajectories, ii) a low-cost autonomy stack for canopy-level navigation and data collection, and iii) a robust workflow for fruit detection and counting using RGB images. We validate our approach through fruit counting with canopy-level aerial images and by demonstrating the autonomous navigation capabilities of our experimental vehicle.


RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration

arXiv.org Artificial Intelligence

We propose a framework for active mapping and exploration that leverages Gaussian splatting for constructing information-rich maps. Further, we develop a parallelized motion planning algorithm that can exploit the Gaussian map for real-time navigation. The Gaussian map constructed onboard the robot is optimized for both photometric and geometric quality while enabling real-time situational awareness for autonomy. We show through simulation experiments that our method is competitive with approaches that use alternate information gain metrics, while being orders of magnitude faster to compute. In real-world experiments, our algorithm achieves better map quality (10% higher Peak Signal-to-Noise Ratio (PSNR) and 30% higher geometric reconstruction accuracy) than Gaussian maps constructed by traditional exploration baselines. Experiment videos and more details can be found on our project page: https://tyuezhan.github.io/RT_GuIDE/
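
The abstract reports map quality as Peak Signal-to-Noise Ratio but does not say how it is computed. As a rough illustration, the sketch below uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), on flattened pixel intensities. The function name `psnr`, the flat-list image representation, and the `max_value=1.0` intensity range are assumptions made for this example, not details from the paper.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images.

    Both images are given as flat sequences of pixel intensities in
    [0, max_value] (an assumption of this sketch).
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [0.0, 1.0, 0.5, 0.25]
    rendering = [0.1, 0.9, 0.5, 0.25]
    print(f"PSNR: {psnr(ground_truth, rendering):.2f} dB")
```

Because PSNR is logarithmic, a relative improvement such as the reported 10% depends on the baseline value, so it should be read as a ratio of dB figures rather than as a ratio of errors.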


GSON: A Group-based Social Navigation Framework with Large Multimodal Model

arXiv.org Artificial Intelligence

Shangyi Luo, Ji Zhu, Peng Sun, Yuhong Deng, Cunjun Yu, Anxing Xiao, Xueqian Wang

Abstract: With the increasing presence of service robots and autonomous vehicles in human environments, navigation systems need to evolve beyond simple destination reach to incorporate social awareness. This paper introduces GSON, a novel group-based social navigation framework that leverages Large Multimodal Models (LMMs) to enhance robots' social perception capabilities. Our approach uses visual prompting to enable zero-shot extraction of social relationships among pedestrians and integrates these results with robust pedestrian detection and tracking pipelines to overcome the inherent inference speed limitations of LMMs. The planning system incorporates a mid-level planner that sits between global path planning and local motion planning, effectively preserving both global context and reactive responsiveness while avoiding disruption of the predicted social group. Comparative results show that our system significantly outperforms existing navigation approaches in minimizing social perturbations while maintaining comparable performance on traditional navigation metrics.

Introduction: The growth of service robots has driven significant research on autonomous systems capable of navigating human-centered environments [1]-[3]. However, a critical gap exists in current navigation systems: while they excel at trajectory prediction and obstacle avoidance [4]-[8], they often fail to recognize and respect complex social contexts within crowds, such as photography sessions or queuing behaviors, as illustrated in Figure 1. In the broader context of social robot navigation [9], [10], the goal is not only for the robot to reach its destination, but also to interact appropriately with humans without degrading their experience.


HARMONIC: Cognitive and Control Collaboration in Human-Robotic Teams

arXiv.org Artificial Intelligence

This paper presents a novel approach to multi-robot planning and collaboration. We demonstrate a cognitive strategy for robots in human-robot teams that incorporates metacognition, natural language communication, and explainability. The system is embodied using the HARMONIC architecture that flexibly integrates cognitive and control capabilities across the team. We evaluate our approach through simulation experiments involving a joint search task by a team of heterogeneous robots (a UGV and a drone) and a human. We detail the system's handling of complex, real-world scenarios, effective action coordination between robots with different capabilities, and natural human-robot communication. This work demonstrates that the robots' abilities to reason about plans, goals, and attitudes, and to provide explanations for actions and decisions, are essential prerequisites for realistic human-robot teaming.


Joint Localization and Planning using Diffusion

arXiv.org Artificial Intelligence

Diffusion models have been successfully applied to robotics problems such as manipulation and vehicle path planning. In this work, we explore their application to end-to-end navigation -- including both perception and planning -- by considering the problem of jointly performing global localization and path planning in known but arbitrary 2D environments. In particular, we introduce a diffusion model which produces collision-free paths in a global reference frame given an egocentric LIDAR scan, an arbitrary map, and a desired goal position. To this end, we implement diffusion in the space of paths in SE(2), and describe how to condition the denoising process on both obstacles and sensor observations. In our evaluation, we show that the proposed conditioning techniques enable generalization to realistic maps of considerably different appearance than the training environment, demonstrate our model's ability to accurately describe ambiguous solutions, and run extensive simulation experiments showcasing our model's use as a real-time, end-to-end localization and planning stack.


Enhancing robot reliability for health-care facilities by means of Human-Aware Navigation Planning

arXiv.org Artificial Intelligence

To enable robots to cooperate with humans, carry out human-like tasks, or navigate among humans, we need to ensure that they can comprehend human behaviors and use the extracted knowledge for intelligent decision-making. This ability is particularly important in the safety-critical and human-centred environment of health-care institutions. In robotic navigation, the most cutting-edge approaches to enhancing robot reliability, both in health-care facilities and in general, augment navigation systems with human-aware properties. In our work, the Co-operative Human-Aware Navigation planner has been integrated into the ROS-based differential-drive robot MARRtina and exhaustively challenged within various simulated contexts and scenarios, mainly modelling situations relevant to the medical domain. These tests draw attention to the integrated system's benefits, identify its drawbacks and instances of poor performance, and characterize the scope of its capabilities and applicability. The simulation results are then presented to medical experts, and the enhanced robot acceptability within the domain is validated with them as the robot is further planned for deployment.


Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots

arXiv.org Artificial Intelligence

This paper addresses the challenge of developing a multi-arm quadrupedal robot capable of efficiently harvesting fruit in complex, natural environments. To overcome the inherent limitations of traditional bimanual manipulation, we introduce the first three-arm quadrupedal robot LocoHarv-3 and propose a novel hierarchical tri-manual planning approach, enabling automated fruit harvesting with collision-free trajectories. Our comprehensive semi-autonomous framework integrates teleoperation, supported by LiDAR-based odometry and mapping, with learning-based visual perception for accurate fruit detection and pose estimation. Validation is conducted through a series of controlled indoor experiments using motion capture and extensive field tests in natural settings. Results demonstrate a 90% success rate in in-lab settings with a single attempt, and field trials further verify the system's robustness and efficiency in more challenging real-world environments.


Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM

arXiv.org Artificial Intelligence

We introduce Go-SLAM, a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments while embedding object-level information within the scene representations. This framework employs advanced object segmentation techniques, assigning a unique identifier to each Gaussian splat that corresponds to the object it represents. Consequently, our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions. Furthermore, the framework features an optimal path generation module that calculates efficient navigation paths for robots toward queried objects, considering obstacles and environmental uncertainties. Comprehensive evaluations in various scene settings demonstrate the effectiveness of our approach in delivering high-fidelity scene reconstructions, precise object segmentation, flexible object querying, and efficient robot path planning. This work represents an additional step forward in bridging the gap between 3D scene reconstruction, semantic object understanding, and real-time environment interactions.


Communication Backbone Reconfiguration with Connectivity Maintenance

arXiv.org Artificial Intelligence

The exchange of information is key in applications that involve multiple agents, such as search and rescue, military operations, and disaster response. In this work, we propose a simple and effective trajectory planning framework that tackles the design, deployment, and reconfiguration of a communication backbone by reframing the problem of networked multi-agent motion planning as a manipulator motion planning problem. Our approach works for backbones of variable configurations, both in terms of the number of robots used and the distance limit between consecutive robots. While research has been conducted on connection-restricted navigation for multi-robot systems in recent years, the field of manipulators is arguably more developed both in theory and practice. Hence, our methodology facilitates practical applications built on top of widely available motion planning algorithms and frameworks for manipulators.
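
The abstract does not spell out how the backbone is mapped to a manipulator. One plausible reading, sketched below purely as an illustration, treats the base station as the manipulator base, each robot as a joint, and each communication link as a rigid planar link no longer than the distance limit, so that any joint configuration yields a connected chain by construction. The function names, the planar (2D) setting, and the serial-chain model are all assumptions of this example and are not taken from the paper.

```python
import math


def backbone_positions(base, joint_angles, link_lengths):
    """Forward kinematics of a planar serial chain.

    Returns the (x, y) position of each robot when the backbone is
    modelled as a manipulator whose links are the communication links.
    """
    x, y = base
    heading = 0.0
    positions = []
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle  # joint angles are relative to the previous link
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


def is_connected(base, positions, max_range):
    """Check that consecutive nodes of the chain stay within range."""
    chain = [base, *positions]
    return all(
        math.dist(a, b) <= max_range + 1e-9  # small tolerance for rounding
        for a, b in zip(chain, chain[1:])
    )


if __name__ == "__main__":
    base_station = (0.0, 0.0)
    robots = backbone_positions(base_station, [0.0, math.pi / 2], [1.0, 1.0])
    print(robots, is_connected(base_station, robots, max_range=1.0))
```

Under this reading, planning in joint-angle space with link lengths capped at the communication range keeps the distance constraint satisfied without checking it at every step, which is one way an off-the-shelf manipulator planner could be reused.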


Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

arXiv.org Artificial Intelligence

Monte Carlo Tree Search (MCTS) is a powerful algorithm for solving complex decision-making problems. This paper presents an optimized MCTS implementation applied to the FrozenLake environment, a classic reinforcement learning task characterized by stochastic transitions. The optimization leverages cumulative reward and visit count tables along with the Upper Confidence Bound for Trees (UCT) formula, resulting in efficient learning in a slippery grid world. We benchmark our implementation against other decision-making algorithms, including MCTS with Policy and Q-Learning, and perform a detailed comparison of their performance. The results demonstrate that our optimized approach effectively maximizes rewards and success rates while minimizing convergence time, outperforming baseline methods, especially in environments with inherent randomness.
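
To make the role of the reward and visit count tables concrete, the following is a minimal sketch of UCT-based action selection using the standard formula, mean reward plus c * sqrt(ln(N_parent) / N_action). The table layout (dictionaries keyed by `(state, action)`), the exploration constant `c = sqrt(2)`, and the four-action default are assumptions for illustration; the abstract does not specify the paper's actual implementation.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCT value of one action: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are always selected first
    mean_reward = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_reward + exploration


def select_action(reward_table, visit_table, state, n_actions=4):
    """Pick the action with the highest UCT score in the given state.

    reward_table and visit_table map (state, action) to the cumulative
    reward and the visit count (hypothetical layout for this sketch).
    """
    visits = [visit_table.get((state, a), 0) for a in range(n_actions)]
    parent_visits = max(sum(visits), 1)
    scores = [
        uct_score(reward_table.get((state, a), 0.0), visits[a], parent_visits)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=scores.__getitem__)


if __name__ == "__main__":
    rewards = {(0, 0): 1.0, (0, 1): 3.0, (0, 2): 0.0, (0, 3): 0.0}
    counts = {(0, a): 10 for a in range(4)}
    print(select_action(rewards, counts, state=0))
```

With equal visit counts the exploration bonus is the same for every action, so the choice falls to the highest mean reward; as counts diverge, rarely tried actions receive a larger bonus, which matters in a stochastic environment where a few rollouts give a noisy estimate.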