Goto

Collaborating Authors

 navigation scenario


FLYINGTRUST: A Benchmark for Quadrotor Navigation Across Scenarios and Vehicles

Li, Gang, Zhai, Chunlei, Wang, Teng, Li, Shaun, Jiang, Shangsong, Zhu, Xiangwei

arXiv.org Artificial Intelligence

Abstract--Visual navigation algorithms for quadrotors often exhibit a large variation in performance when transferred across different vehicle platforms and scene geometries, which increases the cost and risk of field deployment. T o support systematic early-stage evaluation, we introduce FL YINGTRUST, a high-fidelity, configurable benchmarking framework that measures how platform kinodynamics and scenario structure jointly affect navigation robustness. The benchmark pairs a diverse scenario library with a heterogeneous set of real and virtual platforms and prescribes a standardized evaluation protocol together with a composite scoring method that balances scenario importance, platform importance and performance stability. We use FL YINGTRUST to compare representative optimization-based and learning-based navigation approaches under identical conditions, performing repeated trials per platform-scenario combination and reporting uncertainty-aware metrics. The results reveal systematic patterns: navigation success depends predictably on platform capability and scene geometry, and different algorithms exhibit distinct preferences and failure modes across the evaluated conditions. These observations highlight the practical necessity of incorporating both platform capability and scenario structure into algorithm design, evaluation, and selection, and they motivate future work on methods that remain robust across diverse platforms and scenarios. NMANNED Aerial V ehicles (UA Vs) are aircraft operated without onboard human pilots, either by remote control or by preprogrammed flight plans [1]. By independently modulating the speeds of four motor-propeller units, a quadrotor can generate collective thrust for vertical motion and differential thrust and reaction torques for attitude control. These capabilities enable six degrees of freedom motion combined with fine low-speed control, which drive extensive adoption of quadrotors in precision agriculture, infrastructure inspection, high-resolution mapping, environmental monitoring and disaster response [2]-[11]. The benchmark of FL YINGTRUST is available at https://github.com/ The blue line represents the straight-line reference path, and the red curve is an example of a collision-free trajectory executed by a planner. Over the last decade, many high-performance visual navigation methods have been developed, ranging from classical optimization-based planners to recent learning-based approaches [12]-[15].


SocialNav-SUB: Benchmarking VLMs for Scene Understanding in Social Robot Navigation

Munje, Michael J., Tang, Chen, Liu, Shuijing, Hu, Zichao, Zhu, Yifeng, Cui, Jiaxun, Warnell, Garrett, Biswas, Joydeep, Stone, Peter

arXiv.org Artificial Intelligence

Robot navigation in dynamic, human-centered environments requires socially-compliant decisions grounded in robust scene understanding. Recent Vision-Language Models (VLMs) exhibit promising capabilities such as object recognition, common-sense reasoning, and contextual understanding-capabilities that align with the nuanced requirements of social robot navigation. However, it remains unclear whether VLMs can accurately understand complex social navigation scenes (e.g., inferring the spatial-temporal relations among agents and human intentions), which is essential for safe and socially compliant robot navigation. While some recent works have explored the use of VLMs in social robot navigation, no existing work systematically evaluates their ability to meet these necessary conditions. In this paper, we introduce the Social Navigation Scene Understanding Benchmark (SocialNav-SUB), a Visual Question Answering (VQA) dataset and benchmark designed to evaluate VLMs for scene understanding in real-world social robot navigation scenarios. SocialNav-SUB provides a unified framework for evaluating VLMs against human and rule-based baselines across VQA tasks requiring spatial, spatiotemporal, and social reasoning in social robot navigation. Through experiments with state-of-the-art VLMs, we find that while the best-performing VLM achieves an encouraging probability of agreeing with human answers, it still underperforms simpler rule-based approach and human consensus baselines, indicating critical gaps in social scene understanding of current VLMs. Our benchmark sets the stage for further research on foundation models for social robot navigation, offering a framework to explore how VLMs can be tailored to meet real-world social robot navigation needs. An overview of this paper along with the code and data can be found at https://larg.github.io/socialnav-sub .


Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications

Ma, Feng, Wang, Xiu-min, Chen, Chen, Xu, Xiao-bin, Yan, Xin-ping

arXiv.org Artificial Intelligence

Existing navigation decision support systems often perform poorly when handling non-predefined navigation scenarios. Leveraging the generalization capabilities of large language model (LLM) in handling unknown scenarios, this research proposes a dual-core framework for LLM applications to address this issue. Firstly, through ReAct-based prompt engineering, a larger LLM core decomposes intricate navigation tasks into manageable sub-tasks, which autonomously invoke corresponding external tools to gather relevant information, using this feedback to mitigate the risk of LLM hallucinations. Subsequently, a fine-tuned and compact LLM core, acting like a first-mate is designed to process such information and unstructured external data, then to generates context-aware recommendations, ultimately delivering lookout insights and navigation hints that adhere to the International Regulations for Preventing Collisions at Sea (COLREGs) and other rules. Extensive experiments demonstrate the proposed framework not only excels in traditional ship collision avoidance tasks but also adapts effectively to unstructured, non-predefined, and unpredictable scenarios. A comparative analysis with DeepSeek-R1, GPT-4o and other SOTA models highlights the efficacy and rationality of the proposed framework. This research bridges the gap between conventional navigation systems and LLMs, offering a framework to enhance safety and operational efficiency across diverse navigation applications.


Factor Graph-Based Active SLAM for Spacecraft Proximity Operations

Ticozzi, Lorenzo, Tsiotras, Panagiotis

arXiv.org Artificial Intelligence

We investigate a scenario where a chaser spacecraft or satellite equipped with a monocular camera navigates in close proximity to a target spacecraft. The satellite's primary objective is to construct a representation of the operational environment and localize itself within it, utilizing the available image data. We frame the joint task of state trajectory and map estimation as an instance of smoothing-based simultaneous localization and mapping (SLAM), where the underlying structure of the problem is represented as a factor graph. Rather than considering estimation and planning as separate tasks, we propose to control the camera observations to actively reduce the uncertainty of the estimation variables, the spacecraft state, and the map landmarks. This is accomplished by adopting an information-theoretic metric to reason about the impact of candidate actions on the evolution of the belief state. Numerical simulations indicate that the proposed method successfully captures the interplay between planning and estimation, hence yielding reduced uncertainty and higher accuracy when compared to commonly adopted passive sensing strategies.


Cooperation and Fairness in Multi-Agent Reinforcement Learning

Aloor, Jasmine Jerry, Nayak, Siddharth, Dolan, Sydney, Balakrishnan, Hamsa

arXiv.org Artificial Intelligence

Multi-agent systems are trained to maximize shared cost objectives, which typically reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness -- certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably, leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is important to consider the tradeoffs between efficiency and fairness. We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We consider the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, our model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline trained using random assignments. Furthermore, an average of 21% improvement in fairness can be achieved compared to a model trained on optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do so without tailoring the models to specific formation shapes.


Towards Inferring Users' Impressions of Robot Performance in Navigation Scenarios

Zhang, Qiping, Tsoi, Nathan, Choi, Booyeon, Tan, Jie, Chiang, Hao-Tien Lewis, Vázquez, Marynel

arXiv.org Artificial Intelligence

Human impressions of robot performance are often measured through surveys. As a more scalable and cost-effective alternative, we study the possibility of predicting people's impressions of robot behavior using non-verbal behavioral cues and machine learning techniques. To this end, we first contribute the SEAN TOGETHER Dataset consisting of observations of an interaction between a person and a mobile robot in a Virtual Reality simulation, together with impressions of robot performance provided by users on a 5-point scale. Second, we contribute analyses of how well humans and supervised learning techniques can predict perceived robot performance based on different combinations of observation types (e.g., facial, spatial, and map features). Our results show that facial expressions alone provide useful information about human impressions of robot performance; but in the navigation scenarios we tested, spatial features are the most critical piece of information for this inference task. Also, when evaluating results as binary classification (rather than multiclass classification), the F1-Score of human predictions and machine learning models more than doubles, showing that both are better at telling the directionality of robot performance than predicting exact performance ratings. Based on our findings, we provide guidelines for implementing these predictions models in real-world navigation scenarios.


Robot Gaze During Autonomous Navigation and its Effect on Social Presence

He, Kerry, Chan, Wesley P., Cosgun, Akansel, Joy, Albin, Croft, Elizabeth A.

arXiv.org Artificial Intelligence

As robots have become increasingly common in human-rich environments, it is critical that they are able to exhibit social cues to be perceived as a cooperative and socially-conformant team member. We investigate the effect of robot gaze cues on people's subjective perceptions of a mobile robot as a socially present entity in three common hallway navigation scenarios. The tested robot gaze behaviors were path-oriented (looking at its own future path), or person-oriented (looking at the nearest person), with fixed-gaze as the control. We conduct a real-world study with 36 participants who walked through the hallway, and an online study with 233 participants who were shown simulated videos of the same scenarios. Our results suggest that the preferred gaze behavior is scenario-dependent. Person-oriented gaze behaviors which acknowledge the presence of the human are generally preferred when the robot and human cross paths. However, this benefit is diminished in scenarios that involve less implicit interaction between the robot and the human.


Metrics for Evaluating Social Conformity of Crowd Navigation Algorithms

Wang, Junxian, Chan, Wesley P., Carreno-Medrano, Pamela, Cosgun, Akansel, Croft, Elizabeth

arXiv.org Artificial Intelligence

Recent protocols and metrics for training and evaluating autonomous robot navigation through crowds are inconsistent due to diversified definitions of "social behavior". This makes it difficult, if not impossible, to effectively compare published navigation algorithms. Furthermore, with the lack of a good evaluation protocol, resulting algorithms may fail to generalize, due to lack of diversity in training. To address these gaps, this paper facilitates a more comprehensive evaluation and objective comparison of crowd navigation algorithms by proposing a consistent set of metrics that accounts for both efficiency and social conformity, and a systematic protocol comprising multiple crowd navigation scenarios of varying complexity for evaluation. We tested four state-of-the-art algorithms under this protocol. Results revealed that some state-of-the-art algorithms have much challenge in generalizing, and using our protocol for training, we were able to improve the algorithm's performance. We demonstrate that the set of proposed metrics provides more insight and effectively differentiates the performance of these algorithms with respect to efficiency and social conformity.