HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Dong, Yifei, Wu, Fengyi, He, Qi, Li, Heng, Li, Minghan, Cheng, Zebang, Zhou, Yuxuan, Sun, Jingdong, Dai, Qi, Cheng, Zhi-Qi, Hauptmann, Alexander G

Mar-18-2025–arXiv.org Artificial Intelligence

Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include: 1. A standardized task definition that balances discrete-continuous navigation with personal-space requirements; 2. An enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment; 3. Extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents; 4. Real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and 5. A public leaderboard supporting transparent comparisons across discrete and continuous tasks. Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.

evolutionary algorithm, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

Mar-18-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.34)
- Workflow (0.67)

Industry:
- Leisure & Entertainment (1.00)
- Media > Television (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Evolutionary Systems (0.92)
  - Natural Language > Large Language Model (0.94)
  - Representation & Reasoning > Agents (1.00)
  - Robots (1.00)
  - Vision (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found