A Survey on Improving Human Robot Collaboration through Vision-and-Language Navigation
Yakolli, Nivedan, Gautam, Avinash, Das, Abhijit, Qi, Yuankai, Shekhawat, Virendra Singh
–arXiv.org Artificial Intelligence
Vision-and-Language Navigation (VLN) is a multi-modal, cooperative task requiring agents to interpret human instructions, navigate 3D environments, and communicate effectively under ambiguity. This paper presents a comprehensive review of recent VLN advancements in robotics and outlines promising directions to improve multi-robot coordination. Despite progress, current models struggle with bidirectional communication, ambiguity resolution, and collaborative decision-making in multi-agent systems. We review approximately 200 relevant articles to provide an in-depth understanding of the current landscape. Through this survey, we aim to provide a thorough resource that inspires further research at the intersection of VLN and robotics. We advocate that future VLN systems should support proactive clarification, real-time feedback, and contextual reasoning through advanced natural language understanding (NLU) techniques. Additionally, decentralized decision-making frameworks with dynamic role assignment are essential for scalable, efficient multi-robot collaboration. These innovations can significantly enhance human-robot interaction (HRI) and enable real-world deployment in domains such as healthcare, logistics, and disaster response.
Dec-2-2025
- Country:
  - Europe (0.45)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.92)
- Industry:
  - Health & Medicine (1.00)
  - Leisure & Entertainment > Games (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Robots (1.00)
    - Representation & Reasoning > Agents (1.00)
    - Cognitive Science > Problem Solving (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)