Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Zhang, Yue; Ma, Ziqiao; Li, Jialu; Qiao, Yanyuan; Wang, Zun; Chai, Joyce; Wu, Qi; Bansal, Mohit; Kordjamshidi, Parisa

Jul-9-2024, arXiv.org Artificial Intelligence

Vision-and-Language Navigation (VLN) has gained increasing attention in recent years, and many approaches have emerged to advance the field. The remarkable achievements of foundation models have shaped both the challenges of VLN research and the methods proposed to address them. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and that emphasizes current methods and future opportunities for leveraging foundation models to address VLN challenges. We hope our in-depth discussion will provide valuable resources and insights: on one hand, to mark the milestones of progress and explore the opportunities and potential roles of foundation models in this field; on the other, to organize the challenges and solutions in VLN for foundation model researchers.
