EvolveNav: Empowering LLM-Based Vision-Language Navigation via Self-Improving Embodied Reasoning

Lin, Bingqian, Nie, Yunshuang, Zai, Khun Loun, Wei, Ziming, Han, Mingfei, Xu, Rongtao, Niu, Minzhe, Han, Jianhua, Zhang, Hanwang, Lin, Liang, Chen, Bokui, Lu, Cewu, Liang, Xiaodan

Oct-15-2025–arXiv.org Artificial Intelligence

Abstract--Recent studies have revealed the potential of training open-source Large Language Models (LLMs) to unleash their reasoning ability for enhancing vision-language navigation (VLN) performance. Such training also mitigates the domain gap between the LLMs' training corpora and the VLN task. However, these approaches predominantly adopt straightforward input-output mapping paradigms. This makes the mapping difficult to learn and leaves the navigational decisions unexplainable. Chain-of-Thought (CoT) training is a promising way to improve both the accuracy and the interpretability of navigational decisions. However, the complexity of the navigation task means that perfect CoT labels are unavailable, and pure CoT supervised fine-tuning may lead to overfitting. To address these issues, we propose EvolveNav, a novel sElf-improving embodied reasoning paradigm that realizes adaptable and generalizable navigational reasoning for boosting LLM-based vision-language Navigation. Specifically, EvolveNav involves a two-stage training process: (1) Formalized CoT Supervised Fine-Tuning, where we train the model with curated formalized CoT labels to first activate the model's navigational reasoning …

These two authors contributed equally to this work. Bokui Chen, Cewu Lu, and Xiaodan Liang are the corresponding authors. Bingqian Lin and Cewu Lu are with Shanghai Jiao Tong University, Shanghai, China. Yunshuang Nie, Khun Loun Zai, and Ziming Wei are with the Shenzhen Campus of Sun Yat-sen University, Shenzhen, China. Xiaodan Liang is with the Shenzhen Campus of Sun Yat-sen University, Shenzhen, China; Peng Cheng Laboratory; and the Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China. Bokui Chen is with Tsinghua Shenzhen International Graduate School, Tsinghua University, China. Mingfei Han is with the Department of Computer Vision, Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE.
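The following minimal Python sketch makes the contrast between a plain input-output mapping and a formalized CoT label concrete. It shows how supervised fine-tuning targets might be built for a single navigation step. The template fields (`Thought`, `Landmark`, `Observation`, `Action`), the `NavStep` record, and all helper names are hypothetical illustrations chosen here. The abstract does not specify EvolveNav's actual label format, so this is not the authors' method.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label format: templated reasoning that always ends with a
# single parseable "Action: <index>" field (not EvolveNav's real format).
ACTION_RE = re.compile(r"Action:\s*(\d+)")


@dataclass
class NavStep:
    """One VLN decision step; every field name here is illustrative only."""
    instruction: str
    candidates: List[str]  # textual descriptions of candidate viewpoints
    target_index: int      # index of the ground-truth next viewpoint
    landmark: str          # hypothetical: landmark the reasoning grounds on


def build_prompt(step: NavStep) -> str:
    """Serialize the instruction and candidate observations as model input."""
    lines = [f"Instruction: {step.instruction}"]
    lines += [f"Candidate {i}: {c}" for i, c in enumerate(step.candidates)]
    return "\n".join(lines)


def direct_target(step: NavStep) -> str:
    """Plain input-output mapping: the SFT label is the action alone."""
    return f"Action: {step.target_index}"


def formalized_cot_target(step: NavStep) -> str:
    """Hypothetical formalized CoT label: templated reasoning, then the action."""
    chosen = step.candidates[step.target_index]
    thought = (
        f"Landmark: {step.landmark}. "
        f"Observation: candidate {step.target_index} shows {chosen}. "
    )
    return f"Thought: {thought}{direct_target(step)}"


def parse_action(text: str) -> Optional[int]:
    """Recover the decision from generated text (the last 'Action:' wins)."""
    matches = ACTION_RE.findall(text)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    step = NavStep(
        instruction="Walk past the sofa and stop at the kitchen door.",
        candidates=["a hallway", "a sofa next to a window", "a staircase"],
        target_index=1,
        landmark="sofa",
    )
    print(build_prompt(step))
    print(direct_target(step))  # Action: 1
    # Thought: Landmark: sofa. Observation: candidate 1 shows a sofa next to a window. Action: 1
    print(formalized_cot_target(step))
    print(parse_action(formalized_cot_target(step)))  # 1
```

In this sketch, both targets end with the same `Action:` field. The decision therefore stays machine-parseable even when the model emits reasoning before it. Only the formalized target additionally exposes an inspectable rationale.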

large language model, machine learning, natural language
