MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding

Zhang, Weifan, Li, Tingguang, Liu, Yuzhen

Aug-8-2025–arXiv.org Artificial Intelligence

Visual navigation in unknown environments based solely on natural language descriptions is a key capability for intelligent robots. In this work, we propose a navigation framework built upon off-the-shelf Visual Language Models (VLMs), enhanced with two human-inspired mechanisms: perspective-based active grounding, which dynamically adjusts the robot's viewpoint for improved visual inspection, and historical memory backtracking, which enables the system to retain and re-evaluate uncertain observations over time. Unlike existing approaches that passively rely on incidental visual inputs, our method actively optimizes perception and leverages memory to resolve ambiguity, significantly improving vision-language grounding in complex, unseen environments. Our framework operates in a zero-shot manner, achieving strong generalization to diverse and open-ended language descriptions without requiring labeled data or model fine-tuning. Experimental results on Habitat-Matterport 3D (HM3D) show that our method outperforms state-of-the-art approaches in language-driven object navigation. We further demonstrate its practicality through real-world deployment on a quadruped robot, achieving robust and effective navigation performance.
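The abstract does not say how historical memory backtracking is implemented, so the following is only a minimal sketch of the general idea: confident detections trigger navigation, clearly wrong ones are dropped, and uncertain ones are kept so the most promising can be revisited later. The thresholds `ACCEPT` and `REJECT`, the class `UncertainMemory`, and the scalar grounding score are all hypothetical and are not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical thresholds on a VLM grounding score in [0, 1]; not from the paper.
ACCEPT = 0.8
REJECT = 0.3


@dataclass(order=True)
class Candidate:
    # Negated score so that heapq (a min-heap) pops the highest score first.
    neg_score: float
    position: tuple = field(compare=False)
    label: str = field(compare=False)


class UncertainMemory:
    """Keeps uncertain observations so they can be re-evaluated later."""

    def __init__(self):
        self._heap = []

    def observe(self, position, label, score):
        """Classify one observation and store it if it is uncertain."""
        if score >= ACCEPT:
            return "navigate"   # confident match: go to the target
        if score < REJECT:
            return "discard"    # confident mismatch: forget it
        heapq.heappush(self._heap, Candidate(-score, position, label))
        return "remember"       # ambiguous: keep for later inspection

    def backtrack(self):
        """Return the most promising stored candidate, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)
```

In this sketch, a robot that finishes exploring without a confident match would call `backtrack()` and move to the returned position for a closer look, which is where a viewpoint adjustment step in the spirit of active grounding could be applied.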

large language model, natural language, navigation
