LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
Zhou, Haotian, Wang, Xiaole, Li, He, Sun, Fusheng, Guo, Shengyu, Qi, Guolei, Xu, Jianghuan, Zhao, Huijing
arXiv.org Artificial Intelligence
Abstract: Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed-set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation.

I. INTRODUCTION

In real-world applications such as home assistants and service robots, mobile agents are expected to understand user instructions, perceive environments, and navigate to target objects [1][2]. With the advancement of vision-language models [3][4], and inspired by the fact that humans primarily rely on vision to navigate, visual navigation has emerged as a prominent research area [1][5].
Oct-29-2025
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Robots (1.00)
- Natural Language (1.00)
- Representation & Reasoning
- Agents (0.48)
- Planning & Scheduling (0.34)