RANa: Retrieval-Augmented Navigation
Monaci, Gianluca, Rezende, Rafael S., Deffayet, Romain, Csurka, Gabriela, Bono, Guillaume, Déjean, Hervé, Clinchant, Stéphane, Wolf, Christian
–arXiv.org Artificial Intelligence
Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While the ability to generalize to unknown environments is extremely important, we claim that, in a realistic setting, an agent should be able to exploit information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, that can query a database collected from previous episodes in the same environment and learns how to integrate this additional context information. We introduce a single agent architecture for the general navigation task, evaluated on ImageNav, Instance-ImageNav and ObjectNav. Our retrieval and context encoding methods are data-driven and employ vision foundation models (FMs) for both semantic and geometric understanding. We propose new benchmarks for these settings and show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance.
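The abstract does not detail how the database of previous episodes is queried. As a minimal sketch, assuming retrieval is a nearest-neighbor lookup over feature vectors of frames stored from earlier episodes, the query step could look like the code below. `EpisodeMemory`, `add` and `query` are hypothetical names, the toy 3-D vectors stand in for vision foundation model features, and cosine similarity is an assumed scoring rule, not necessarily the one RANa uses. The sketch covers only retrieval; how the agent encodes and integrates the retrieved context is learned with RL in the paper and is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> Tuple[float, ...]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / norm for x in v)


@dataclass
class EpisodeMemory:
    """Hypothetical store of observations from earlier episodes in one environment.

    Each entry pairs a feature vector (assumed to come from a frozen vision
    foundation model) with an opaque payload such as a frame id.
    """
    keys: List[Tuple[float, ...]] = field(default_factory=list)
    payloads: List[str] = field(default_factory=list)

    def add(self, embedding: Sequence[float], payload: str) -> None:
        # Store unit-normalized keys so queries only need dot products.
        self.keys.append(_normalize(embedding))
        self.payloads.append(payload)

    def query(self, embedding: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored payloads with the highest cosine similarity."""
        q = _normalize(embedding)
        scored = [
            (payload, sum(a * b for a, b in zip(q, key)))
            for key, payload in zip(self.keys, self.payloads)
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    memory = EpisodeMemory()
    memory.add([1.0, 0.0, 0.0], "ep1/frame_12")
    memory.add([0.0, 1.0, 0.0], "ep1/frame_40")
    memory.add([0.7, 0.7, 0.0], "ep2/frame_05")
    top = memory.query([0.9, 0.1, 0.0], k=2)
    print([name for name, _ in top])  # ['ep1/frame_12', 'ep2/frame_05']
```

A flat scan is enough to show the idea; a database accumulated over many episodes would more plausibly use an approximate nearest-neighbor index.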
Jul-30-2025
- Genre:
  - Research Report (1.00)
- Industry:
  - Leisure & Entertainment (0.46)
- Technology:
  - Information Technology
    - Sensing and Signal Processing > Image Processing (1.00)
    - Artificial Intelligence
      - Vision (1.00)
      - Robots (1.00)
      - Representation & Reasoning > Agents (1.00)
      - Natural Language > Large Language Model (0.90)
      - Machine Learning > Neural Networks
        - Deep Learning (0.46)