What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework
Wang, Hongze, Sun, Boyang, Xing, Jiaxu, Yang, Fan, Hutter, Marco, Shah, Dhruv, Scaramuzza, Davide, Pollefeys, Marc
arXiv.org Artificial Intelligence
Object-Goal Navigation (ObjectNav) is a critical step toward deploying mobile robots in everyday, uncontrolled environments such as homes, schools, and workplaces. In this setting, a robot must locate target objects in previously unseen environments using only its onboard perception. Success requires integrating semantic understanding, spatial reasoning, and long-horizon planning, a combination that remains extremely challenging. While reinforcement learning (RL) has become the dominant paradigm, progress has been spread across a wide range of design choices, and the field still lacks a unifying analysis of which components truly drive performance. In this work, we conduct a large-scale empirical study of modular RL-based ObjectNav systems, decomposing them into three key components: perception, policy, and test-time enhancement. Through extensive controlled experiments, we isolate the contribution of each and uncover clear trends: perception quality and test-time strategies are decisive drivers of performance, whereas policy improvements with current methods yield only marginal gains. Building on these insights, we propose practical design guidelines and demonstrate an enhanced modular system that surpasses State-of-the-Art (SotA) methods by 6.6% in SPL and by 2.7% in success rate. We also introduce a human baseline under identical conditions, where experts achieve an average success rate of 98%, underscoring the gap between RL agents and human-level navigation. Our study not only sets a new SotA but also provides principled guidance for future ObjectNav development and evaluation.
Recent advances in computer vision and deep learning have inspired growing interest in interdisciplinary applications that bridge perception, reasoning, and control, especially in robotics. Among these, vision-based navigation has emerged as a foundational capability for autonomous mobile agents.
A key benchmark in this domain is Object-Goal Navigation (ObjectNav), where a robot must navigate to an instance of a specified object category in an unseen environment, relying solely on its onboard sensors. This task is both practically important and technically challenging: it requires semantic understanding, spatial reasoning, and long-horizon planning. Among many approaches, Reinforcement Learning (RL) has become a dominant paradigm for ObjectNav, offering a structured framework for learning directly through trial and error and showing steady progress across various benchmarks. While end-to-end RL policies are common, modular RL approaches have shown greater robustness and improved generalization.
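The abstract reports gains in SPL and success rate without defining either metric. As a reading aid, the sketch below assumes the commonly used definition of SPL (Success weighted by Path Length): each successful episode contributes the ratio of the shortest-path length to the length of the path the agent actually took, failed episodes contribute zero, and the result is averaged over all episodes. The `Episode` class, its field names, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical container, not from the paper)."""
    success: bool         # True if the agent stopped close enough to the goal
    shortest_path: float  # shortest-path length from start to goal, in metres
    agent_path: float     # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    for e in episodes:
        denom = max(e.agent_path, e.shortest_path)
        # Failed episodes (and degenerate zero-length ones) contribute 0.
        if e.success and denom > 0:
            total += e.shortest_path / denom
    return total / len(episodes)


# Illustrative values only.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=5.0),    # optimal path
    Episode(success=True, shortest_path=4.0, agent_path=8.0),    # twice as long
    Episode(success=False, shortest_path=6.0, agent_path=12.0),  # failure
    Episode(success=True, shortest_path=3.0, agent_path=6.0),    # twice as long
]

print(success_rate(episodes))  # 0.75
print(spl(episodes))           # 0.5
```

Under this definition SPL can never exceed the success rate, so an agent that succeeds often but wanders scores well on success rate and poorly on SPL. That is why the two numbers are reported separately.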
Oct-3-2025