Transformers for Image-Goal Navigation
–arXiv.org Artificial Intelligence
Autonomous navigation is a critical capability for modern mobile robots and has been studied extensively over several decades. Classical approaches to navigation rely on constructing detailed maps of the environment and accurately localizing the robot within them [1, 2, 3]. However, with increasing demand for deploying robots in novel, uncontrolled environments such as households and last-mile delivery, frequently constructing accurate, fine-grained maps is often impractical. Robots must now be able to navigate without maps, which means efficient navigation policies require accurate semantic understanding of the scene, efficient exploration and episodic memory, and long-horizon planning with limited knowledge of the environment. Advances in scene understanding have led to semantic navigation tasks, such as image-goal navigation [4, 5] and object-goal navigation [6, 7], receiving significant focus in recent years. In this work, we consider the specific task of image-goal navigation, where the robot's navigation objective is specified by an RGB image. We motivate the task with the following scenario: consider a mobile household robot equipped with an onboard camera, tasked with picking up a novel unseen object (say, a new shirt). Since the robot has no prior knowledge about the novel object, it would need other semantic information to understand the object: an image of the object would serve this purpose effectively.
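The task setup described above can be sketched minimally: at each step the agent receives its current camera observation together with the goal image, and must choose a navigation action, stopping when the view matches the goal. The sketch below is a toy illustration of that interface only; the class and function names are hypothetical, and the pixel-difference "policy" is a trivial placeholder standing in for a learned (e.g. transformer-based) model.

```python
# Toy sketch of an image-goal navigation agent interface.
# ImageGoalAgent, ACTIONS, and the similarity heuristic are illustrative
# assumptions, not the paper's actual method or API.
from dataclasses import dataclass
from typing import List

ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

@dataclass
class ImageGoalAgent:
    goal_image: List[List[int]]  # grayscale goal image given at episode start

    def similarity(self, obs: List[List[int]]) -> float:
        # Placeholder for a learned observation/goal comparison:
        # mean absolute pixel difference, mapped into [0, 1].
        diff = sum(abs(a - b)
                   for row_o, row_g in zip(obs, self.goal_image)
                   for a, b in zip(row_o, row_g))
        n_pixels = sum(len(row) for row in obs)
        return 1.0 - diff / (255.0 * n_pixels)

    def act(self, obs: List[List[int]]) -> str:
        # Stop once the current view closely matches the goal image;
        # otherwise keep navigating (here, trivially, move forward).
        return "stop" if self.similarity(obs) > 0.95 else "move_forward"
```

A real policy would replace `similarity` and `act` with a model that also handles exploration and memory, but the observation-plus-goal-image input and discrete-action output shown here are the essence of the task specification.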
May-23-2024