Response to Reviews for NeurIPS paper: Object Goal Navigation using Goal-Oriented Semantic Exploration

Neural Information Processing Systems

We thank the reviewers for their valuable feedback and comments. R3 & R5 point out that parts of some modules are based on prior work; however, the complete method is significantly different from prior methods ([25,37,38,41]) tackling object goal navigation. Novelty is also recognized by R1 ("clear algorithmic innovation") and R2 ("adds several new features"). All reviewers have appreciated the real-world experiments in the submission. R1 & R5 have suggested there should be more emphasis on real-world experiments.


Object Goal Navigation using Goal-Oriented Semantic Exploration

Neural Information Processing Systems

This work studies the problem of object goal navigation, which involves navigating to an instance of a given object category in unseen environments. End-to-end learning-based navigation methods struggle at this task as they are ineffective at exploration and long-term planning. We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category. Empirical results in visually realistic simulation environments show that the proposed model outperforms a wide range of baselines, including end-to-end learning-based methods as well as modular map-based methods, and led to the winning entry of the CVPR-2020 Habitat ObjectNav Challenge. Ablation analysis indicates that the proposed model learns semantic priors of the relative arrangement of objects in a scene and uses them to explore efficiently. The domain-agnostic module design allows us to transfer our model to a mobile robot platform and achieve similar performance for object goal navigation in the real world.
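The core of the approach described above is an episodic semantic map keyed by object category, which the exploration policy queries for the goal. A minimal sketch of such a map is below; the grid size, category indexing, and max-pooling update rule are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

# Toy episodic semantic map: one 2D confidence channel per object category.
# Grid size, category count, and the update rule are illustrative assumptions.
class SemanticMap:
    def __init__(self, size=64, num_categories=3):
        self.grid = np.zeros((num_categories, size, size), dtype=np.float32)

    def update(self, category, x, y, confidence=1.0):
        # Fuse a new detection into the map (max-pooling over the episode).
        self.grid[category, y, x] = max(self.grid[category, y, x], confidence)

    def goal_cell(self, category):
        # Return the most confident cell for the goal category, if any seen.
        channel = self.grid[category]
        if channel.max() == 0.0:
            return None  # goal unseen: the caller should keep exploring
        y, x = np.unravel_index(channel.argmax(), channel.shape)
        return (int(y), int(x))

m = SemanticMap()
m.update(category=1, x=10, y=20, confidence=0.9)
print(m.goal_cell(1))  # (20, 10)
```

When the goal category has not yet been observed, `goal_cell` returns `None`, which is where a goal-oriented exploration policy would take over.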


PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory

Jin, Qunchao, Wu, Yilin, Chen, Changhao

arXiv.org Artificial Intelligence

Zero-shot object navigation (ZSON) in unseen environments remains a challenging problem for household robots, requiring strong perceptual understanding and decision-making capabilities. While recent methods leverage metric maps and Large Language Models (LLMs), they often depend on depth sensors or prebuilt maps, limiting the spatial reasoning ability of Multimodal Large Language Models (MLLMs). Mapless ZSON approaches have emerged to address this, but they typically make short-sighted decisions, leading to local deadlocks due to a lack of historical context. We propose PanoNav, a fully RGB-only, mapless ZSON framework that integrates a Panoramic Scene Parsing module to unlock the spatial parsing potential of MLLMs from panoramic RGB inputs, and a Memory-guided Decision-Making mechanism enhanced by a Dynamic Bounded Memory Queue to incorporate exploration history and avoid local deadlocks. Experiments on the public navigation benchmark show that PanoNav significantly outperforms representative baselines in both SR and SPL metrics.
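The bounded-memory idea in the abstract above can be sketched with a fixed-capacity queue of recently visited places that biases the next decision away from them. The capacity and the "prefer unvisited candidates" rule here are illustrative assumptions, not PanoNav's actual mechanism.

```python
from collections import deque

# Toy bounded memory for mapless navigation: remember the last few visited
# places and avoid revisiting them. Capacity and the tie-break rule are
# illustrative assumptions.
class BoundedMemory:
    def __init__(self, capacity=3):
        self.queue = deque(maxlen=capacity)  # oldest entries fall off automatically

    def visit(self, place_id):
        self.queue.append(place_id)

    def pick_next(self, candidates):
        # Prefer candidates not in recent memory, to avoid local deadlocks.
        fresh = [c for c in candidates if c not in self.queue]
        return fresh[0] if fresh else candidates[0]

mem = BoundedMemory(capacity=3)
for p in ["hall", "kitchen", "hall"]:
    mem.visit(p)
print(mem.pick_next(["hall", "kitchen", "bedroom"]))  # bedroom
```

Because the queue is bounded, places visited long ago eventually become eligible again, which is what distinguishes this from simply blacklisting visited locations.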


LGR: LLM-Guided Ranking of Frontiers for Object Goal Navigation

Uno, Mitsuaki, Tanaka, Kanji, Iwata, Daiki, Noda, Yudai, Miyazaki, Shoya, Terashima, Kouki

arXiv.org Artificial Intelligence

Object Goal Navigation (OGN) is a fundamental task for robots and AI, with key applications such as mobile robot image databases (MRID). In particular, mapless OGN is essential in scenarios involving unknown or dynamic environments. This study aims to enhance recent modular mapless OGN systems by leveraging the commonsense reasoning capabilities of large language models (LLMs). Specifically, we address the challenge of determining the visiting order in frontier-based exploration by framing it as a frontier ranking problem. Our approach is grounded in recent findings that, while LLMs cannot determine the absolute value of a frontier, they excel at evaluating the relative value between multiple frontiers viewed within a single image, using the view image as context. We dynamically manage the frontier list by adding and removing elements, using an LLM as a ranking model. The ranking results are represented as reciprocal rank vectors, which are ideal for multi-view, multi-query information fusion. Object Goal Navigation (OGN) is a task in which a robot explores and locates a user-specified object within a workspace, widely studied in robotics and artificial intelligence [1]. If object locations are pre-recorded on a map, the most efficient method is to retrieve the object from the mobile robot image database [2]-[4]. However, in unknown environments or when map information is unreliable, mapless OGN is essential. Existing OGN methods include end-to-end approaches, which directly generate action commands from sensor data [5], but these require extensive training data and high computational costs.
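The reciprocal-rank representation mentioned above lends itself to standard reciprocal rank fusion (RRF) across per-view rankings. A minimal sketch is below; the constant `k=60` follows the common RRF formula and, along with the frontier identifiers, is an assumption here rather than LGR's exact configuration.

```python
# Sketch of reciprocal-rank fusion for frontier ranking: combine ordered
# frontier lists from multiple views/queries into one visiting order.
# The smoothing constant k=60 is the conventional RRF default (an assumption).
def rrf_fuse(rankings, k=60):
    # rankings: list of ordered frontier-id lists, best first
    scores = {}
    for ranking in rankings:
        for rank, frontier in enumerate(ranking, start=1):
            scores[frontier] = scores.get(frontier, 0.0) + 1.0 / (k + rank)
    # Higher fused score = visit earlier.
    return sorted(scores, key=scores.get, reverse=True)

views = [["f2", "f1", "f3"], ["f2", "f3", "f1"], ["f1", "f2", "f3"]]
print(rrf_fuse(views))  # ['f2', 'f1', 'f3']
```

RRF only needs rank positions, never absolute scores, which matches the observation that an LLM judges relative rather than absolute frontier value.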


Review for NeurIPS paper: Object Goal Navigation using Goal-Oriented Semantic Exploration

Neural Information Processing Systems

Summary and Contributions: This paper presents an extension to recent work on Active Neural SLAM [1], in which semantic information about object categories is explicitly incorporated into the model. The extensions to the model architecture provide explicit semantic information about the various objects of the scene in the generated 2D map, which allows an agent to navigate its environment and find a specified goal object much more efficiently than the baselines, some of which use semantic information and some of which do not. The comparison was performed using Gibson [2] and Matterport3D (MP3D) [3], which include 3D reconstructions of real environments. Training was performed on 86 scenes and testing on 16.


Review for NeurIPS paper: Object Goal Navigation using Goal-Oriented Semantic Exploration

Neural Information Processing Systems

This paper proposes to train an ObjectNav policy that generalises to unseen environments by using a modular system that classifies objects and builds an episodic semantic map, which it uses to explore the environment based on the object category, building upon the hierarchical method in "Learning to explore using Active Neural SLAM". The method achieved SOTA performance on the 2020 CVPR Object Goal Navigation Habitat Challenge. Interestingly, the policy, trained on Gibson and MP3D, has been transferred and deployed on a real robot with some success. While the initial reviews were mixed (9, 7, 4, 5), the reviewers converged on (8, 7, 6, 6), agreeing during discussion that the paper deserved to be accepted. Based on the reviews, I recommend this paper for acceptance as a spotlight or poster presentation.


Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation

Unlu, Halil Utku, Yuan, Shuaihang, Wen, Congcong, Huang, Hao, Tzes, Anthony, Fang, Yi

arXiv.org Artificial Intelligence

We introduce an innovative approach to advancing semantic understanding in zero-shot object goal navigation (ZS-OGN), enhancing the autonomy of robots in unfamiliar environments. Traditional reliance on labeled data has been a limitation for robotic adaptability, which we address by employing a dual-component framework that integrates a GLIP Vision Language Model for initial detection and an Instruction-BLIP model for validation. This combination not only refines object and environmental recognition but also fortifies the semantic interpretation, pivotal for navigational decision-making. Our method, rigorously tested in both simulated and real-world settings, exhibits marked improvements in navigation precision and reliability.


Advancing Object Goal Navigation Through LLM-enhanced Object Affinities Transfer

Lin, Mengying, Chen, Yaran, Zhao, Dongbin, Wang, Zhaoran

arXiv.org Artificial Intelligence

In object goal navigation, agents navigate towards objects identified by category labels using visual and spatial information. Previously, purely network-based methods typically relied on historical data to estimate object affinities, lacking adaptability to new environments and unseen targets. Meanwhile, employing Large Language Models (LLMs) for navigation as either planners or agents, though offering a broad knowledge base, is cost-inefficient and lacks targeted historical experience. Addressing these challenges, we present the LLM-enhanced Object Affinities Transfer (LOAT) framework, integrating LLM-derived object semantics with network-based approaches to leverage experiential object affinities, thus improving adaptability in unfamiliar settings. LOAT employs a dual-module strategy: a generalized affinities module for accessing LLMs' vast knowledge and an experiential affinities module for applying learned object semantic relationships, complemented by a dynamic fusion module harmonizing these information sources based on temporal context. The resulting scores activate semantic maps before feeding into downstream policies, enhancing navigation systems with context-aware inputs. Our evaluations in the AI2-THOR and Habitat simulators demonstrate improvements in both navigation success rates and efficiency, validating LOAT's efficacy in integrating LLM insights for improved object goal navigation.
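The dual-module fusion described above can be illustrated as a time-weighted blend of two affinity sources. Everything in this sketch is hypothetical: the exponential weight schedule, the object names, and the score values stand in for LOAT's learned fusion module.

```python
import math

# Illustrative fusion of two object-affinity sources, loosely following the
# dual-module idea: generalized (LLM-derived) priors vs. experiential
# (learned) affinities. The schedule and all values are hypothetical.
def fuse_affinities(llm_scores, learned_scores, t, tau=10.0):
    # Weight shifts from LLM priors toward learned affinities as time t grows.
    w = math.exp(-t / tau)
    return {obj: w * llm_scores[obj] + (1 - w) * learned_scores[obj]
            for obj in llm_scores}

llm = {"sofa": 0.8, "sink": 0.2}       # generalized priors
learned = {"sofa": 0.4, "sink": 0.9}   # experiential affinities
early = fuse_affinities(llm, learned, t=0)    # early episode: trust priors
late = fuse_affinities(llm, learned, t=100)   # late episode: trust experience
print(max(early, key=early.get), max(late, key=late.get))  # sofa sink
```

The point of the blend is that the agent can fall back on broad commonsense priors before it has accumulated any in-environment experience.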