Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation

Unlu, Halil Utku, Yuan, Shuaihang, Wen, Congcong, Huang, Hao, Tzes, Anthony, Fang, Yi

Oct-29-2024–arXiv.org Artificial Intelligence

We introduce an innovative approach to advancing semantic understanding in zero-shot object goal navigation (ZS-OGN), enhancing the autonomy of robots in unfamiliar environments. Reliance on labeled data has traditionally limited robotic adaptability; we address this with a dual-component framework that integrates a GLIP Vision Language Model for initial detection and an Instruction-BLIP model for validation. This combination refines object and environmental recognition and strengthens the semantic interpretation that is pivotal for navigational decision-making. Our method, tested in both simulated and real-world settings, shows marked improvements in navigation precision and reliability.
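The detect-then-validate structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` type, the `validated_detections` function, the `min_score` threshold, and the two callables standing in for the GLIP detector and the Instruction-BLIP validator are all hypothetical names introduced here for illustration, and the toy stand-ins at the bottom do not call any real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    label: str    # object category proposed by the detector
    score: float  # detector confidence in [0, 1]


def validated_detections(
    image: object,
    goal: str,
    detect: Callable[[object, str], List[Detection]],
    confirm: Callable[[object, Detection], bool],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Detection]:
    """Keep detections that pass the score threshold and the validator."""
    # Stage 1: the detector proposes candidates for the goal category.
    candidates = [d for d in detect(image, goal) if d.score >= min_score]
    # Stage 2: a second model independently confirms each candidate.
    return [d for d in candidates if confirm(image, d)]


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def fake_detect(image: object, goal: str) -> List[Detection]:
    return [Detection(goal, 0.9), Detection(goal, 0.6), Detection(goal, 0.2)]


def fake_confirm(image: object, det: Detection) -> bool:
    return det.score > 0.8


result = validated_detections(None, "chair", fake_detect, fake_confirm)
```

Passing the two stages in as callables keeps the sketch independent of any particular model API; in a real system each callable would wrap the corresponding vision-language model.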

large language model, machine learning, natural language

Country:
- North America > United States
  - New York > Kings County > New York City (0.04)
- Europe > Switzerland
  - Zürich > Zürich (0.14)
- Asia > Middle East
  - UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)

Genre:
- Research Report > Promising Solution (0.48)
- Overview > Innovation (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots (1.00)
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.87)