Target-Driven Structured Transformer Planner for Vision-Language Navigation

Zhao, Yusheng, Chen, Jinyu, Gao, Chen, Wang, Wenguan, Yang, Lirong, Ren, Haibing, Xia, Huaxia, Liu, Si

Jul-19-2022–arXiv.org Artificial Intelligence

Vision-language navigation is the task of directing an embodied agent to navigate 3D scenes by following natural language instructions. For the agent, inferring the long-term navigation target from visual-linguistic clues is crucial for reliable path planning, yet this problem has rarely been studied in the literature. In this article, we propose a Target-Driven Structured Transformer Planner (TD-STP) for long-horizon, goal-guided, and room layout-aware navigation. Specifically, we devise an Imaginary Scene Tokenization mechanism that explicitly estimates the long-term target, even when it is located in unexplored environments. In addition, we design a Structured Transformer Planner that incorporates the explored room layout into a neural attention architecture for structured, global planning. Experimental results show that TD-STP substantially improves on the success rate of the previous best methods, by 2% and 5% on the test sets of the R2R and REVERIE benchmarks, respectively. Our code is available at https://github.com/YushengZhao/TD-STP.
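To make the idea of folding an explored layout into attention more concrete, here is a minimal illustrative sketch. It is not the authors' implementation, and the abstract does not specify how TD-STP encodes layout. It assumes one generic approach: adding a bias to the attention scores between tokens whose locations are connected in the explored graph. The names `structured_attention`, `adjacency`, and `bias` are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def structured_attention(queries, keys, values, adjacency, bias=1.0):
    """Single-head attention with an additive connectivity bias.

    queries, keys, values: lists of n equal-length vectors, one per location token.
    adjacency: n x n matrix of 0/1 entries (assumed encoding of explored connectivity).
    bias: hypothetical scalar added to the score of connected pairs.
    Returns (outputs, weights).
    """
    n = len(queries)
    d = len(keys[0])
    outputs, weights = [], []
    for i in range(n):
        # Scaled dot-product score plus the structural bias for connected nodes.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            + bias * adjacency[i][j]
            for j in range(n)
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w[j] * values[j][c] for j in range(n)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Three location tokens with identical features, so only the layout bias
# distinguishes them. Locations 0-1 and 1-2 are connected; 0-2 are not.
tokens = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
_, weights = structured_attention(tokens, tokens, tokens, adjacency, bias=1.0)
print(weights[0])  # location 0 attends most strongly to its neighbour, location 1
```

With `bias=0.0` the sketch reduces to plain scaled dot-product attention, which is one way to see what the structural term contributes.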

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe > Portugal
  - Lisbon > Lisbon (0.05)
- Asia > China
  - Beijing > Beijing (0.05)
  - Zhejiang Province > Hangzhou (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Representation & Reasoning > Agents (0.88)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
