Review for NeurIPS paper: Semantic Visual Navigation by Watching YouTube Videos
Neural Information Processing Systems
This paper proposes to leverage unlabelled YouTube videos (mostly real-estate tours) of egocentric navigation in indoor environments to train the Q-value network for the high-level component of a hierarchical RL policy for goal-driven indoor robot navigation. The low-level component relies on depth-based obstacle avoidance and planning on 2D maps. The method works in an unsupervised way by pseudo-labelling the egocentric navigation videos in two ways: 1) extracting action labels with motion classifiers and 2) extracting semantic goal labels with an object detector. From these it 3) builds experience-replay tuples of (previous image, action, next image, goal) and trains a goal-conditioned value function with Q-learning. At test time, the high-level policy predicts Q-values for navigating a topological graph.
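To make the summarized pipeline concrete, here is a minimal sketch (my own illustration, not the authors' code) of the data-construction and update steps described above: a hypothetical motion classifier pseudo-labels actions between consecutive frames, a hypothetical goal detector pseudo-labels goals, the tuples go into a replay buffer, and a tabular goal-conditioned Q-table receives a one-step Q-learning update. All function names and the toy state/action/goal spaces are assumptions for illustration.

```python
import numpy as np

def classify_motion(prev_frame, next_frame):
    # Hypothetical stand-in for the learned motion classifier:
    # returns a toy binary action id (e.g. 0 = turn, 1 = move forward).
    return int(np.mean(next_frame) > np.mean(prev_frame))

def detect_goal(frame, num_goals=3):
    # Hypothetical stand-in for the object detector used for goal labels.
    return int(np.sum(frame)) % num_goals

def hash_state(frame, num_states=16):
    # Toy discretization of a frame into a tabular state id.
    return int(np.sum(frame)) % num_states

def build_replay(frames):
    # Step 3 of the summary: (previous image, action, next image, goal) tuples.
    replay = []
    for prev_f, next_f in zip(frames[:-1], frames[1:]):
        a = classify_motion(prev_f, next_f)   # step 1: action pseudo-label
        g = detect_goal(next_f)               # step 2: goal pseudo-label
        replay.append((prev_f, a, next_f, g))
    return replay

def q_update(q, transition, reward_fn, gamma=0.99, lr=0.1):
    # One-step Q-learning on a goal-conditioned table:
    # target = r + gamma * max_a' Q(s', a', g).
    s, a, s_next, g = transition
    r = reward_fn(s_next, g)
    target = r + gamma * np.max(q[hash_state(s_next), :, g])
    q[hash_state(s), a, g] += lr * (target - q[hash_state(s), a, g])
    return q
```

In the paper the Q-function is of course a deep network over images rather than a table, but the replay construction and the goal-conditioned one-step target have the same shape.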