Review for NeurIPS paper: Semantic Visual Navigation by Watching YouTube Videos
Neural Information Processing Systems
This paper proposes to leverage unlabelled YouTube videos (mostly real-estate tours) of egocentric navigation in indoor environments to train the Q-value network for the high-level component of a hierarchical RL policy for goal-driven indoor robot navigation. The low-level component relies on depth-based obstacle avoidance and planning on 2D maps. The method works in an unsupervised way by pseudo-labelling the egocentric navigation videos in two ways: 1) extracting action labels with motion classifiers and 2) extracting semantic goal labels with an object detector. From these it 3) builds experience-replay tuples of (previous image, action, next image, goal) and trains a goal-conditioned value function with Q-learning. At test time, the high-level policy predicts Q-values for navigating a topological graph.
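To make the summarized pipeline concrete, here is a minimal sketch (my own illustration, not the authors' code) of the data-construction and update steps described above: a hypothetical motion classifier pseudo-labels actions between consecutive frames, a hypothetical goal detector pseudo-labels goals, the tuples go into a replay buffer, and a tabular goal-conditioned Q-table receives a one-step Q-learning update. All function names and the toy state/action/goal spaces are assumptions for illustration.

```python
import numpy as np

def classify_motion(prev_frame, next_frame):
    # Hypothetical stand-in for the learned motion classifier:
    # returns a toy binary action id (e.g. 0 = turn, 1 = move forward).
    return int(np.mean(next_frame) > np.mean(prev_frame))

def detect_goal(frame, num_goals=3):
    # Hypothetical stand-in for the object detector used for goal labels.
    return int(np.sum(frame)) % num_goals

def hash_state(frame, num_states=16):
    # Toy discretization of a frame into a tabular state id.
    return int(np.sum(frame)) % num_states

def build_replay(frames):
    # Step 3 of the summary: (previous image, action, next image, goal) tuples.
    replay = []
    for prev_f, next_f in zip(frames[:-1], frames[1:]):
        a = classify_motion(prev_f, next_f)   # step 1: action pseudo-label
        g = detect_goal(next_f)               # step 2: goal pseudo-label
        replay.append((prev_f, a, next_f, g))
    return replay

def q_update(q, transition, reward_fn, gamma=0.99, lr=0.1):
    # One-step Q-learning on a goal-conditioned table:
    # target = r + gamma * max_a' Q(s', a', g).
    s, a, s_next, g = transition
    r = reward_fn(s_next, g)
    target = r + gamma * np.max(q[hash_state(s_next), :, g])
    q[hash_state(s), a, g] += lr * (target - q[hash_state(s), a, g])
    return q
```

In the paper the Q-function is of course a deep network over images rather than a table, but the replay construction and the goal-conditioned one-step target have the same shape.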