Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes

Gao, Feng, Yu, Jincheng, Shen, Hao, Wang, Yu, Yang, Huazhong

Nov-18-2020–arXiv.org Artificial Intelligence

Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots. However, the rigid projection computed by ego-motion cannot represent all scene points, such as points on moving objects, leading to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which can learn to distinguish and extract the scene's static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera's ego-motion and the scene's dynamic motion field. Then, we introduce an auto-selecting approach to detect the moving objects for dynamic-aware learning automatically. Empirical experiments demonstrate that our method can achieve the state-of-the-art performance on the KITTI benchmark.

deep learning, neural network, video understanding, (12 more...)

arXiv.org Artificial Intelligence

Nov-18-2020

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.14)
- North America > United States (0.14)

Genre:
- Research Report (0.82)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
    - Robots (0.89)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found