Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

Zhang, Kaifeng, Fu, Yang, Borse, Shubhankar, Cai, Hong, Porikli, Fatih, Wang, Xiaolong

Apr-3-2023–arXiv.org Artificial Intelligence

While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from solved due to the lack of annotations. The problem becomes even more challenging when moving to category-level 6D pose, which requires generalization to unseen instances. Current approaches are restricted by their reliance on annotations generated in simulation or collected from humans. In this paper, we overcome this barrier by introducing a self-supervised learning approach trained directly on large-scale real-world object videos for category-level 6D pose estimation in the wild. Our framework reconstructs the canonical 3D shape of an object category and learns dense correspondences between input images and the canonical shape via surface embedding. For training, we propose novel geometrical cycle-consistency losses which construct cycles across 2D-3D spaces, across different instances, and across different time steps. The learned correspondence can be applied to 6D pose estimation and other downstream tasks such as keypoint transfer. Surprisingly, our method, without any human annotations or simulators, can achieve on-par or even better performance than previous supervised or semi-supervised methods on in-the-wild images. Code and videos are available at https://kywind.github.io/self-pose.

Object 6D pose estimation is a long-standing problem for computer vision and robotics. In instance-level 6D pose estimation, a model is trained to estimate the 6D pose of a single instance given its 3D shape template (He et al., 2020; Xiang et al., 2017; Oberweger et al., 2018). To generalize to unseen objects and remove the requirement of 3D CAD templates, approaches for category-level 6D pose estimation have been proposed (Wang et al., 2019b).
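To make the idea of a 2D-3D cycle-consistency loss more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `cycle_consistency_loss`, the softmax-based soft matching, the `temperature` parameter, and the squared-distance penalty are all assumptions chosen for illustration. The sketch assumes each image pixel and each vertex of the canonical shape has an embedding vector, sends every pixel to the vertices and back to the pixels, and penalizes the distance between where the cycle lands and where it started.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cycle_consistency_loss(pixel_emb, vertex_emb, pixel_xy, temperature=0.05):
    """Hypothetical 2D -> 3D -> 2D cycle loss (illustrative assumption).

    pixel_emb:  (P, D) embeddings of P image pixels
    vertex_emb: (V, D) embeddings of V canonical-shape vertices
    pixel_xy:   (P, 2) image coordinates of the same P pixels
    """
    sim = pixel_emb @ vertex_emb.T                 # (P, V) similarity scores
    p2v = softmax(sim / temperature, axis=1)       # each pixel: soft match over vertices
    v2p = softmax(sim.T / temperature, axis=1)     # each vertex: soft match over pixels
    cycle = p2v @ v2p                              # (P, P) pixel -> vertex -> pixel
    returned_xy = cycle @ pixel_xy                 # expected landing position per pixel
    # Mean squared distance between the start and end of the cycle.
    return float(np.mean(np.sum((returned_xy - pixel_xy) ** 2, axis=1)))


if __name__ == "__main__":
    xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    # Distinct, perfectly matched embeddings: the cycle returns to its start.
    matched = cycle_consistency_loss(np.eye(4), np.eye(4), xy, temperature=0.01)
    # Uninformative embeddings: every pixel lands on the mean position.
    ambiguous = cycle_consistency_loss(np.ones((4, 3)), np.ones((5, 3)), xy)
    print(matched, ambiguous)
```

When embeddings are distinctive and consistent across the two domains, the loss approaches zero; when they are uninformative, every pixel is mapped back to the average pixel position and the loss grows. Cycles across instances or time steps could be built the same way by chaining additional soft-matching matrices, again as an assumption rather than a description of the paper's method.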
