Generative Category-level Object Pose Estimation via Diffusion Models

Jan-19-2025, 18:36:49 GMT–Neural Information Processing Systems

Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches struggle with partially observed point clouds, a difficulty known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and then mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that distils an energy-based model from the original score-based model, enabling end-to-end likelihood estimation.
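The two-step aggregation described in the abstract (discard low-likelihood candidates, then mean-pool the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `aggregate_candidates`, the `keep_ratio` value, the (w, x, y, z) quaternion representation, and the use of a chordal quaternion mean for the rotation part are all hypothetical choices made here. The per-candidate log-likelihoods are taken as given; in the paper they would come from the distilled energy-based model.

```python
import numpy as np

def aggregate_candidates(translations, quaternions, log_likelihoods, keep_ratio=0.6):
    """Filter out low-likelihood pose candidates, then mean-pool the survivors.

    translations:    (K, 3) candidate translations
    quaternions:     (K, 4) candidate rotations as (w, x, y, z) quaternions (assumed format)
    log_likelihoods: (K,)   one score per candidate, higher means more plausible
    keep_ratio:      fraction of candidates kept (hypothetical default)
    """
    translations = np.asarray(translations, dtype=float)
    quaternions = np.asarray(quaternions, dtype=float)
    log_likelihoods = np.asarray(log_likelihoods, dtype=float)

    # Step 1: keep the top-k candidates by estimated likelihood.
    k = max(1, int(round(keep_ratio * len(log_likelihoods))))
    keep = np.argsort(log_likelihoods)[-k:]

    # Step 2a: translations live in a vector space, so a plain mean is fine.
    t_mean = translations[keep].mean(axis=0)

    # Step 2b: rotations do not average component-wise. As an assumed stand-in,
    # use the chordal mean: the leading eigenvector of sum_i q_i q_i^T, which is
    # unaffected by the q / -q sign ambiguity of unit quaternions.
    q = quaternions[keep]
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    _, eigvecs = np.linalg.eigh(q.T @ q)  # eigenvalues in ascending order
    q_mean = eigvecs[:, -1]
    if q_mean[0] < 0:  # fix the sign so that w >= 0
        q_mean = -q_mean
    return t_mean, q_mean
```

With `keep_ratio=0.75` and four candidates, for example, the single lowest-scoring candidate is dropped before pooling, so one outlier sample does not pull the averaged pose away from the remaining three.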

diffusion model, generative category-level object pose estimation, likelihood estimation

Genre:
- Research Report (0.62)

Technology:
- Information Technology > Artificial Intelligence
  - Vision > Video Understanding (1.00)
  - Machine Learning (1.00)