pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
Our approach capitalizes on diffusion models, transferring their representations to this task: we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.

Denoising diffusion models [14] are excellent representations of the natural image manifold and capture all different types of whole objects and their occlusions. Due to their large-scale training data, we hypothesize that such pretrained models have implicitly learned amodal representations (Figure 2), which we can reconfigure to encode object grouping and perform amodal completion. By learning from a synthetic dataset of occlusions and their whole counterparts, we create a conditional diffusion model that, given an RGB image and a point prompt, generates whole objects.
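To make the conditioning scheme concrete, here is a minimal sketch (not the authors' released code) of how a point-prompted conditional diffusion sampler for amodal completion might look: the occluded RGB image and a mask derived from the point prompt are concatenated channel-wise with the noisy sample at every denoising step. The toy denoiser, the linear noise schedule, and the tensor shapes are all illustrative assumptions.

```python
# Illustrative sketch only: a conditional DDPM-style sampling loop where the
# occluded RGB image and a point-prompt mask condition the denoiser at each step.
import torch
import torch.nn as nn

class ToyConditionalDenoiser(nn.Module):
    """Placeholder denoiser: predicts noise from (noisy image, RGB condition, mask)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # input = noisy image (3) + occluded RGB condition (3) + point-prompt mask (1)
        self.net = nn.Sequential(
            nn.Conv2d(channels + channels + 1, 64, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x_t, cond_rgb, cond_mask, t):
        # A real model would also embed the timestep t; omitted for brevity.
        return self.net(torch.cat([x_t, cond_rgb, cond_mask], dim=1))

@torch.no_grad()
def sample_whole_object(model, cond_rgb, cond_mask, steps: int = 50):
    """DDPM ancestral sampling conditioned on the occluded image and mask."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn_like(cond_rgb)  # start from pure noise
    for t in reversed(range(steps)):
        eps = model(x, cond_rgb, cond_mask, t)           # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])  # posterior mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # synthesized whole (amodal) object

# Usage: occluded image plus a mask marking the point-prompted visible region.
model = ToyConditionalDenoiser()
occluded_rgb = torch.rand(1, 3, 64, 64)
point_mask = torch.zeros(1, 1, 64, 64)
point_mask[..., 20:40, 20:40] = 1.0
whole_object = sample_whole_object(model, occluded_rgb, point_mask)
print(whole_object.shape)  # torch.Size([1, 3, 64, 64])
```

In practice the paper builds on a large pretrained diffusion model rather than a toy ConvNet; the sketch only illustrates the image-plus-point conditioning described above.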
arXiv.org Artificial Intelligence
Jan-25-2024
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Large Language Model (0.37)
- Vision (1.00)