Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors

Liao, Ziwei, Xu, Binbin, Waslander, Steven L.

Oct-7-2024–arXiv.org Artificial Intelligence

Object-level mapping [1, 2, 3, 4, 5, 6, 7, 8, 9] builds a 3D map of multiple object instances in a scene, which is critical for scene understanding [10] and has various applications in robotic manipulation [11], semantic navigation [12, 13] and long-term dynamic map maintenance [14]. It addresses two closely coupled tasks: 3D shape reconstruction [15, 16] and pose estimation [17]. Conventional methods [18, 19, 20] approach these tasks from a perspective of state estimation [21], solving an inverse problem where low-dimensional observations (RGB and Depth images) are used to recover high-dimensional unknown variables (3D poses and shapes) through a known observation process (e.g., projection, and differentiable rendering). However, these methods require dense observations (e.g., hundreds of views for NeRF [18]) to fully constrain the problem. In robotics or AR applications, obtaining such dense observations is challenging due to limitations in the robot's or user's observation angle and occlusions in clustered scenarios. Therefore, it is crucial to develop methods that can map from sparse (fewer than 10) or even single observations. Human vision can infer complete 3D objects from images despite occlusions by using prior knowledge of the objects, which represents the prior distributions of the shapes of specific categories, such as chairs, based on thousands of instances observed in daily life. We aim to introduce generative models [22] as providers of prior knowledge to constrain the 3D object mapping. Generative models have demonstrated impressive abilities to generate high-quality multi-modal data by learning distributions in large-scale datasets, including texts [23], images [24], videos [25], and 3D models [26, 27, 28, 29].

artificial intelligence, diffusion model, natural language, (16 more...)

arXiv.org Artificial Intelligence

Oct-7-2024

arXiv.org PDF

Add feedback

Country:
- Europe > Netherlands (0.14)
- North America > Canada
  - Ontario > Toronto (0.14)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning
    - Constraint-Based Reasoning (0.67)
    - Optimization (1.00)
  - Robots (1.00)
  - Vision (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found