Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance

Neural Information Processing Systems

In this work we address the challenging problem of multiview 3D surface reconstruction. We introduce a neural network architecture that simultaneously learns the unknown geometry, camera parameters, and a neural renderer that approximates the light reflected from the surface towards the camera. The geometry is represented as a zero level-set of a neural network, while the neural renderer, derived from the rendering equation, is capable of (implicitly) modeling a wide set of lighting conditions and materials. We trained our network on real-world 2D images of objects with different material properties, lighting conditions, and noisy camera initializations from the DTU MVS dataset. We found our model to produce state-of-the-art 3D surface reconstructions with high fidelity, resolution, and detail.
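
To make "geometry as a zero level-set" concrete, here is a minimal sketch of how a camera ray can be intersected with such a surface by sphere tracing. It rests on assumptions that the abstract does not state: the function `sdf_sphere` is an analytic stand-in for the learned network, it is assumed to behave like a signed distance function, and the names `sphere_trace`, `max_steps`, `eps`, and `t_max` are hypothetical.

```python
import math

def sdf_sphere(p, radius=1.0):
    """Signed distance to a sphere at the origin (stand-in for a learned network)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-6, t_max=100.0):
    """March along a unit-length ray until the SDF is within eps of zero.

    Returns the hit point on the zero level set, or None if the ray misses.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if abs(dist) < eps:
            return p          # reached the zero level set
        t += dist             # safe step: no surface is closer than dist
        if t > t_max:
            return None       # ray left the region of interest
    return None

# A ray aimed at the sphere, and one that passes beside it.
hit = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sdf_sphere)
miss = sphere_trace((0.0, 2.0, -3.0), (0.0, 0.0, 1.0), sdf_sphere)
```

In a learned setting, the hit point (and the surface normal there) would be what a renderer network receives as input; the sketch stops at the intersection step.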


Review for NeurIPS paper: Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance

Neural Information Processing Systems

Clarity: As written, I find the explanation of the "implicit differentiable renderer" to be highly misleading. Throughout the paper, the network M is described as a "differentiable renderer" that accounts for both BRDF and lighting conditions. Indeed, the TITLE of the paper implies that lighting and materials are being recovered in the style of full inverse rendering. Section 3.2 introduces the rendering equation and makes a big deal about specifying the BRDF and light sources. However, this is all rendered pointless by lines 155-156: "Replacing M0 with a (sufficiently large) MLP approximation M provides the radiance approximation…" Rolling up all these factors into one big function means you are learning a surface light field, nothing more.
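
The reviewer's point can be written out as follows. This is an illustrative sketch in standard rendering-equation notation, not the paper's own formulation: the symbols $L_e$, $L_i$, $B$, $\omega_i$, $\omega_o$, $\theta$ and the argument list of $M$ are assumptions made here for illustration.

```latex
% Standard rendering equation (notation assumed, not taken from the paper):
% outgoing radiance = emission + BRDF-weighted integral of incoming light.
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} B(x, n, \omega_i, \omega_o)\, L_i(x, \omega_i)\,
    (n \cdot \omega_i)\, d\omega_i

% Folding emission, the BRDF B, and the incoming light L_i
% into a single learned function M with parameters \theta:
L_o(x, \omega_o) \approx M(x, n, \omega_o; \theta)
```

Because the normal $n$ is determined by the surface point $x$ for a fixed surface, the right-hand side of the second line depends only on the surface point and the viewing direction. That is the definition of a surface light field: the BRDF and the lighting are no longer separately identifiable.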


Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation

arXiv.org Artificial Intelligence

Automatic 3D content creation has achieved rapid progress recently due to the availability of pre-trained large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple the geometry and appearance via volume rendering and are suboptimal in terms of recovering finer geometries and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose a new method, Fantasia3D, for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation, and propose to encode the surface normals extracted from the representation as the input of the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task, and learn the surface material for photorealistic rendering of the generated surface. Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source code: https://fantasia3d.github.io/.