Reviews: Visual Object Networks: Image Generation with Disentangled 3D Representations
Neural Information Processing Systems
This paper describes a generative model for image formation with disentangled latent parameters for shape, viewpoint, and texture. This follows the vision-as-inverse-graphics paradigm, in which image generation is formulated as a search in model parameter space for parameters that, when rendered, reproduce the given image; the difference between the rendered image and the original is used to train the model. Using inverse graphics as inspiration, the paper learns the following modules:
1. A voxel generator that maps a latent 3D shape code to a voxelized 3D shape.
2. A differentiable projection module that converts the output of (1) into a 2.5D sketch (a depth map) and a silhouette mask, conditioned on a latent representation of the desired viewpoint.
3. A texture generator that maps the output of (2) to a realistically textured image, conditioned on a latent texture representation.
4. A 2.5D sketch encoder that maps a 2D image to a 2.5D sketch.
5. A texture encoder that maps the texture of the object in a 2D image to a latent texture code.
The modules are trained adversarially with GANs, without the need for paired image-and-shape data, using the now-common cycle-consistency constraints.
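To make the generative pipeline concrete, here is a minimal NumPy sketch of how the shape, viewpoint, and texture codes compose through the first three modules. All function bodies are illustrative stand-ins (thresholded noise, axis-aligned projection), not the paper's learned networks, which are trained GAN generators; only the data flow — shape code → voxels → 2.5D sketch → textured image — mirrors the described architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def voxel_generator(z_shape, res=32):
    """Stand-in for the learned voxel generator: maps a latent shape
    code to a binary voxel occupancy grid (here: thresholded noise)."""
    g = rng.random((res, res, res))
    return (g < 0.5 + 0.1 * np.tanh(z_shape.mean())).astype(np.float32)

def project(voxels, view_axis=0):
    """Stand-in for the differentiable projection module: an axis-aligned
    projection yielding a silhouette mask and a crude depth map
    (index of the first occupied voxel along the viewing axis)."""
    silhouette = voxels.max(axis=view_axis)
    depth = np.argmax(voxels, axis=view_axis).astype(np.float32)
    return depth, silhouette

def texture_generator(depth, silhouette, z_texture):
    """Stand-in for the texture generator: maps the 2.5D sketch to an
    'image', modulated by the latent texture code."""
    scale = max(depth.max(), 1.0)  # avoid division by zero on empty grids
    return silhouette * (depth / scale + z_texture.mean())

# Disentangled latent codes (dimensions chosen arbitrarily for the sketch).
z_shape = rng.standard_normal(64)
z_texture = rng.standard_normal(16)

vox = voxel_generator(z_shape)              # module 1
depth, sil = project(vox, view_axis=0)      # module 2 (viewpoint = axis here)
img = texture_generator(depth, sil, z_texture)  # module 3
print(vox.shape, img.shape)
```

Because the codes are separate inputs, each factor can be varied independently: resampling `z_texture` re-textures the same shape, while changing `view_axis` re-renders it from another viewpoint — the disentanglement the review highlights.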