Reviews: Visual Object Networks: Image Generation with Disentangled 3D Representations
Neural Information Processing Systems
This paper describes a generative model for image formation with disentangled latent parameters for shape, viewpoint, and texture. This follows the vision-as-inverse-graphics paradigm, in which image generation is formulated as a search in model parameter space for parameters that, when rendered, reproduce the given image; the difference between the rendered image and the original is used to train the model. Using inverse graphics as inspiration, the paper learns the following modules:
1. A voxel generator that maps a latent 3D shape code to a voxelized 3D shape.
2. A differentiable projection module that converts the output of (1) into a 2.5D sketch (a depth map) and a silhouette mask, conditioned on a latent representation of the desired viewpoint.
3. A texture generator that maps the output of (2) to a realistically textured image, conditioned on a latent texture representation.
4. A 2.5D sketch encoder that maps a 2D image to a 2.5D sketch.
5. A texture encoder that maps the texture of the object in a 2D image to a latent texture code.
The modules are trained adversarially with GANs, without the need for paired image-and-shape data, using the now-common cycle-consistency constraints.
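To make the generative pipeline concrete, here is a minimal NumPy sketch of how the shape, viewpoint, and texture codes compose through the first three modules. All function bodies are illustrative stand-ins (thresholded noise, axis-aligned projection), not the paper's learned networks, which are trained GAN generators; only the data flow — shape code → voxels → 2.5D sketch → textured image — mirrors the described architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def voxel_generator(z_shape, res=32):
    """Stand-in for the learned voxel generator: maps a latent shape
    code to a binary voxel occupancy grid (here: thresholded noise)."""
    g = rng.random((res, res, res))
    return (g < 0.5 + 0.1 * np.tanh(z_shape.mean())).astype(np.float32)

def project(voxels, view_axis=0):
    """Stand-in for the differentiable projection module: an axis-aligned
    projection yielding a silhouette mask and a crude depth map
    (index of the first occupied voxel along the viewing axis)."""
    silhouette = voxels.max(axis=view_axis)
    depth = np.argmax(voxels, axis=view_axis).astype(np.float32)
    return depth, silhouette

def texture_generator(depth, silhouette, z_texture):
    """Stand-in for the texture generator: maps the 2.5D sketch to an
    'image', modulated by the latent texture code."""
    scale = max(depth.max(), 1.0)  # avoid division by zero on empty grids
    return silhouette * (depth / scale + z_texture.mean())

# Disentangled latent codes (dimensions chosen arbitrarily for the sketch).
z_shape = rng.standard_normal(64)
z_texture = rng.standard_normal(16)

vox = voxel_generator(z_shape)              # module 1
depth, sil = project(vox, view_axis=0)      # module 2 (viewpoint = axis here)
img = texture_generator(depth, sil, z_texture)  # module 3
print(vox.shape, img.shape)
```

Because the codes are separate inputs, each factor can be varied independently: resampling `z_texture` re-textures the same shape, while changing `view_axis` re-renders it from another viewpoint — the disentanglement the review highlights.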