Review for NeurIPS paper: BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images
–Neural Information Processing Systems
The 3D features are then composed by taking their element-wise maximum followed by concatenation, forming a "unified 3D scene representation". For rendering, this unified representation is fed to another neural network (the renderer), which also takes the camera parameters as input and produces a rendering of the scene. The model is trained with a GAN-based objective. The authors evaluate their model on multiple datasets, including CLEVR, Real-Car, and Synthetic cars and chairs. The main experiments they conduct are as follows:
1) To show that the model has learned disentangled representations, they change attributes of the foreground objects or the background and show renderings that reflect those changes.
2) To evaluate visual fidelity, they show that the model achieves nearly the same or lower KID estimates compared to other methods on all datasets.
3) They train a model on scenes with one object, yet show that the same model can generate scenes with up to five objects of the same category with different attributes.
4) They show that they can manipulate some of the attributes (e.g.
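The element-wise maximum composition mentioned in this summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compose_scene`, the `(C, D, H, W)` tensor layout, and the random feature values are all assumptions made for illustration, and the concatenation step is omitted because the summary does not specify how it is applied.

```python
import numpy as np


def compose_scene(object_features):
    """Combine per-object 3D feature grids with an element-wise maximum.

    Hypothetical helper: each entry is assumed to be an array of shape
    (C, D, H, W) already placed in a shared scene coordinate frame.
    """
    stacked = np.stack(object_features, axis=0)  # (num_objects, C, D, H, W)
    return stacked.max(axis=0)                   # (C, D, H, W)


# Placeholder features; in a real model these would come from the generators.
rng = np.random.default_rng(0)
background = rng.normal(size=(4, 8, 8, 8))
car_a = rng.normal(size=(4, 8, 8, 8))
car_b = rng.normal(size=(4, 8, 8, 8))

scene_one = compose_scene([background, car_a])          # one foreground object
scene_two = compose_scene([background, car_a, car_b])   # two foreground objects
```

Because the maximum is order-independent and accepts any number of inputs, the output shape is the same regardless of how many objects are composed. Under the assumptions above, this is one plausible reason a model trained on single-object scenes could be reused to generate scenes with more objects.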
Jan-24-2025, 04:14:30 GMT