Review for NeurIPS paper: Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Jan-23-2025, 16:07:46 GMT – Neural Information Processing Systems

They also draw inspiration from prior work on iterative inference for VAEs and propose an inference mechanism that allows the model to efficiently learn an object-centric scene representation from multiple views of a scene containing multiple objects. During training, the model learns to infer up to K objects (indexed 1, ..., K, each a d-dimensional Gaussian latent) in the scene, where K upper-bounds the number of objects the model can recognize and is set to a sufficiently high value. During training, 5 views of a scene are presented, and the model is expected to reconstruct both the final rendering and the object segmentations for a randomly queried novel viewpoint. They evaluate their model on GQN-Jaco and two variants of the CLEVR dataset. They compare their model to IODINE and GQN on object segmentation, novel-viewpoint prediction, and disentanglement analysis; the results show that their method performs better both quantitatively and qualitatively. They also demonstrate that their model has learned well-disentangled feature-level representations.
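
To make the setup summarized above concrete, here is a minimal sketch of the two ingredients the summary mentions: K object slots, each a d-dimensional diagonal Gaussian latent, and a training episode that observes 5 views and holds out one query viewpoint. This is not the authors' implementation. The function names, the values K = 7 and D = 16, the use of the reparameterization trick for sampling, and the assumption that each scene stores 6 views are all hypothetical choices made for illustration.

```python
import math
import random

K = 7   # hypothetical upper bound on the number of object slots
D = 16  # hypothetical latent dimensionality per slot


def sample_slot_latents(means, log_vars, rng):
    """Draw one sample per slot from diagonal Gaussians.

    Uses the reparameterization z = mu + sigma * eps with eps ~ N(0, 1),
    where sigma = exp(0.5 * log_var). This sampling scheme is an assumption.
    """
    latents = []
    for mu, log_var in zip(means, log_vars):
        slot = [
            m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)
        ]
        latents.append(slot)
    return latents


def split_views(num_views, num_context, rng):
    """Pick `num_context` observed view indices and one held-out query index."""
    if num_views <= num_context:
        raise ValueError("need at least one view left over for the query")
    indices = list(range(num_views))
    rng.shuffle(indices)
    return sorted(indices[:num_context]), indices[num_context]


rng = random.Random(0)

# Posterior parameters for K slots; zeros stand in for an encoder's output.
means = [[0.0] * D for _ in range(K)]
log_vars = [[0.0] * D for _ in range(K)]
latents = sample_slot_latents(means, log_vars, rng)

# One training episode: 5 context views plus 1 queried novel viewpoint.
context_ids, query_id = split_views(num_views=6, num_context=5, rng=rng)
```

In this sketch, a scene with fewer than K objects would simply leave some slots unused, which is one way to read the statement that K only upper-bounds the object count.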

learning object-centric representation, representation, scene representation, (9 more...)

Genre:
- Research Report (0.39)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.39)