Compositional Transformers for Scene Generation
Hudson, Drew A., Zitnick, C. Lawrence
arXiv.org Artificial Intelligence
We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is refined and evolves into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures, which feature a flat and monolithic latent space, towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing that it achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide deeper insight into its generative process, as it proceeds step by step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and finally to a high-resolution depiction of vibrant and intricate real-world scenes.
Nov-17-2021
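The two-stage process described in the abstract (a planning phase that drafts a layout, then an attention-based execution phase that refines it) can be illustrated with a toy sketch. This is not GANformer2's implementation. The function names `plan_layout` and `execute`, the rule that reads each object latent's first two entries as a normalized layout center, and the single attention step with identity projections are all hypothetical simplifications for illustration. There is no training, no convolution and no adversarial loss here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def plan_layout(object_latents, height, width, sharpness=50.0):
    """Toy planning phase: give every pixel a distribution over the k objects.

    Hypothetical assumption: the first two entries of each object latent are
    read as a normalized (y, x) center, and pixels prefer nearby objects.
    """
    layout = []
    for y in range(height):
        row = []
        for x in range(width):
            py, px = (y + 0.5) / height, (x + 0.5) / width
            scores = [-sharpness * ((py - z[0]) ** 2 + (px - z[1]) ** 2)
                      for z in object_latents]
            row.append(softmax(scores))
        layout.append(row)
    return layout


def execute(layout, object_latents):
    """Toy execution phase: refine per-pixel features with one attention step.

    Each pixel starts as the layout-weighted mix of object latents, then attends
    to all object latents (scaled dot-product attention, identity projections)
    and adds the attended value as a residual update.
    """
    dim = len(object_latents[0])
    scale = 1.0 / math.sqrt(dim)
    image = []
    for row in layout:
        out_row = []
        for weights in row:
            # Initial feature: the layout decides how much each object contributes.
            feat = [sum(w * z[d] for w, z in zip(weights, object_latents))
                    for d in range(dim)]
            # Attention weights from this pixel's feature to every object latent.
            attn = softmax([scale * sum(f * zd for f, zd in zip(feat, z))
                            for z in object_latents])
            update = [sum(a * z[d] for a, z in zip(attn, object_latents))
                      for d in range(dim)]
            out_row.append([f + u for f, u in zip(feat, update)])
        image.append(out_row)
    return image


# Usage: two objects, one near the top-left and one near the bottom-right.
latents = [[0.2, 0.2, 1.0, 0.0], [0.8, 0.8, 0.0, 1.0]]
layout = plan_layout(latents, height=4, width=4)
features = execute(layout, latents)
```

In this sketch the layout is computed once and only cheaply (per-pixel distances), while the heavier attention step operates on the features it produces, loosely mirroring the "lightweight planning, then detailed execution" split.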