Semantic Image Synthesis with Semantically Coupled VQ-Model

Alaniz, Stephan, Hummel, Thomas, Akata, Zeynep

Sep-6-2022–arXiv.org Artificial Intelligence

Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated. We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images. Instead of training an autoregressive Transformer on separately learned conditioning latents and image latents, we find that jointly learning the conditioning and image latents significantly improves the modeling capabilities of the Transformer model. While our jointly trained VQ-model achieves reconstruction performance similar to that of a vanilla VQ-model for both semantic and image latents, tying the two modalities at the autoencoding stage proves to be an important ingredient for improving autoregressive modeling performance. We show that our model improves semantic image synthesis using autoregressive models on the popular semantic image datasets ADE20k, Cityscapes, and COCO-Stuff.

Figure 1: A semantically coupled VQ-model together with a Transformer generator synthesizes images that follow the semantic guidance more closely and have higher fidelity.
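The abstract describes two generic building blocks: vector quantization of latents into discrete codes, and an autoregressive model that predicts image codes conditioned on semantic codes. The following is a minimal sketch of those two steps only, assuming NumPy, random stand-in latents, and hypothetical codebook sizes. The names `quantize`, `build_sequence`, and `image_offset` are illustrative and do not come from the paper, and the sketch does not reproduce the joint (coupled) training of the two codebooks that the abstract identifies as the key ingredient.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (n, d) array of encoder outputs
    codebook: (k, d) array of code vectors
    """
    # Squared Euclidean distance between every latent and every code: (n, k)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def build_sequence(semantic_tokens, image_tokens, image_offset):
    """Concatenate conditioning tokens and image tokens into one sequence.

    Image token ids are shifted by image_offset so the two vocabularies
    do not collide (an assumption of this sketch, not taken from the paper).
    """
    return np.concatenate([semantic_tokens, image_tokens + image_offset])

rng = np.random.default_rng(0)
codebook_sem = rng.normal(size=(8, 4))   # hypothetical semantic codebook
codebook_img = rng.normal(size=(16, 4))  # hypothetical image codebook
z_sem = rng.normal(size=(6, 4))          # stand-in for encoded semantic map
z_img = rng.normal(size=(6, 4))          # stand-in for encoded image

sem_tokens = quantize(z_sem, codebook_sem)
img_tokens = quantize(z_img, codebook_img)

# An autoregressive model would be trained to predict the image tokens
# that follow the semantic tokens in this sequence.
sequence = build_sequence(sem_tokens, img_tokens, image_offset=len(codebook_sem))
```

In this layout the semantic tokens act as a prefix, so conditioning amounts to fixing the first half of the sequence and sampling the rest token by token.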

latent, semantically, vq-model

Country:
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Vision (1.00)
