Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Neural Information Processing Systems
Latent-based image generative models, such as Latent Diffusion Models (LDMs) and Mask Image Models (MIMs), have achieved notable success in image generation tasks. These models typically leverage reconstructive autoencoders like VQGAN or VAE to encode pixels into a more compact latent space and learn the data distribution in the latent space instead of directly from pixels. However, this practice raises a pertinent question: Is it truly the optimal choice? In response, we begin with an intriguing observation: despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence.
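To make the shared recipe in the abstract concrete, below is a minimal PyTorch sketch of modeling images in a latent space rather than in pixel space: a frozen VQGAN-style encoder quantizes pixels into discrete latent tokens, and a small GPT-style autoregressive prior learns the distribution of those tokens. This is an illustrative toy under stated assumptions, not the paper's implementation; `ToyVQEncoder`, `TinyARPrior`, and every layer size here are hypothetical stand-ins.

```python
# Illustrative toy, not the paper's code: encode pixels into a compact discrete
# latent space, then learn the data distribution there with an autoregressive
# prior. All module names and sizes below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyVQEncoder(nn.Module):
    """Maps 64x64 RGB images to an 8x8 grid of codebook indices (VQGAN-style)."""

    def __init__(self, codebook_size: int = 512, dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, dim)

    @torch.no_grad()
    def encode(self, x: torch.Tensor) -> torch.Tensor:
        z = self.conv(x)                                  # (B, dim, 8, 8)
        b, c, h, w = z.shape
        z = z.permute(0, 2, 3, 1).reshape(-1, c)          # (B*64, dim)
        dists = torch.cdist(z, self.codebook.weight)      # distance to each code
        return dists.argmin(dim=-1).view(b, h * w)        # (B, 64) token ids


class TinyARPrior(nn.Module):
    """GPT-style causal Transformer that models the latent token distribution."""

    def __init__(self, codebook_size: int = 512, dim: int = 128, seq_len: int = 64):
        super().__init__()
        self.tok_emb = nn.Embedding(codebook_size, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        _, t = tokens.shape
        h = self.tok_emb(tokens) + self.pos_emb[:, :t]
        # Causal mask so position i only attends to positions <= i.
        causal = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1
        )
        h = self.blocks(h, mask=causal)
        return self.head(h)                               # next-token logits


if __name__ == "__main__":
    encoder, prior = ToyVQEncoder(), TinyARPrior()
    images = torch.rand(4, 3, 64, 64)          # stand-in pixel batch
    tokens = encoder.encode(images)            # pixels -> discrete latent tokens
    logits = prior(tokens[:, :-1])             # predict token t from tokens < t
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    print("next-token loss in latent space:", loss.item())
```

LDMs, MIMs, and autoregressive models all share the first step (a reconstructive autoencoder); they differ only in how the second step models the latent distribution, which is where the abstract observes the autoregressive variant falling behind.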
- Genre:
  - Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)
- Industry:
  - Information Technology > Security & Privacy (0.46)
- Technology: