Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

May-29-2025, 00:53:09 GMT–Neural Information Processing Systems

Latent-based image generative models, such as Latent Diffusion Models (LDMs) and Mask Image Models (MIMs), have achieved notable success in image generation tasks. These models typically leverage reconstructive autoencoders like VQGAN or VAE to encode pixels into a more compact latent space and learn the data distribution in the latent space instead of directly from pixels. However, this practice raises a pertinent question: Is it truly the optimal choice? In response, we begin with an intriguing observation: despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence.

artificial intelligence, latent space, machine learning

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Neural Networks (1.00)
      - Statistical Learning (1.00)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)