Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance

May-27-2025, 20:35:05 GMT–Neural Information Processing Systems

Masked generative models (MGMs) have shown impressive generative ability while requiring an order of magnitude fewer sampling steps than continuous diffusion models. However, in image synthesis MGMs still underperform recent, well-developed continuous diffusion models of similar size in both the quality and the diversity of generated samples. A key factor in the performance of continuous diffusion models is guidance, which enhances sample quality at the expense of diversity. In this paper, we extend these guidance methods to a generalized guidance formulation for MGMs and propose a self-guidance sampling method that leads to better generation quality. The proposed approach leverages an auxiliary task for semantic smoothing in the vector-quantized token space, analogous to Gaussian blur in continuous pixel space.
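The abstract does not state the guidance rule itself, so the following is only a minimal sketch under stated assumptions. It assumes a classifier-free-guidance-style extrapolation in log-probability space, applied independently to each masked token: log p_guided ∝ log p_strong + w · (log p_strong − log p_weak). Here the "weak" branch stands in for the semantically smoothed auxiliary prediction. The rule is placed inside a MaskGIT-style parallel decoding loop that commits the most confident samples at each step. The names `MASK`, `strong_fn`, `weak_fn`, `decode_step`, the toy models, and the weight `w` are hypothetical illustrations, not the paper's method or any library's API.

```python
import math
import random
from typing import Callable, List, Optional

MASK = None  # hypothetical marker for a masked position in this sketch
Tokens = List[Optional[int]]
Logits = List[float]
ModelFn = Callable[[Tokens], List[Logits]]  # per-position logits over the codebook


def log_softmax(logits: Logits) -> Logits:
    """Numerically stable log-softmax over one categorical distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def guided_log_probs(strong: Logits, weak: Logits, w: float) -> Logits:
    """ASSUMED guidance form: extrapolate away from the weak prediction.

    log p_guided ∝ log p_strong + w * (log p_strong - log p_weak), renormalized.
    w = 0 recovers the unguided (strong) distribution.
    """
    ls, lw = log_softmax(strong), log_softmax(weak)
    return log_softmax([s + w * (s - v) for s, v in zip(ls, lw)])


def decode_step(tokens: Tokens, strong_fn: ModelFn, weak_fn: ModelFn,
                w: float, n_keep: int, rng: random.Random) -> Tokens:
    """One MaskGIT-style step (a sketch, not the paper's sampler).

    Samples every masked position from the guided distribution, then commits
    only the n_keep most confident samples; the others stay masked.
    """
    strong_all, weak_all = strong_fn(tokens), weak_fn(tokens)
    candidates = []
    for i, t in enumerate(tokens):
        if t is not MASK:
            continue
        lp = guided_log_probs(strong_all[i], weak_all[i], w)
        probs = [math.exp(x) for x in lp]
        tok = rng.choices(range(len(probs)), weights=probs)[0]
        candidates.append((lp[tok], i, tok))  # confidence = guided log-prob
    candidates.sort(reverse=True)
    out = list(tokens)
    for _, i, tok in candidates[:n_keep]:
        out[i] = tok
    return out


def toy_strong(tokens: Tokens) -> List[Logits]:
    # Hypothetical stand-in for the masked generative model.
    return [[2.0, 1.0, 0.0] for _ in tokens]


def toy_weak(tokens: Tokens) -> List[Logits]:
    # Hypothetical stand-in for the smoothed (auxiliary) prediction.
    return [[1.0, 1.0, 1.0] for _ in tokens]


rng = random.Random(0)
seq: Tokens = [MASK] * 4
while any(t is MASK for t in seq):
    seq = decode_step(seq, toy_strong, toy_weak, w=1.5, n_keep=2, rng=rng)
print(seq)
```

With a uniform weak branch, as in the toy example, this assumed rule reduces to sharpening the strong logits by a factor of (1 + w). A weak branch that carries real but blurred semantic information would instead push the sample away from whatever that coarse prediction favors.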

image synthesis, masked generative model, self-guidance

