Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation
Neural Information Processing Systems
Image Auto-regressive (AR) models have emerged as a powerful paradigm of visual generative models. Despite their promising performance, they suffer from slow generation speed due to the large number of sampling steps required. Although Distilled Decoding 1 (DD1) was recently proposed to enable few-step sampling for image AR models, it still incurs significant performance degradation in the one-step setting, and relies on a pre-defined mapping that limits its flexibility. In this work, we propose a new method, Distilled Decoding 2 (DD2), to further advance the feasibility of one-step sampling for image AR models. Unlike DD1, DD2 does not rely on a pre-defined mapping. We view the original AR model as a teacher model that provides the ground truth conditional score in the latent embedding space at each token position.
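To make the notion of a conditional score in the latent embedding space concrete, here is a minimal sketch under assumptions that the abstract does not state. It assumes the teacher AR model outputs a categorical distribution over codebook embeddings for the current token position, and that embeddings are perturbed with Gaussian noise, so the noisy conditional density is a Gaussian mixture whose score has a closed form. The function name `conditional_score` and the parameters `alpha` and `sigma` are hypothetical, chosen for illustration only; this is not necessarily how DD2 computes or uses the score.

```python
import math

def conditional_score(x, probs, codebook, alpha=1.0, sigma=0.5):
    """Score (gradient of the log-density) at x of the mixture
    sum_k probs[k] * N(x; alpha * codebook[k], sigma^2 * I).

    probs    : teacher token probabilities for one position (assumed given)
    codebook : list of embedding vectors, one per token
    alpha, sigma : assumed Gaussian noise schedule values
    """
    # Offset from x to each scaled codebook embedding.
    diffs = [[alpha * c_d - x_d for c_d, x_d in zip(c, x)] for c in codebook]
    # Unnormalized log posterior weight of each token given the noisy x.
    log_w = [
        math.log(p + 1e-12) - sum(d * d for d in diff) / (2 * sigma ** 2)
        for p, diff in zip(probs, diffs)
    ]
    # Softmax with max-subtraction for numerical stability.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    w = [v / total for v in w]
    # Posterior-weighted average offset, divided by the noise variance.
    return [
        sum(w[k] * diffs[k][d] for k in range(len(codebook))) / sigma ** 2
        for d in range(len(x))
    ]

# Toy two-token codebook in a 2-D embedding space.
toy_codebook = [[1.0, 0.0], [-1.0, 0.0]]
balanced = conditional_score([0.0, 0.0], [0.5, 0.5], toy_codebook)
one_hot = conditional_score([0.2, -0.1], [1.0, 0.0], toy_codebook)
```

In this toy setting, when the teacher puts all probability on one token the score points straight at that token's embedding, and with equal probabilities on two symmetric embeddings the score at the midpoint vanishes.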
Jun-14-2026, 12:37:15 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Natural Language (0.89)
- Representation & Reasoning (0.68)
- Machine Learning > Neural Networks (0.46)