$\textit{Jump Your Steps}$: Optimizing Sampling Schedule of Discrete Diffusion Models
Park, Yong-Hyun, Lai, Chieh-Hsin, Hayakawa, Satoshi, Takida, Yuhta, Mitsufuji, Yuki
arXiv.org Artificial Intelligence, Oct-10-2024
Diffusion models have seen notable success in continuous domains, leading to the development of discrete diffusion models (DDMs) for discrete variables. While parallel sampling methods such as τ-leaping accelerate DDM sampling, they introduce Compounding Decoding Error (CDE): discrepancies between the true distribution and the approximation produced by parallel token generation, which degrade sample quality. To address this, we introduce Jump Your Steps (JYS), an approach that optimizes the sampling schedule of DDMs to minimize CDE. More precisely, we derive a practical upper bound on CDE and propose an efficient algorithm for searching for the optimal sampling schedule. Extensive experiments across image, music, and text generation show that JYS significantly improves sampling quality, establishing it as a versatile framework for enhancing DDM performance under fast sampling.

Diffusion models (Sohl-Dickstein et al., 2015; Song et al., 2021b; Ho et al., 2020; Song et al., 2021a; Karras et al., 2022) have achieved remarkable success in generation tasks within the continuous domain. However, certain modalities, such as text and music, are inherently discrete, which has motivated the development of DDMs. Like their continuous counterparts, DDMs face a significant bottleneck in sampling speed due to their progressive refinement process. In contrast to continuous-domain diffusion models, whose sampling dynamics are governed by sample-wise differential equations (Song et al., 2021b) and thus admit well-established numerical methods for accelerating generation, speeding up DDMs poses a significant challenge. To address this, researchers have proposed fast and efficient samplers, notably the τ-leaping (Campbell et al., 2022; Lezama et al., 2022; Sun et al., 2023) and k-Gillespie (Zhao et al., 2024) algorithms, which sample multiple tokens in parallel within a single step. However, this parallel but independent sampling introduces Compounding Decoding Error (CDE) (Lezama et al., 2022), arising from a mismatch between the training and inference distributions of intermediate latents during parallel sampling. Specifically, while each token is generated according to its marginal distribution, the joint distribution deviates from the learned distribution. To mitigate this issue, the predictor-corrector (PC) sampler (Campbell et al., 2022) has been proposed.
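To make the source of CDE concrete, the sketch below shows a schematic τ-leaping-style update in PyTorch. It is a minimal illustration, not the paper's implementation: the `denoiser` interface and tensor shapes are assumptions, and real τ-leaping for continuous-time DDMs works with transition rates rather than direct categorical sampling. The point is only that many tokens are updated in parallel, each from its own marginal.

```python
import torch

def tau_leaping_step(denoiser, x_t, t, tau):
    """One parallel update step in the spirit of tau-leaping.

    `denoiser` is a hypothetical stand-in for a trained DDM that maps a
    token sequence x_t of shape (batch, seq_len) and a scalar time t to
    per-token logits of shape (batch, seq_len, vocab_size).
    """
    logits = denoiser(x_t, t)            # per-token marginals at time t
    probs = torch.softmax(logits, dim=-1)
    # Each token is drawn independently from its own marginal, which is
    # what lets many tokens be updated in one step. Because the product
    # of marginals is generally not the learned joint distribution, each
    # step incurs a decoding error, and these errors compound across
    # steps -- the Compounding Decoding Error (CDE).
    x_next = torch.distributions.Categorical(probs=probs).sample()
    return x_next, t - tau               # jump backward in time by tau
```

The free parameter in such a sampler is where the jumps land, i.e., the sequence of step sizes `tau`; this sampling schedule is precisely the degree of freedom that JYS optimizes.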