Guided Star-Shaped Masked Diffusion

Viacheslav Meshchaninov, Egor Shibaev, Artem Makoian, Ivan Klimov, Danil Sheshenya, Andrei Malinin, Nikita Balagansky, Daniil Gavrilov, Aibek Alanov, Dmitry Vetrov

arXiv.org Artificial Intelligence 

The performance of pre-trained masked diffusion models is often constrained by their sampling procedure, which makes decisions irreversible and struggles in low-step generation regimes. We introduce a novel sampling algorithm that works with pre-trained models and, after a lightweight fine-tuning of a single layer, significantly improves sample quality and efficiency. Our method reformulates the generation process using a star-shaped paradigm, which inherently allows for error correction. To make this process effective, we augment it with a learnable re-masking scheduler that intelligently identifies and revises likely errors. This approach yields a substantial quality boost, particularly when using a small number of sampling steps. We extensively ablate key components of our approach and show its usability in different scenarios. In comprehensive experiments on text and code generation, our sampling algorithm outperforms or matches existing methods.

Diffusion probabilistic models have demonstrated remarkable success in generating high-fidelity data, particularly in continuous domains such as image and video synthesis (Sohl-Dickstein et al., 2015; Song & Ermon, 2019; Ho et al., 2020; Sahoo et al., 2024b). A key reason for their effectiveness is the principle of iterative refinement, which provides a robust error-correction mechanism: a mistake made early in the trajectory can be gradually amended in subsequent steps, leading to state-of-the-art results. This elegant property, however, is largely absent in the discrete domain. While discrete diffusion models are making significant strides in areas like natural language processing (Lou et al., 2024; Sahoo et al., 2024a; Schiff et al., 2024), the most successful variants, based on token masking, are built on a foundation that precludes iterative refinement.
In a masked diffusion setup, the generation of each token is a one-way street: once a [MASK] is replaced with a concrete token, the model commits to that decision. The token is then frozen and cannot be revisited or updated, even if later steps reveal it to be suboptimal in the broader context.
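This one-way commitment, and the kind of re-masking that would relax it, can be made concrete with a small sketch. Everything below is illustrative: `toy_denoiser` is a random stand-in for a trained model, and the re-masking variant uses a simple lowest-confidence heuristic in place of the learnable scheduler described above.

```python
import random

MASK = "[MASK]"

def toy_denoiser(seq, rng):
    # Stand-in for a trained masked diffusion model: proposes a
    # (token, confidence) pair for every currently masked position.
    vocab = ["the", "cat", "sat", "mat"]
    return {i: (rng.choice(vocab), rng.random())
            for i, tok in enumerate(seq) if tok == MASK}

def sample_standard(length, steps, seed=0):
    """Vanilla masked-diffusion sampling: a committed token is frozen."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for _ in range(steps):
        proposals = toy_denoiser(seq, rng)
        if not proposals:
            break
        # Commit only the single highest-confidence proposal per step;
        # once written, a token is never revisited.
        i, (tok, _) = max(proposals.items(), key=lambda kv: kv[1][1])
        seq[i] = tok
    return seq

def sample_with_remasking(length, steps, seed=0, remask_frac=0.25):
    """Illustrative re-masking variant: after committing tokens, re-mask
    a fraction of the least confident ones so later steps can revise
    them (the paper uses a learned scheduler for this decision)."""
    rng = random.Random(seed)
    seq = [MASK] * length
    conf = [0.0] * length
    for step in range(steps):
        for i, (tok, c) in toy_denoiser(seq, rng).items():
            seq[i], conf[i] = tok, c
        if step < steps - 1:  # keep the final step fully committed
            filled = [i for i, t in enumerate(seq) if t != MASK]
            k = max(1, int(remask_frac * len(filled)))
            for i in sorted(filled, key=lambda i: conf[i])[:k]:
                seq[i] = MASK
    return seq
```

In `sample_standard`, an early low-confidence commitment can never be undone, which is exactly the failure mode described above; `sample_with_remasking` re-opens suspect positions each step, giving the model a chance to correct them in context.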
