Self-Speculative Masked Diffusions
Andrew Campbell, Valentin De Bortoli, Jiaxin Shi, Arnaud Doucet
We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over the currently masked positions, and a number of those positions are then sampled at once. However, because of the factorization approximation, sampling too many positions in one go degrades sample quality. As a result, many simulation steps, and therefore many neural network function evaluations, are required to generate high-quality data. We reduce this computational burden by generating non-factorized predictions over masked positions. This is achieved by changing the final transformer attention mask from non-causal to causal, which enables draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. The result is a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2-scale text modelling and protein sequence generation, finding that it achieves a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.
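The factorization problem, and how speculative validation repairs it, can be illustrated with a toy two-position example. This is a minimal sketch under stated assumptions, not the authors' implementation: the four-word vocabulary, the factorized draft table `Q` and the causal target `p_cond` are all hypothetical, and the verifier applies the standard speculative sampling accept/reject rule to drafts drawn independently per position. In the method described above, drafting and validation are integrated into the network itself (via the causal final attention mask); here both distributions are simply hand-written tables.

```python
import numpy as np

# Hypothetical toy vocabulary of two-token city names.
VOCAB = ["New", "Los", "York", "Angeles"]

# Hypothetical factorized draft q: one independent marginal per masked position,
# as a standard masked diffusion model would predict in one forward pass.
Q = np.array([
    [0.5, 0.5, 0.0, 0.0],  # position 0
    [0.0, 0.0, 0.5, 0.5],  # position 1
])


def p_cond(prefix):
    """Hypothetical causal target p(x_i | x_<i): "New" -> "York", "Los" -> "Angeles"."""
    if len(prefix) == 0:
        return np.array([0.5, 0.5, 0.0, 0.0])
    if prefix[0] == 0:
        return np.array([0.0, 0.0, 1.0, 0.0])
    return np.array([0.0, 0.0, 0.0, 1.0])


def speculative_verify(draft, q, p_cond, rng):
    """Standard speculative-sampling verification of independently drafted tokens.

    Each draft token is accepted with probability min(1, p/q). On the first
    rejection, a replacement is drawn from normalize(max(0, p - q)) and
    verification stops, so each emitted token x_i follows p(x_i | x_<i).
    """
    out = []
    for i, tok in enumerate(draft):
        p = p_cond(tuple(out))
        if rng.random() < min(1.0, p[tok] / q[i, tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual mass where the target exceeds the draft.
        residual = np.maximum(p - q[i], 0.0)
        out.append(int(rng.choice(len(p), p=residual / residual.sum())))
        break
    return out


rng = np.random.default_rng(0)
naive, verified = [], []
for _ in range(10_000):
    # Draft every masked position independently from the factorized marginals.
    draft = [int(rng.choice(len(VOCAB), p=Q[i])) for i in range(len(Q))]
    naive.append(tuple(draft))
    verified.append(tuple(speculative_verify(draft, Q, p_cond, rng)))

coherent = {(0, 2), (1, 3)}  # "New York", "Los Angeles"
naive_bad = sum(s not in coherent for s in naive) / len(naive)
verified_bad = sum(s not in coherent for s in verified) / len(verified)
print(f"incoherent (factorized): {naive_bad:.2f}")
print(f"incoherent (verified):   {verified_bad:.2f}")
```

Sampling both positions independently from `Q` produces an incoherent pair such as "New Angeles" roughly half the time, whereas every verified sample is "New York" or "Los Angeles", with both positions filled in a single pass. With a less sharply peaked target, a rejection would end verification early and leave the remaining positions masked for a later forward pass.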
Oct-7-2025