Partition Generative Modeling: Masked Modeling Without Masks
Justin Deschenaux, Lan Tran, Caglar Gulcehre
arXiv.org Artificial Intelligence
Masked generative models (MGMs) are widely used to capture complex data and enable faster generation than autoregressive (AR) models through parallel decoding. However, MGMs typically operate on fixed-length inputs, which can be inefficient: early in sampling, most tokens are masked and carry no information, leading to wasted computation. In contrast, AR models process only previously generated tokens, making early iterations faster. In this work, we introduce the Partition Generative Model (PGM), a novel approach that combines the strengths of AR models and MGMs. Rather than masking, PGM partitions tokens into two groups and employs sparse attention to block information flow between them. Since there is no information flow between partitions, the model can process only the previously generated tokens during sampling, while retaining the ability to generate tokens in parallel and in any order. On OpenWebText, PGMs offer at least $5\times$ improvements in sampling latency and throughput, while producing samples with superior Generative Perplexity, compared to Masked Diffusion Language Models. On ImageNet, PGMs achieve a $7.5\times$ higher throughput than MaskGIT, with only a slight increase in FID (5.54 vs. 5.35). With twice as many sampling steps, the FID drops to 4.56 while remaining $3.9\times$ faster than MaskGIT. Finally, PGMs integrate seamlessly with MGM distillation, providing further inference speedups.
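The core mechanism described above, blocking information flow between two token partitions via sparse attention, can be sketched as a block-diagonal attention mask. The following is a minimal illustrative sketch, not the paper's implementation; the function name, partition encoding, and example token layout are assumptions for clarity:

```python
import numpy as np

def partition_attention_mask(partition_ids: np.ndarray) -> np.ndarray:
    """Build a boolean attention mask for a two-way token partition.

    partition_ids: shape (seq_len,), each entry 0 or 1, assigning
    every token to one of the two partitions.

    Returns a (seq_len, seq_len) mask where True means attention
    is allowed. A token may attend to another token only if both
    belong to the same partition, so no information flows across
    the partition boundary.
    """
    return partition_ids[:, None] == partition_ids[None, :]

# Example: 6 tokens split into two interleaved partitions.
ids = np.array([0, 1, 0, 1, 1, 0])
mask = partition_attention_mask(ids)
```

During sampling, one partition can play the role of already-generated context while the other holds the positions still to be filled; because the mask is block-structured, attention over the not-yet-generated partition can be skipped entirely, which is where the latency savings would come from.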
Oct-13-2025