Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
