SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
