MC-DiT: Contextual Enhancement via Clean-to-Clean Reconstruction for Masked Diffusion Models

Neural Information Processing Systems 

The Diffusion Transformer (DiT) is emerging as a leading architecture among generative diffusion models for image generation.