Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
