Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards