Training diffusion models with reinforcement learning