Reward-Directed Score-Based Diffusion Models via q-Learning