Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization