Simplify RLHF as Reward-Weighted SFT: A Variational Method