Dual-Weighted Reinforcement Learning for Generative Preference Modeling