Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback