Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
