Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF