Reward Model Ensembles Help Mitigate Overoptimization