WARM: On the Benefits of Weight Averaged Reward Models