WARP: On the Benefits of Weight Averaged Rewarded Policies