It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF