Offline RLHF Methods Need More Accurate Supervision Signals