Listwise Reward Estimation for Offline Preference-based Reinforcement Learning