Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning