Provable Reward-Agnostic Preference-Based Reinforcement Learning