PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning