STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization