STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization
