On the Role of Difficult Prompts in Self-Play Preference Optimization
