Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
