Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game