Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
