Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
