Energy-Based Reward Models for Robust Language Model Alignment
