Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms
