Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
