TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
