T-REG: Preference Optimization with Token-Level Reward Regularization
