TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
