β-DPO: Direct Preference Optimization with Dynamic β

Junkang Wu
