β-DPO: Direct Preference Optimization with Dynamic β

Junkang Wu¹, Zhengyi Yang¹, Jiancan Wu¹
