β-DPO: Direct Preference Optimization with Dynamic β