Understanding Reference Policies in Direct Preference Optimization
