Length-Controlled Margin-Based Preference Optimization without Reference Model
