Robust Preference Optimization via Dynamic Target Margins
