PEO: Improving Bi-Factorial Preference Alignment with Post-Training Policy Extrapolation
