UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
